tidymodels / tidypredict

Run predictions inside the database
https://tidypredict.tidymodels.org

tidypredict fails with randomForest regression #77

Open rogerjdeangelis opened 4 years ago

rogerjdeangelis commented 4 years ago
library(randomForest)
library(tidypredict)
library(dbplyr)
library(dplyr)
head(iris[, 1:4])
model <- randomForest(iris$Sepal.Length ~ ., data = iris[, 2:4], ntree = 1)
tidypredict_sql(model, dbplyr::simulate_mssql())

Partial listing (every WHEN branch shown returns NULL instead of a predicted value):

[[1]]                                                                                                                                           
<SQL> CASE                                                                                                                                      
WHEN (`Petal.Length` >= 5.7 AND `Sepal.Width` < 2.75 AND `Petal.Length` >= 4.6) THEN (NULL)                                                     
WHEN (`Petal.Width` < 0.65 AND `Sepal.Width` < 2.55 AND `Sepal.Width` < 3.05 AND `Petal.Length` < 4.6) THEN (NULL)                              
WHEN (`Petal.Length` >= 3.2 AND `Sepal.Width` < 3.9 AND `Sepal.Width` >= 3.05 AND `Petal.Length` < 4.6) THEN (NULL)                             
WHEN (`Petal.Width` < 0.3 AND `Sepal.Width` >= 3.9 AND `Sepal.Width` >= 3.05 AND `Petal.Length` < 4.6) THEN (NULL)                              
WHEN (`Petal.Width` >= 0.3 AND `Sepal.Width` >= 3.9 AND `Sepal.Width` >= 3.05 AND `Petal.Length` < 4.6) THEN (NULL)                             
WHEN (`Petal.Length` < 5.2 AND `Petal.Length` < 5.7 AND `Sepal.Width` < 2.75 AND `Petal.Length` >= 4.6) THEN (NULL)                             
WHEN (`Petal.Length` >= 5.2 AND `Petal.Length` < 5.7 AND `Sepal.Width` < 2.75 AND `Petal.Length` >= 4.6) THEN (NULL)                            
WHEN (`Petal.Length` < 3.65 AND `Petal.Width` >= 0.65 AND `Sepal.Width` < 2.55 AND `Sepal.Width` < 3.05 AND `Petal.Length` < 4.6) THEN (NULL)   
WHEN (`Sepal.Width` < 2.65 AND `Sepal.Width` < 2.85 AND `Sepal.Width` >= 2.55 AND `Sepal.Width` < 3.05 AND `Petal.Length` < 4.6) THEN (NULL)    
WHEN (`Petal.Length` < 2.75 AND `Sepal.Width` >= 2.85 AND `Sepal.Width` >= 2.55 AND `Sepal.Width` < 3.05 AND `Petal.Length` < 4.6) THEN (NULL)  
WHEN (`Sepal.Width` < 2.9 AND `Petal.Width` < 1.75 AND `Petal.Width` < 1.85 AND `Sepal.Width` >= 2.75 AND `Petal.Length` >= 4.6) THEN (NULL)    
WHEN (`Petal.Length` >= 5.8 AND `Petal.Width` >= 1.75 AND `Petal.Width` < 1.85 AND `Sepal.Width` >= 2.75 AND `Petal.Length` >= 4.6) THEN (NULL) 
WHEN (`Petal.Length` >= 5.85 AND `Petal.Length` < 6.0 AND `Petal.Width` >= 1.85 AND `Sepal.Width` >= 2.75 AND `Petal.Length` >= 4.6) THEN (NULL)
WHEN (`Sepal.Width` < 3.4 AND `Petal.Length` >= 6.0 AND `Petal.Width` >= 1.85 AND `Sepal.Width` >= 2.75 AND `Petal.Length` >= 4.6) THEN (NULL)  
WHEN (`Sepal.Width` >= 3.4 AND `Petal.Length` >= 6.0 AND `Petal.Width` >= 1.85 AND `Sepal.Width` >= 2.75 AND `Petal.Length` >= 4.6) THEN (NULL) 
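One way to check whether the fitted tree itself has numeric leaf values, even though the SQL above returns NULL for every branch, is to pull the single tree out with randomForest::getTree(). The sketch below is an illustration under stated assumptions, not part of the original report. It refits with the equivalent formula Sepal.Length ~ . on iris[, 1:4], which uses the same three predictors. It also adds set.seed() for reproducibility. Terminal nodes are identified by a left daughter of 0.

```r
library(randomForest)

set.seed(1)  # added for reproducibility; not in the original report

# Same predictors as the report (Sepal.Width, Petal.Length, Petal.Width),
# written with a plain column name instead of iris$Sepal.Length
model <- randomForest(Sepal.Length ~ ., data = iris[, 1:4], ntree = 1)

# Extract tree 1 as a data frame, with variable names instead of indices
tree <- getTree(model, k = 1, labelVar = TRUE)

# Terminal nodes have no children, so their left daughter is 0
leaves <- tree[tree[["left daughter"]] == 0, c("status", "prediction")]

head(leaves)  # leaf predictions should be numeric means of Sepal.Length
nrow(leaves)  # number of terminal nodes in this tree
```

If the prediction column holds numbers here, the model itself is fine. The NULLs would then point at how the regression leaf values are carried into the generated SQL.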
rogerjdeangelis commented 4 years ago

Hi tidypredict team

I realize that generating code for a single tree is low priority, but I like to look at the code and the tree diagram to get insight into how randomForest is partitioning the data.

I am a SAS programmer, but I find myself leaning on the tidy packages and haven more and more as time goes on.

Thanks for providing these packages!

Roger

edgararuiz commented 3 years ago

Hi @rogerjdeangelis, what is the issue that you are seeing? I'm not getting any errors.

If you want to see the structure of the tree, you can use parse_model(). It reads the random forest model and breaks it down into a somewhat readable list. Here is an example:

library(randomForest)
library(tidypredict)

model <- randomForest(iris$Sepal.Length ~ ., data = iris[, 2:4], ntree = 1)

parsedmodel <- parse_model(model)

str(parsedmodel$trees)
#> List of 1
#>  $ :List of 48
#>   ..$ :List of 2
#>   .. ..$ prediction: NULL
#>   .. ..$ path      :List of 3
#>   .. .. ..$ :List of 4
#>   .. .. .. ..$ type: chr "conditional"
#>   .. .. .. ..$ col : chr "Petal.Width"
#>   .. .. .. ..$ val : num 1.15
#>   .. .. .. ..$ op  : chr "less"
#>   .. .. ..$ :List of 4
#>   .. .. .. ..$ type: chr "conditional"
#>   .. .. .. ..$ col : chr "Petal.Length"
#>   .. .. .. ..$ val : num 3.4
#>   .. .. .. ..$ op  : chr "more-equal"
#>   .. .. ..$ :List of 4
.... more
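Building on the parse_model() structure above, a small helper can flatten each leaf into a one-line rule, which may be easier to read next to a tree diagram. This is a hypothetical sketch: tree_to_rules(), leaf_to_rule() and op_symbol() are not part of tidypredict. It assumes only the fields visible in the str() output. Each leaf has a prediction and a path, and each path step has col, val and op. Op names other than "less" and "more-equal" are assumptions.

```r
# Hypothetical helper: map the op strings seen in parse_model() output to symbols.
# Only "less" and "more-equal" appear above; the other two names are assumed.
op_symbol <- function(op) {
  switch(op,
         "less" = "<",
         "less-equal" = "<=",
         "more" = ">",
         "more-equal" = ">=",
         op)  # unknown operators are shown unchanged
}

# Turn one leaf (list with `prediction` and `path`) into a readable rule string
leaf_to_rule <- function(leaf) {
  conds <- vapply(
    leaf$path,
    function(step) paste(step$col, op_symbol(step$op), format(step$val)),
    character(1)
  )
  pred <- if (is.null(leaf$prediction)) "NULL" else format(leaf$prediction)
  paste(paste(conds, collapse = " AND "), "=>", pred)
}

# One rule per leaf of a single tree
tree_to_rules <- function(tree) vapply(tree, leaf_to_rule, character(1))

# Toy tree mimicking the str() output above (second leaf's value is made up)
example_tree <- list(
  list(prediction = NULL, path = list(
    list(type = "conditional", col = "Petal.Width", val = 1.15, op = "less"),
    list(type = "conditional", col = "Petal.Length", val = 3.4, op = "more-equal")
  )),
  list(prediction = 5.8, path = list(
    list(type = "conditional", col = "Petal.Width", val = 1.15, op = "more-equal")
  ))
)
writeLines(tree_to_rules(example_tree))

# With the real model: writeLines(tree_to_rules(parsedmodel$trees[[1]]))
```

Printing the rules this way also makes it easy to see which leaves carry a NULL prediction.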