jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

Failed to convert randomForest #20

Closed quasipolynomial closed 7 years ago

quasipolynomial commented 7 years ago

Hi there,

I am receiving the following error when trying to convert a randomForest model that was fitted using the "matrix view" (x and y arguments instead of a formula). From what I understand, PMML doesn't support this feature. However, I was hoping you could shed more light on the "IllegalArgumentException" below:

rf_output = randomForest(x = data, y = target, importance = TRUE, ntree = 101, proximity = TRUE)

Apr 20, 2017 10:06:12 PM org.jpmml.rexp.Main run
INFO: Parsing RDS..
Apr 20, 2017 10:06:13 PM org.jpmml.rexp.Main run
INFO: Parsed RDS in 105 ms.
Apr 20, 2017 10:06:13 PM org.jpmml.rexp.Main run
INFO: Initializing default Converter
Apr 20, 2017 10:06:13 PM org.jpmml.rexp.Main run
INFO: Initialized org.jpmml.rexp.RandomForestConverter
Apr 20, 2017 10:06:13 PM org.jpmml.rexp.Main run
INFO: Converting..
Apr 20, 2017 10:06:13 PM org.jpmml.rexp.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: names
    at org.jpmml.rexp.RExp.getAttributeValue(RExp.java:59)
    at org.jpmml.rexp.RExp.getAttributeValue(RExp.java:46)
    at org.jpmml.rexp.RExp.names(RExp.java:32)
    at org.jpmml.rexp.RandomForestConverter.encodeNonFormula(RandomForestConverter.java:168)
    at org.jpmml.rexp.RandomForestConverter.encodeSchema(RandomForestConverter.java:67)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:74)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.IllegalArgumentException: names
    at org.jpmml.rexp.RExp.getAttributeValue(RExp.java:59)
    at org.jpmml.rexp.RExp.getAttributeValue(RExp.java:46)
    at org.jpmml.rexp.RExp.names(RExp.java:32)
    at org.jpmml.rexp.RandomForestConverter.encodeNonFormula(RandomForestConverter.java:168)
    at org.jpmml.rexp.RandomForestConverter.encodeSchema(RandomForestConverter.java:67)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:74)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)
Error in .convert(tempfile, file, ...) : 1
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.10

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rJava_0.9-8           r2pmml_0.13.0         fBasics_3011.87       timeSeries_3022.101.2 timeDate_3012.100    
 [6] heatmap.plus_1.3      mclust_5.2.3          genefilter_1.56.0     Hmisc_4.0-2           ggplot2_2.2.1        
[11] Formula_1.2-1         survival_2.41-3       lattice_0.20-33       ROCR_1.0-7            gplots_3.0.1         
[16] randomForest_4.6-12  

loaded via a namespace (and not attached):
 [1] gtools_3.5.0         splines_3.3.1        colorspace_1.3-2     htmltools_0.3.5      stats4_3.3.1        
 [6] base64enc_0.1-3      XML_3.98-1.4         foreign_0.8-66       DBI_0.6-1            BiocGenerics_0.20.0 
[11] RColorBrewer_1.1-2   plyr_1.8.4           stringr_1.2.0        munsell_0.4.3        gtable_0.2.0        
[16] caTools_1.17.1       htmlwidgets_0.8      memoise_1.0.0        latticeExtra_0.6-28  Biobase_2.34.0      
[21] knitr_1.15.1         IRanges_2.8.2        parallel_3.3.1       AnnotationDbi_1.36.2 htmlTable_1.9       
[26] Rcpp_0.12.10         acepack_1.4.1        KernSmooth_2.23-15   xtable_1.8-2         backports_1.0.5     
[31] scales_0.4.1         checkmate_1.8.2      gdata_2.17.0         S4Vectors_0.12.2     annotate_1.52.1     
[36] gridExtra_2.2.1      digest_0.6.12        stringi_1.1.5        grid_3.3.1           tools_3.3.1         
[41] bitops_1.0-6         magrittr_1.5         lazyeval_0.2.0       RCurl_1.95-4.8       tibble_1.3.0        
[46] RSQLite_1.1-2        cluster_2.0.4        Matrix_1.2-6         data.table_1.10.4    rpart_4.1-10        
[51] nnet_7.3-12 

Thanks,

K

vruusmann commented 7 years ago

The application logic fails when attempting to identify the column names from the randomForest data structure:

xcols = rf$xnames
if(is.null(xcols)){
  xcols = names(rf$forest$xlevels) # THIS lookup is the one that fails
}
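
To make the failure mode concrete, here is a minimal base-R sketch of that lookup. It uses hypothetical stand-in lists rather than real fitted models, and it assumes (as far as I understand the randomForest source) that `xlevels` is a named list for data.frame input but an unnamed list for raw matrix input. The function name `resolve_xcols` and the column names are illustrative only.

```r
# Hypothetical stand-ins for the relevant parts of a fitted randomForest object.
# Assumption: data.frame input yields a named xlevels list (one entry per column),
# raw matrix input yields an unnamed one.
rf_from_frame  <- list(xnames = NULL, forest = list(xlevels = list(Age = 0, Income = 0)))
rf_from_matrix <- list(xnames = NULL, forest = list(xlevels = as.list(rep(0, 2))))

# Same lookup as in the snippet above, wrapped in a function for illustration.
resolve_xcols <- function(rf) {
  xcols <- rf$xnames
  if (is.null(xcols)) {
    xcols <- names(rf$forest$xlevels)
  }
  xcols
}

print(resolve_xcols(rf_from_frame))   # column names are found
print(resolve_xcols(rf_from_matrix))  # NULL, there is no "names" attribute to read
```

On the Java side, a missing "names" attribute would be consistent with the `IllegalArgumentException: names` raised from `RExp.names` in the stack trace.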

The "matrix view" has been tested for the case where the x and y arguments are both data.frame objects. For example:
https://github.com/jpmml/jpmml-r/blob/master/src/test/R/randomForest.R#L40
https://github.com/jpmml/jpmml-r/blob/master/src/test/R/randomForest.R#L65

Could it be that you're using "raw" matrices instead of data frames in your script? Specifically, what is the data type of your data variable?

print(class(data))

quasipolynomial commented 7 years ago

Amazing. It was precisely that issue.

> print(class(data))
[1] "matrix"
> data = as.data.frame(data)

Conversion works fine now.
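
For anyone hitting the same error, here is a minimal sketch of the workaround: coerce the predictors to a data.frame before fitting. The helper name `ensure_data_frame`, the toy matrix, and the output file name are illustrative assumptions. The fit and conversion calls are left commented out because they need the randomForest and r2pmml packages plus a Java runtime.

```r
# Hypothetical helper: make sure the predictors are a data.frame,
# so that every column carries a name.
ensure_data_frame <- function(x) {
  if (is.data.frame(x)) {
    return(x)
  }
  as.data.frame(x)
}

# Toy matrix without column names, standing in for `data`.
x_mat <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
x_df <- ensure_data_frame(x_mat)

print(class(x_mat))  # a raw matrix
print(class(x_df))   # now a data.frame
print(names(x_df))   # auto-generated column names

# With the packages installed, the fit and conversion would then look like:
# library(randomForest)
# library(r2pmml)
# rf_output <- randomForest(x = x_df, y = target, ntree = 101)
# r2pmml(rf_output, "rf.pmml")
```

If the matrix already has column names, `as.data.frame` keeps them, so the resulting PMML fields should carry the original feature names instead of the generated ones.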

All the best,

K