jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

Error converting GLM with caret #19

Closed edumucelli closed 7 years ago

edumucelli commented 7 years ago

@vruusmann I am seeing an error similar to the one reported in #8, this time when using GLM with caret. Below are the error, the reproducible code, and the sessionInfo() output. Maybe I am misusing r2pmml with features that are not supported. If so, I'd be thankful if you could point out which ones they are.

The JPMML environment as a whole is an excellent set of tools, congratulations.

Apr 13, 2017 11:11:49 AM org.jpmml.rexp.Main run
INFO: Parsing RDS..
Apr 13, 2017 11:11:50 AM org.jpmml.rexp.Main run
SEVERE: Failed to parse RDS
java.lang.UnsupportedOperationException: 17
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:123)
    at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:153)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
    at org.jpmml.rexp.RExpParser.readEnvironment(RExpParser.java:193)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:78)
    at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:153)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
    at org.jpmml.rexp.RExpParser.readEnvironment(RExpParser.java:193)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:78)
    at org.jpmml.rexp.RExpParser.readTag(RExpParser.java:449)
    at org.jpmml.rexp.RExpParser.readPromise(RExpParser.java:204)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:80)
    at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:153)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
    at org.jpmml.rexp.RExpParser.readEnvironment(RExpParser.java:193)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:78)
    at org.jpmml.rexp.RExpParser.readTag(RExpParser.java:449)
    at org.jpmml.rexp.RExpParser.readPromise(RExpParser.java:204)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:80)
    at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:153)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
    at org.jpmml.rexp.RExpParser.readEnvironment(RExpParser.java:193)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:78)
    at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:153)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
    at org.jpmml.rexp.RExpParser.readAttributes(RExpParser.java:465)
    at org.jpmml.rexp.RExpParser.readAttributes(RExpParser.java:458)
    at org.jpmml.rexp.RExpParser.readFunctionCall(RExpParser.java:212)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:82)
    at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:153)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
    at org.jpmml.rexp.RExpParser.readAttributes(RExpParser.java:465)
    at org.jpmml.rexp.RExpParser.readAttributes(RExpParser.java:458)
    at org.jpmml.rexp.RExpParser.readVector(RExpParser.java:317)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:95)
    at org.jpmml.rexp.RExpParser.readVector(RExpParser.java:312)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:95)
    at org.jpmml.rexp.RExpParser.readVector(RExpParser.java:312)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:95)
    at org.jpmml.rexp.RExpParser.parse(RExpParser.java:53)
    at org.jpmml.rexp.Main.run(Main.java:109)
    at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.UnsupportedOperationException: 17
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:123)
    at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:153)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
    at org.jpmml.rexp.RExpParser.readEnvironment(RExpParser.java:193)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:78)
    at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:153)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
    at org.jpmml.rexp.RExpParser.readEnvironment(RExpParser.java:193)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:78)
    at org.jpmml.rexp.RExpParser.readTag(RExpParser.java:449)
    at org.jpmml.rexp.RExpParser.readPromise(RExpParser.java:204)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:80)
    at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:153)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
    at org.jpmml.rexp.RExpParser.readEnvironment(RExpParser.java:193)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:78)
    at org.jpmml.rexp.RExpParser.readTag(RExpParser.java:449)
    at org.jpmml.rexp.RExpParser.readPromise(RExpParser.java:204)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:80)
    at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:153)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
    at org.jpmml.rexp.RExpParser.readEnvironment(RExpParser.java:193)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:78)
    at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:153)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
    at org.jpmml.rexp.RExpParser.readAttributes(RExpParser.java:465)
    at org.jpmml.rexp.RExpParser.readAttributes(RExpParser.java:458)
    at org.jpmml.rexp.RExpParser.readFunctionCall(RExpParser.java:212)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:82)
    at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:153)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
    at org.jpmml.rexp.RExpParser.readAttributes(RExpParser.java:465)
    at org.jpmml.rexp.RExpParser.readAttributes(RExpParser.java:458)
    at org.jpmml.rexp.RExpParser.readVector(RExpParser.java:317)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:95)
    at org.jpmml.rexp.RExpParser.readVector(RExpParser.java:312)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:95)
    at org.jpmml.rexp.RExpParser.readVector(RExpParser.java:312)
    at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:95)
    at org.jpmml.rexp.RExpParser.parse(RExpParser.java:53)
    at org.jpmml.rexp.Main.run(Main.java:109)
    at org.jpmml.rexp.Main.main(Main.java:97)
Error in .convert(tempfile, file, ...) : 1

library(caret)
library("r2pmml")

NROW = 2000
NCOL = 5

data = data.frame(CLASS = replicate(1, sample(c('true', 'false'), replace=TRUE)), replicate(NCOL, sample(rnorm(NROW), replace=TRUE)))

trainIndex <- createDataPartition(data$CLASS, p = .75, list = FALSE, times = 1)

train <- data[ trainIndex,]
test  <- data[-trainIndex,]

method = "glm"

crtl = trainControl(method = "repeatedcv", 
                    number = 10, 
                    repeats = 3, 
                    classProbs = TRUE, 
                    savePredictions = "final", 
                    summaryFunction = twoClassSummary,
                    returnData = FALSE)
tune_length = 3
model_fit = train(CLASS ~ .,
                   data = train,  
                   method = "glm", 
                   trControl = crtl,
                   tuneLength = tune_length,
                   metric = "ROC")

r2pmml(model_fit, paste0("glm.benchmark.", NCOL, ".pmml"))

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] r2pmml_0.12.3   caret_6.0-73    ggplot2_2.2.1   lattice_0.20-34

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9        magrittr_1.5       splines_3.3.2      MASS_7.3-45       
 [5] munsell_0.4.3      colorspace_1.3-2   foreach_1.4.3      minqa_1.2.4       
 [9] stringr_1.2.0      car_2.1-4          plyr_1.8.4         tools_3.3.2       
[13] parallel_3.3.2     nnet_7.3-12        pbkrtest_0.4-6     grid_3.3.2        
[17] gtable_0.2.0       nlme_3.1-131       mgcv_1.8-17        quantreg_5.29     
[21] MatrixModels_0.4-1 iterators_1.0.8    lme4_1.1-12        lazyeval_0.2.0    
[25] assertthat_0.1     tibble_1.2         Matrix_1.2-8       nloptr_1.0.4      
[29] reshape2_1.4.2     ModelMetrics_1.1.0 codetools_0.2-15   stringi_1.1.2     
[33] compiler_3.3.2     scales_0.4.1       stats4_3.3.2       SparseM_1.74

vruusmann commented 7 years ago

@edumucelli Thanks for providing a reproducible example. However, I have a slight problem running it on my computer, as the train() function complains about some bad arguments:

> model_fit = train(CLASS ~ ., data = train, method = "glm", trControl = crtl, tuneLength = tune_length, metric = "ROC")
Error in apply(testOutput[, lev], 1, function(x) x/sum(x)) : 
  dim(X) must have a positive length

Can you figure out what's causing it?

The opcode 17 corresponds to a "dot-dot-dot" object. My RDS parser doesn't recognize this opcode yet. However, if needed, it shouldn't be difficult to implement.
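
To make the "opcode 17" message easier to read, here is a minimal Java sketch of how a type code can be extracted from an RDS item header and mapped to a name. It assumes the layout used by R's own serializer (type code in the low 8 bits of the header integer, "has attributes" flag at bit 9, "has tag" flag at bit 10). The class SexpTypeDemo and its methods are hypothetical names for illustration only. This is not the actual RExpParser code, and only a few type codes are listed.

```java
public class SexpTypeDemo {

    // The low 8 bits of the header integer carry the SEXP type code
    public static int typeOf(int flags) {
        return flags & 0xFF;
    }

    // Bit 9 signals that an attribute pairlist follows the item (assumed layout)
    public static boolean hasAttributes(int flags) {
        return (flags & (1 << 9)) != 0;
    }

    // Bit 10 signals that the item carries a tag (assumed layout)
    public static boolean hasTag(int flags) {
        return (flags & (1 << 10)) != 0;
    }

    // Map a few of R's SEXP type codes to their names.
    // Unknown codes fail in the same style as the error in the log above.
    public static String describe(int flags) {
        int type = typeOf(flags);
        switch (type) {
            case 0:  return "NILSXP";
            case 1:  return "SYMSXP";
            case 2:  return "LISTSXP";   // pairlist
            case 3:  return "CLOSXP";    // closure
            case 4:  return "ENVSXP";    // environment
            case 5:  return "PROMSXP";   // promise
            case 6:  return "LANGSXP";   // function call
            case 17: return "DOTSXP";    // the "dot-dot-dot" object
            default:
                throw new UnsupportedOperationException(String.valueOf(type));
        }
    }

    public static void main(String[] args) {
        // A bare type code, and the same code with the "has tag" bit set
        System.out.println(describe(17));
        System.out.println(describe(17 | (1 << 10)));
    }
}
```

Both calls in main() resolve to DOTSXP, because the flag bits are masked off before the lookup. A parser that lacks a case for a given code would end up in the default branch, which is consistent with the "UnsupportedOperationException: 17" reported above.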

As a temporary workaround, can you execute the r2pmml() function with just the fitted glm() object, which is accessible as model_fit$finalModel?

r2pmml(model_fit$finalModel, ...)

This way you won't be serializing the whole train() object, which means that there's a good chance that the temporary RDS file won't contain any "dot-dot-dot" objects, and the conversion should succeed.

edumucelli commented 7 years ago

@vruusmann Thank you for the quick response, and for the additional information about the problem with "...". Please find below the fixed version of the reproducible example. Unfortunately, the error is the same. Let me know if you want me to test something else.

library(caret)
library("r2pmml")

NROW = 2000
NCOL = 5

data = data.frame(CLASS = replicate(1, sample(c('true', 'false'), replace=TRUE)), replicate(NCOL, sample(rnorm(NROW), replace=TRUE)))

trainIndex <- createDataPartition(data$CLASS, p = .75, times = 1, list = FALSE)

train <- data[ trainIndex,]
test  <- data[-trainIndex,]

method = "glm"

crtl = trainControl(method = "repeatedcv", 
                    number = 10, 
                    repeats = 1,
                    verboseIter = TRUE,
                    returnData = FALSE)
tune_length = 3
method_fit = train(CLASS ~.,
                   data = data,  
                   method = method, 
                   trControl = crtl,
                   tuneLength = tune_length)

r2pmml(method_fit, paste0("glm.benchmark.", NCOL, ".pmml"))

# Unfortunately, using finalModel did not work either
r2pmml(method_fit$finalModel, paste0("glm.benchmark.", NCOL, ".pmml"))