jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

Failing to convert when using r2pmml and ranger #28

Closed phakksi closed 6 years ago

phakksi commented 6 years ago

Hi,

I am not sure if anything changed in the latest ranger implementation, but I cannot even execute the example code for the iris dataset:

library("ranger")
library("r2pmml")
data(iris)

iris.ranger = ranger(Species ~ ., data = iris, num.trees = 7, write.forest = TRUE)
print(iris.ranger)

r2pmml(iris.ranger, "iris_ranger.pmml", dataset = iris)

I get the following exception:

Sep 05, 2017 11:57:41 AM org.jpmml.rexp.Main run
INFO: Parsing RDS..
Sep 05, 2017 11:57:41 AM org.jpmml.rexp.Main run
INFO: Parsed RDS in 6 ms.
Sep 05, 2017 11:57:41 AM org.jpmml.rexp.Main run
INFO: Initializing default Converter
Sep 05, 2017 11:57:41 AM org.jpmml.rexp.Main run
INFO: Initialized org.jpmml.rexp.RangerConverter
Sep 05, 2017 11:57:41 AM org.jpmml.rexp.Main run
INFO: Converting..
Sep 05, 2017 11:57:41 AM org.jpmml.rexp.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Sepal.Length
    at org.jpmml.rexp.RVector.getValue(RVector.java:96)
    at org.jpmml.rexp.RVector.getValue(RVector.java:72)
    at org.jpmml.rexp.RangerConverter.encodeSchema(RangerConverter.java:117)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:74)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.IllegalArgumentException: Sepal.Length
    at org.jpmml.rexp.RVector.getValue(RVector.java:96)
    at org.jpmml.rexp.RVector.getValue(RVector.java:72)
    at org.jpmml.rexp.RangerConverter.encodeSchema(RangerConverter.java:117)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:74)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)
Error in .convert(tempfile, file, converter, converter_classpath, verbose) : 1

Do you know what could be causing this issue?
I am using R version 3.4.1 (2017-06-30).

vruusmann commented 6 years ago

> Do you know what could be causing this issue? I am using R version 3.4.1 (2017-06-30).

Could be the case that the model data structure has been changed in newer ranger package versions.

What is your ranger package version? The integration tests of the JPMML-R library were generated/tested using ranger 0.7.0: https://github.com/jpmml/jpmml-r/blob/master/src/test/R/ranger.R

phakksi commented 6 years ago

Hi Vruusmann,

The version I'm using is 0.8.0:

> packageVersion("ranger")
[1] ‘0.8.0’

I will try to install version 0.7.0 to see if I can save my current model.

Do you think this issue can be fixed in the near future?

Thanks for your fast answer.

phakksi commented 6 years ago

Same error with ranger 0.7.0:

Sep 05, 2017 3:42:27 PM org.jpmml.rexp.Main run
INFO: Parsing RDS..
Sep 05, 2017 3:42:27 PM org.jpmml.rexp.Main run
INFO: Parsed RDS in 6 ms.
Sep 05, 2017 3:42:27 PM org.jpmml.rexp.Main run
INFO: Initializing default Converter
Sep 05, 2017 3:42:27 PM org.jpmml.rexp.Main run
INFO: Initialized org.jpmml.rexp.RangerConverter
Sep 05, 2017 3:42:27 PM org.jpmml.rexp.Main run
INFO: Converting..
Sep 05, 2017 3:42:27 PM org.jpmml.rexp.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Sepal.Length
    at org.jpmml.rexp.RVector.getValue(RVector.java:96)
    at org.jpmml.rexp.RVector.getValue(RVector.java:72)
    at org.jpmml.rexp.RangerConverter.encodeSchema(RangerConverter.java:117)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:74)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.IllegalArgumentException: Sepal.Length
    at org.jpmml.rexp.RVector.getValue(RVector.java:96)
    at org.jpmml.rexp.RVector.getValue(RVector.java:72)
    at org.jpmml.rexp.RangerConverter.encodeSchema(RangerConverter.java:117)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:74)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)
Error in .convert(tempfile, file, converter, converter_classpath, verbose) : 
  1
> packageVersion("ranger")
[1] ‘0.7.0’

vruusmann commented 6 years ago

Looks like a minor bug in the JPMML-R library. On line 117, the RVector#getValue(String) function call should be "guarded" by an RVector#hasValue(String) function call - the variable.levels attribute is not guaranteed to contain info about all variables.

This problem went unnoticed, because JPMML-R library integration test RDS files are currently generated manually. But they should be generated using the r2pmml::decorate(x, dataset) function to ensure consistency with what the end user is experiencing.
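
The guard described above can be illustrated in base R. This is only a sketch of the pattern, not the JPMML-R code: `strict_get`, `has_value`, and `guarded_get` are hypothetical helpers that mimic `RVector#getValue(String)` and `RVector#hasValue(String)`, and the `variable.levels` list is a hand-built stand-in that only has an entry for the factor column.

```r
# Stand-in for a variable.levels attribute that lacks entries for numeric columns.
variable.levels <- list(Species = levels(iris$Species))

# Hypothetical helper: fails on a missing name, like the unguarded Java lookup.
strict_get <- function(x, name) {
  if (!(name %in% names(x))) stop(name)
  x[[name]]
}

# Hypothetical helper: reports whether a name is present.
has_value <- function(x, name) name %in% names(x)

# Guarded lookup: only call strict_get when the name is known to exist.
guarded_get <- function(x, name) {
  if (has_value(x, name)) strict_get(x, name) else NULL
}

species_levels <- guarded_get(variable.levels, "Species")
sepal_levels <- guarded_get(variable.levels, "Sepal.Length")  # NULL, no error
```

With the guard in place, a variable that has no recorded levels is simply treated as having none, instead of aborting the conversion.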

vruusmann commented 6 years ago

The above comment references this line: https://github.com/jpmml/jpmml-r/blob/master/src/main/java/org/jpmml/rexp/RangerConverter.java#L117

vruusmann commented 6 years ago

As a quick workaround, you should generate a fully populated variable.levels attribute manually:

iris.ranger = ranger(Species ~ ., data = iris, num.trees = 7, write.forest = TRUE)
print(iris.ranger)

iris.ranger$variable.levels = lapply(iris, function(x){ if(is.factor(x)) { levels(x) } else { NULL }}) # THIS!

r2pmml(iris.ranger, "iris_ranger.pmml", dataset = iris, verbose = TRUE)
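
To see what the workaround line builds, the same `lapply()` call can be run on its own in base R, without ranger or r2pmml. The variable name `levels_by_column` is illustrative only.

```r
# Same expression as in the workaround, applied to the iris data frame.
levels_by_column <- lapply(iris, function(x) {
  if (is.factor(x)) levels(x) else NULL
})

# lapply() keeps NULL results as list elements, so every column gets a name.
column_names <- names(levels_by_column)

# TRUE for the four numeric columns, FALSE for Species.
is_null_entry <- sapply(levels_by_column, is.null)
```

Every column name is then present in the list, so a lookup by name such as `Sepal.Length` finds an entry (with a `NULL` value) rather than a missing name.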

phakksi commented 6 years ago

That works just fine for my specific case! Thanks.

vruusmann commented 6 years ago

Reopening, because the JPMML-R library is still broken.

codemasta14 commented 6 years ago

This is still broken. I've tried the above-mentioned solutions and am getting an error: Error in .convert(tempfile, file, converter, converter_classpath, verbose) : 127

I tried the same iris example used above, with the workaround, and got the same error. Was this fixed and then broken again? Or was it never fixed?

vruusmann commented 6 years ago

@codemasta14 This issue was fixed in jpmml/jpmml-r@33fa655 - see the above log message(s).

Your issue must be something different. But it's impossible to be any more specific until you've shared the full stack trace of the Java exception.

codemasta14 commented 6 years ago

@vruusmann I believe this is what you want, correct? I've included the code I ran. I appreciate your help!

> iris.rf <- randomForest(Species ~ ., data = iris, ntree = 7)
> r2pmml(iris.rf, "./iris_rf.pmml")

> traceback()

6: stop(result)
5: .convert(tempfile, file, converter, converter_classpath, verbose)
4: main()
3: tryCatchList(expr, classes, parentenv, handlers)
2: tryCatch({
       main()
   }, finally = {
       unlink(tempfile)
   })
1: r2pmml(iris.rf, "iris_rf.pmml")

vruusmann commented 6 years ago

@codemasta14 You've provided the R-side stack trace. However, I'm looking for the Java-side stack trace, which triggered this stop(result) action.

You appear to be testing the sample script from the project's README file - it deals with randomForest() not ranger(), so your issue is definitely not related to this one (i.e. issue #28).

Must be some very basic setup problem. Perhaps you don't have java.exe available on system path?
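
One way to check this from within R is `Sys.which()`, which returns the full path of an executable found on the system path, or an empty string if none is found. The sketch below is only a diagnostic aid; it assumes the converter is launched through a plain `java` command, which the thread does not confirm. For context, R's `system()` documentation describes a status of 127 as meaning that the command could not be run.

```r
# Look up the java executable on the system path.
java_path <- Sys.which("java")

if (nzchar(java_path)) {
  message("java found at: ", java_path)
} else {
  message("java not found on the system path")
}
```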

codemasta14 commented 6 years ago

@vruusmann I will take a look. It's very possible that that is it. Thank you, and I'm sorry it's probably something so silly!

codemasta14 commented 6 years ago

@vruusmann That definitely was not the problem. I get the exact same error with both the ranger and the random forest objects. I would show you the Java trace, but I actually don't know how to do that. I really only know how to program in R and Python. I'll figure something out to fix my error or work around it though. Thank you so much for responding to me, you absolutely didn't have to, so it means a lot.

vruusmann commented 6 years ago

> I would show you the java trace, but I actually don't know how to do that.

@codemasta14 You don't need to do anything - the Java stack trace should be printed to the R console (or RStudio output window) automatically. See the opening message of this issue - the r2pmml function should print out a list of "INFO" messages, followed by a "SEVERE" message. I need this stuff that comes after the "SEVERE" message.

Alternatively, save your model to an RDS file using the saveRDS(x, path) function, and then try the command-line JPMML-R application: https://github.com/jpmml/jpmml-r
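
The saving step can be sketched in base R as follows. The `lm()` model is only a stand-in so that the example runs without extra packages, and the file name `model.rds` is arbitrary. The command-line invocation itself is documented in the JPMML-R README linked above and is not reproduced here.

```r
# Stand-in model; in practice this would be the ranger or randomForest object.
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Write the model to an RDS file in the session's temporary directory.
rds_path <- file.path(tempdir(), "model.rds")
saveRDS(model, rds_path)

# Read it back to confirm that the file holds the same model.
restored <- readRDS(rds_path)
```

The resulting file is what the command-line converter would take as its input.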

codemasta14 commented 6 years ago

@vruusmann I'm not getting any Java trace. I looked through the source code, and it looks like it's failing on line 9 of the function.

screenshot 4

I will look into the alternative method you've given me. I'm using R version 3.5.0. I also tried uninstalling and reinstalling the package.

EwelinaEwelina commented 6 years ago

Hello! It's not working for me for neural networks:

mynn <- nnet(Churn ~ ., data=trainNN, size=3, decay=1.0e-5, maxit=50, softmax = TRUE)
mynn$variable.levels <- lapply(trainNN, function(x){ if(is.factor(x)) { levels(x) } else { NULL }})

After:

r2pmml(mynn,"churn_nnet_pmml.xml", verbose = TRUE, variable.levels = mynn$variable.levels)

I have an error:

Error in decorate.default(x, ...) : unused argument (variable.levels = list(X.area.code.408 = NULL, X.area.code.510 = NULL, X.international.plan.1 = NULL, X.number.vmail.messages. = NULL, X.total.day.charge. = NULL, Churn = NULL))

What have I done wrong?
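
An "unused argument" error is what R reports when a named argument is forwarded through `...` to a function that does not accept it. The base-R sketch below reproduces that kind of failure with hypothetical stand-ins: `convert_stub` and `decorate_stub` are not part of r2pmml, they only imitate a function that passes `...` on to a second function.

```r
# Hypothetical stand-in for a function that accepts no extra arguments.
decorate_stub <- function(x) x

# Hypothetical stand-in for a function that forwards its extra arguments.
convert_stub <- function(x, ...) decorate_stub(x, ...)

# Passing an argument the inner function does not know about triggers the error.
msg <- tryCatch(
  convert_stub(1, variable.levels = list()),
  error = function(e) conditionMessage(e)
)
```

Going by the message quoted above, the `variable.levels` argument reached `decorate.default()`, which does not take it; in the earlier ranger workaround, `variable.levels` was set on the model object and was not passed to `r2pmml()` as an argument.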

korczykowska commented 5 years ago

Hello! I had a problem with a random forest model in which an independent variable is a factor that has "" as one of its levels. For example: "" "A city" "B small" "C compact" "D family" "E executive"
I trained two models: one with all observations, and a second without the observations that have "" as the value of that independent variable. r2pmml doesn't work for the first model (the same error message as posted by pedrodia on 5 Sep 2017), but it does for the second.

Does anybody know explanation why?

vruusmann commented 5 years ago

> Does anybody know explanation why?

That's a completely arbitrary sanity check.

It does not seem like a good idea to use empty strings as valid category values. Suppose that there's something wrong with the model and you need to debug its schema, and then you decide to print some debugging data to the console, and nothing appears there. This stuff is totally confusing, and has a potential to waste days of productive time.

Why don't you replace empty strings with "(empty)" strings instead?
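
A minimal base-R sketch of that replacement, assuming the empty string is an existing level of the factor. The vector `body_type` and its values are made up for illustration.

```r
# Illustrative factor with "" as one of its levels.
body_type <- factor(c("", "A city", "B small", ""))

# Rename the empty level in place; the observations keep their positions.
levels(body_type)[levels(body_type) == ""] <- "(empty)"

recoded_values <- as.character(body_type)
```

Renaming the level this way leaves the number of observations unchanged and only alters the label of the affected category.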

korczykowska commented 5 years ago

> > Does anybody know explanation why?
>
> That's a completely arbitrary sanity check.
>
> It does not seem like a good idea to use empty strings as valid category values. Suppose that there's something wrong with the model and you need to debug its schema, and then you decide to print some debugging data to the console, and nothing appears there. This stuff is totally confusing, and has a potential to waste days of productive time.
>
> Why don't you replace empty strings with "(empty)" strings instead?

That doesn't work. I have added a new factor level, 'NONE' (so there are no more missing values or "" in the dataset), but the problem remains: I cannot generate pmml file.

vruusmann commented 5 years ago

> but the problem remains: I cannot generate pmml file.

What's the technical problem? You should create a reproducible example (R script plus some data), and open a new R2PMML issue.