jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

Problem exporting xgboost model #66

Closed: fwendler closed this issue 3 years ago

fwendler commented 3 years ago

First of all, thank you for this great library!

I have a problem exporting xgboost models. It also occurs with the example from the README:

> library("xgboost")
> library("r2pmml")
> data(iris)
> 
> iris_X = iris[, 1:4]
> iris_y = as.integer(iris[, 5]) - 1
> 
> # Generate R model matrix
> iris.matrix = model.matrix(~ . - 1, data = iris_X)
> 
> # Generate XGBoost DMatrix and feature map based on R model matrix
> iris.DMatrix = xgb.DMatrix(iris.matrix, label = iris_y)
> iris.fmap = as.fmap(iris.matrix)
> # Train a model
> iris.xgb = xgboost(data = iris.DMatrix, missing = NULL, objective = "multi:softmax", num_class = 3, nrounds = 13)
[1] train-merror:0.020000 
[2] train-merror:0.026667 
[3] train-merror:0.020000 
[4] train-merror:0.020000 
[5] train-merror:0.013333 
[6] train-merror:0.013333 
[7] train-merror:0.013333 
[8] train-merror:0.013333 
[9] train-merror:0.013333 
[10]    train-merror:0.013333 
[11]    train-merror:0.006667 
[12]    train-merror:0.006667 
[13]    train-merror:0.006667 
> 
> # Export the model to PMML.
> # Pass the feature map as the `fmap` argument.
> # Pass the name and category levels of the target field as `response_name` and `response_levels` arguments, respectively.
> # Pass the value of missing value as the `missing` argument
> # Pass the optimal number of trees as the `ntreelimit` argument (analogous to the `ntreelimit` argument of the `xgb::predict.xgb.Booster` function)
> r2pmml(iris.xgb, "/tmp/iris_xgb.pmml", fmap = iris.fmap, response_name = "Species", response_levels = c("setosa", "versicolor", "virginica"), missing = NULL, ntreelimit = 7, compact = TRUE)
Dec 02, 2020 11:18:00 AM org.jpmml.rexp.Main run
INFO: Parsing RDS..
Dec 02, 2020 11:18:00 AM org.jpmml.rexp.Main run
INFO: Parsed RDS in 7 ms.
Dec 02, 2020 11:18:00 AM org.jpmml.rexp.Main run
INFO: Initializing default Converter
Dec 02, 2020 11:18:00 AM org.jpmml.rexp.Main run
INFO: Initialized org.jpmml.rexp.XGBoostConverter
Dec 02, 2020 11:18:00 AM org.jpmml.rexp.Main run
INFO: Converting RDS to PMML..
Dec 02, 2020 11:18:00 AM org.jpmml.rexp.Main run
SEVERE: Failed to convert RDS to PMML
java.lang.ClassCastException: class org.jpmml.rexp.RStringVector cannot be cast to class org.jpmml.rexp.RIntegerVector (org.jpmml.rexp.RStringVector and org.jpmml.rexp.RIntegerVector are in unnamed module of loader 'app')
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:283)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
    at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.ClassCastException: class org.jpmml.rexp.RStringVector cannot be cast to class org.jpmml.rexp.RIntegerVector (org.jpmml.rexp.RStringVector and org.jpmml.rexp.RIntegerVector are in unnamed module of loader 'app')
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:283)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
    at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)
Error in .convert(tempfile, file, converter, converter_classpath, verbose) : 
  The JPMML-R conversion application has failed (error code 1). The Java executable should have printed more information about the failure into its standard output and/or standard error streams
>

vruusmann commented 3 years ago

What's your relevant sessionInfo()?

My session info is as follows, and the sample script that you provided works without errors:

R version 3.3.1 (2016-06-21)

other attached packages:
[1] r2pmml_0.24.2   xgboost_1.1.0.1

vruusmann commented 3 years ago

Also works with R version 3.5.2.

I don't have R version 4 installed at the moment.

fwendler commented 3 years ago

Wow, you are fast 🥇 I'm using R 4.0.3 with xgboost_1.2.0.1 and r2pmml_0.25.0.
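
For anyone reproducing this, the version details exchanged above can be gathered in one go. The following is a minimal sketch; `pkg_versions` is a hypothetical helper name (not part of r2pmml or xgboost), and it reports `NA` for packages that are not installed instead of failing.

```r
# Hypothetical helper: look up the installed version of each package.
# Packages that are not installed are reported as NA instead of raising an error.
pkg_versions = function(pkgs){
	vapply(pkgs, function(pkg){
		if(requireNamespace(pkg, quietly = TRUE)){
			as.character(utils::packageVersion(pkg))
		} else {
			NA_character_
		}
	}, character(1))
}

# R platform version, followed by the versions of the two packages involved
print(R.version.string)
print(pkg_versions(c("xgboost", "r2pmml")))
```

The full `sessionInfo()` output remains the more complete report; this only extracts the parts that mattered in this thread.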

vruusmann commented 3 years ago

OK. Looks like I'd need to upgrade to R version 4.0.X then, and re-run the experiment.

If you have a functioning R version 3.X installation lying around, then you may test if the sample script succeeds there.

vruusmann commented 3 years ago

I'm running R version 4.0.3 now (with the latest XGBoost and R2PMML packages), and I'm seeing the same error.

vruusmann commented 3 years ago

Explanation of the error: the converter expects the feature map columns to be factors, but it is getting character vectors.

The quick workaround would be to correct feature map column types manually:

iris.fmap = as.fmap(iris.matrix)
iris.fmap$name = as.factor(iris.fmap$name)
iris.fmap$type = as.factor(iris.fmap$type)
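
The same workaround can be generalized so that it does not depend on naming each column. The sketch below uses base R only; `chr_to_factor` is a hypothetical helper name, and the data frame is a stand-in for the result of `as.fmap(iris.matrix)` (the `id` column and the `"q"` type code are assumptions made for illustration, only `name` and `type` appear in the thread).

```r
# Hypothetical helper: convert every character column of a data.frame to a factor.
# Columns that are already factors (the R 3.X case) are left untouched.
chr_to_factor = function(df){
	for(col in names(df)){
		if(is.character(df[[col]])){
			df[[col]] = as.factor(df[[col]])
		}
	}
	df
}

# Stand-in for as.fmap(iris.matrix); the `id` column and "q" codes are assumed
fmap = data.frame(
	id = 0:3,
	name = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
	type = rep("q", 4),
	stringsAsFactors = FALSE
)

fmap = chr_to_factor(fmap)
str(fmap)
```

With the real objects this would be applied as `iris.fmap = chr_to_factor(as.fmap(iris.matrix))` before calling `r2pmml`. Because factor columns pass through unchanged, the same script should behave identically on R 3.X and R 4.X.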

This looks like a breaking change between R versions 3 and 4.
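
A likely candidate for that change is the `stringsAsFactors` default of `data.frame()`, which switched from `TRUE` to `FALSE` in R 4.0.0. Whether `as.fmap` builds its result this way is an assumption, not something confirmed in this thread. The sketch below passes the argument explicitly, so both behaviours can be reproduced on any R version:

```r
# Explicit arguments reproduce both platform defaults on any R version
df_r3 = data.frame(name = c("Sepal.Length", "Sepal.Width"), stringsAsFactors = TRUE)
df_r4 = data.frame(name = c("Sepal.Length", "Sepal.Width"), stringsAsFactors = FALSE)

class(df_r3$name)  # "factor"
class(df_r4$name)  # "character"

# A factor is stored as an integer vector of codes plus a `levels` attribute,
# whereas a character column is stored as a plain string vector
typeof(df_r3$name)  # "integer"
typeof(df_r4$name)  # "character"
```

This storage difference matches the exception above: the converter tries to read the column as an `RIntegerVector` (a factor) and receives an `RStringVector` instead.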

The correct fix would be to make the converter accept both column types.

fwendler commented 3 years ago

Fantastic, thank you very much for the super fast response and workaround!

In the meantime I was able to confirm that it works with R 3.6.3 (with the same versions of xgboost and r2pmml that I mentioned above).