jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

R - XGBoost - RDS to PMML conversion error #74

Closed VictorP10 closed 1 year ago

VictorP10 commented 1 year ago

Hi, I've had an issue for the past 2 months. It used to work perfectly, and now I can't use r2pmml with any xgboost model. Here is the code (it is the iris example):

library("xgboost")
library("r2pmml")

data(iris)

iris_X = iris[, 1:4]
iris_y = as.integer(iris[, 5]) - 1

iris.matrix = model.matrix(~ . - 1, data = iris_X)

iris.DMatrix = xgb.DMatrix(iris.matrix, label = iris_y)
iris.fmap = as.fmap(iris.matrix)

iris.xgb = xgboost(data = iris.DMatrix, missing = NULL, objective = "multi:softmax", num_class = 3, nrounds = 13)

r2pmml(iris.xgb, "iris_xgb.pmml", fmap = iris.fmap, response_name = "Species", response_levels = c("setosa", "versicolor", "virginica"), missing = NULL, ntreelimit = 7, compact = TRUE)

My session info: R version 4.2.2, r2pmml_0.26.0, xgboost_1.7.3.1

And here is the output of running the script:

Mar 15, 2023 10:20:18 AM org.jpmml.rexp.Main run
INFO: Parsing RDS..
Mar 15, 2023 10:20:18 AM org.jpmml.rexp.Main run
INFO: Parsed RDS in 21 ms.
Mar 15, 2023 10:20:18 AM org.jpmml.rexp.Main run
INFO: Initializing default Converter
Mar 15, 2023 10:20:18 AM org.jpmml.rexp.Main run
INFO: Initialized org.jpmml.rexp.XGBoostConverter
Mar 15, 2023 10:20:18 AM org.jpmml.rexp.Main run
INFO: Converting RDS to PMML..
Mar 15, 2023 10:20:19 AM org.jpmml.rexp.Main run
SEVERE: Failed to convert RDS to PMML
com.google.gson.JsonSyntaxException: com.google.gson.stream.MalformedJsonException: Expected ':' at line 1 column 18 path $.L
    at com.google.gson.internal.Streams.parse(Streams.java:60)
    at com.google.gson.JsonParser.parseReader(JsonParser.java:85)
    at com.google.gson.JsonParser.parseReader(JsonParser.java:60)
    at com.google.gson.JsonParser.parse(JsonParser.java:104)
    at org.jpmml.xgboost.Learner.loadJSON(Learner.java:217)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:90)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:60)
    at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:301)
    at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:238)
    at org.jpmml.rexp.XGBoostConverter.ensureLearner(XGBoostConverter.java:214)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:81)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)
Caused by: com.google.gson.stream.MalformedJsonException: Expected ':' at line 1 column 18 path $.L
    at com.google.gson.stream.JsonReader.syntaxError(JsonReader.java:1564)
    at com.google.gson.stream.JsonReader.doPeek(JsonReader.java:531)
    at com.google.gson.stream.JsonReader.peek(JsonReader.java:426)
    at com.google.gson.internal.bind.TypeAdapters$29.read(TypeAdapters.java:700)
    at com.google.gson.internal.bind.TypeAdapters$29.read(TypeAdapters.java:723)
    at com.google.gson.internal.bind.TypeAdapters$29.read(TypeAdapters.java:698)
    at com.google.gson.internal.Streams.parse(Streams.java:48)
    ... 14 more

Exception in thread "main" com.google.gson.JsonSyntaxException: com.google.gson.stream.MalformedJsonException: Expected ':' at line 1 column 18 path $.L
    at com.google.gson.internal.Streams.parse(Streams.java:60)
    at com.google.gson.JsonParser.parseReader(JsonParser.java:85)
    at com.google.gson.JsonParser.parseReader(JsonParser.java:60)
    at com.google.gson.JsonParser.parse(JsonParser.java:104)
    at org.jpmml.xgboost.Learner.loadJSON(Learner.java:217)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:90)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:60)
    at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:301)
    at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:238)
    at org.jpmml.rexp.XGBoostConverter.ensureLearner(XGBoostConverter.java:214)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:81)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)
Caused by: com.google.gson.stream.MalformedJsonException: Expected ':' at line 1 column 18 path $.L
    at com.google.gson.stream.JsonReader.syntaxError(JsonReader.java:1564)
    at com.google.gson.stream.JsonReader.doPeek(JsonReader.java:531)
    at com.google.gson.stream.JsonReader.peek(JsonReader.java:426)
    at com.google.gson.internal.bind.TypeAdapters$29.read(TypeAdapters.java:700)
    at com.google.gson.internal.bind.TypeAdapters$29.read(TypeAdapters.java:723)
    at com.google.gson.internal.bind.TypeAdapters$29.read(TypeAdapters.java:698)
    at com.google.gson.internal.Streams.parse(Streams.java:48)
    ... 14 more
Error in .convert(tempfile, file, converter, converter_classpath, verbose) : 
  The JPMML-R conversion application has failed (error code 1). The Java executable should have printed more information about the failure into its standard output and/or standard error streams

Thanks in advance for your help!

vruusmann commented 1 year ago

r2pmml_0.26.0 xgboost_1.7.3.1

Caused by: com.google.gson.stream.MalformedJsonException: Expected ':' at line 1 column 18 path $.L

This error means that the XGBoost parser is expecting to read the model in JSON data format, but the RDS file contains the model in Universal Binary JSON (aka UBJSON, UBJ) data format instead.
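To illustrate the difference, here is a minimal base R sketch that guesses the format from the first bytes of a serialized model. The function name `guess_model_format` is hypothetical, and the heuristic rests on an assumption: textual JSON starts with `{` followed by `"`, while the UBJSON written by XGBoost starts with `{` followed by a type marker such as `L`, which is what the `$.L` path in the error message suggests.

```r
# Heuristic only: inspects the leading bytes of a raw vector.
# Assumption: a JSON object starts with `{` and then `"` (or `}` if empty),
# optionally separated by whitespace. Anything else after `{` is reported
# as "not JSON", which is where a UBJSON payload would end up.
guess_model_format = function(bytes) {
  stopifnot(is.raw(bytes))
  if (length(bytes) < 2 || bytes[1] != as.raw(0x7B)) {
    return("unknown")
  }
  # Skip JSON whitespace (space, tab, LF, CR) after the opening brace
  whitespace = as.raw(c(0x20, 0x09, 0x0A, 0x0D))
  i = 2
  while (i <= length(bytes) && bytes[i] %in% whitespace) {
    i = i + 1
  }
  if (i > length(bytes)) {
    return("unknown")
  }
  if (bytes[i] == as.raw(0x22) || bytes[i] == as.raw(0x7D)) {
    return("json")
  }
  return("not JSON (possibly UBJSON)")
}

# Textual JSON
guess_model_format(charToRaw('{"learner": {}}'))
# returns "json"

# `{` followed by `L` and an 8-byte length, as in the failing parse
guess_model_format(as.raw(c(0x7B, 0x4C, 0, 0, 0, 0, 0, 0, 0, 6)))
# returns "not JSON (possibly UBJSON)"
```

The same function could be applied to the raw bytes held by a booster object, if the installed xgboost version exposes them. That part is not shown here because it depends on the xgboost version.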

This issue can be solved by:

  1. Instructing XGBoost to always dump the model in JSON data format.
  2. Upgrading the R2PMML package, so that it includes newer JPMML-R and JPMML-XGBoost library versions, which support both JSON and UBJSON data formats.

r2pmml_0.26.0

This is the latest R2PMML package version, which was published to CRAN on 19th of March, 2021 (two years ago!). No wonder that it doesn't know about UBJSON.

In the R2PMML GitHub repository there are 0.26.1 and 0.26.2 versions available, but I'm not sure whether they already include a sufficiently new JPMML-XGBoost library.

vruusmann commented 1 year ago

TL;DR: The R code example is perfectly valid. The problem is that the XGBoost library changed its default data persistence format from JSON to UBJSON about a year ago (and a two-year-old R2PMML package doesn't know about it).

vruusmann commented 1 year ago

In the R2PMML GitHub repository there are 0.26.1 and 0.26.2 versions available.

R2PMML version 0.26.2 includes JPMML-XGBoost version 1.7.0, which knows about the UBJSON data format.

Therefore, if you upgrade your R2PMML package version as shown below, this error should go away:

library("devtools")

install_github("jpmml/r2pmml")
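After installing, it may be worth confirming that the upgrade took effect. The sketch below is only an illustration: the helper name `has_min_version` is hypothetical, and the 0.26.2 threshold is taken from the comment above rather than from any official compatibility table.

```r
# Returns TRUE if the installed version is at least the required one.
# Assumption: 0.26.2 is the first R2PMML version that handles UBJSON.
has_min_version = function(installed, required = "0.26.2") {
  package_version(installed) >= package_version(required)
}

has_min_version("0.26.0")  # FALSE: the CRAN release from the original report
has_min_version("0.26.2")  # TRUE

# Check the package that is actually installed, if any
if (requireNamespace("r2pmml", quietly = TRUE)) {
  print(has_min_version(as.character(utils::packageVersion("r2pmml"))))
}
```

Comparing `package_version` objects instead of plain strings avoids mistakes such as treating "0.9.0" as newer than "0.26.2".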

Even so, it looks like a new R2PMML release is warranted here.

VictorP10 commented 1 year ago

It seems to work now! Thanks a lot!

vruusmann commented 1 year ago

I've prepared R2PMML version 0.27.0 and submitted it to the CRAN review queue. No idea whether it will be accepted or rejected this time.

Anyway, the GitHub install will always be available.

vruusmann commented 1 year ago

I've prepared R2PMML version 0.27.0 and submitted it to the CRAN review queue

The R2PMML version 0.27.0 didn't get approved, but the corrected 0.27.1 did!

So, in a day or two, there will be a new R2PMML package available on CRAN that works fine with the newest XGBoost 1.6.X and 1.7.X versions (UBJSON data format, categorical data).