jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

XGB problem with the tutorial example #73

Closed ThomasCosyn closed 1 year ago

ThomasCosyn commented 2 years ago

Hello,

First of all, thanks for this wonderful package!

However, I'm running into a problem when trying to run the example code for XGBoost. Here is my session info:

Here is the code I ran:

library(xgboost)

data(iris)
iris_X = iris[, 1:4]
iris_y = as.integer(iris[, 5]) - 1
iris.matrix = model.matrix(~ . - 1, data = iris_X)
iris.DMatrix = xgb.DMatrix(iris.matrix, label = iris_y)
iris.fmap = r2pmml::as.fmap(iris.matrix)
iris.fmap$name = as.factor(iris.fmap$name)
iris.fmap$type = as.factor(iris.fmap$type)
iris.xgb = xgboost(data = iris.DMatrix, missing = NULL, objective = "multi:softmax", num_class = 3, nrounds = 13)
r2pmml::r2pmml(iris.xgb, "iris_xgb.pmml", fmap = iris.fmap, response_name = "Species", response_levels = c("setosa", "versicolor", "virginica"), missing = NULL, ntreelimit = 7, compact = TRUE)

And here is the error message I get:

SEVERE: Failed to convert RDS to PMML
com.google.gson.JsonSyntaxException: com.google.gson.stream.MalformedJsonException: Expected ':' at line 1 column 18 path $.L
Exception in thread "main" com.google.gson.JsonSyntaxException: com.google.gson.stream.MalformedJsonException: Expected ':' at line 1 column 18 path $.L

vruusmann commented 2 years ago

r2pmml : 0.26.0 XGB version : 1.6.0.1

You were installing the r2pmml package from CRAN, right?

Version 0.26.0 was released in March 2021, when the XGBoost library was somewhere between versions 1.4 and 1.5. It cannot possibly know how to deal with XGBoost 1.6-style models, which rely on JSON/Binary JSON model formats.

A quick fix: In your R script, dump the fitted model in RDS data format. Then, use the JPMML-R command-line application to convert the RDS to PMML, as explained here: https://github.com/jpmml/jpmml-r#usage
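
A minimal sketch of that workaround, under stated assumptions: the R side only needs `saveRDS()`, and the conversion is then a single `java -jar` call. The helper `rds_to_pmml_command` is hypothetical, the JAR path is a placeholder, and the `--rds-input` / `--pmml-output` option names are assumed from the JPMML-R README, so verify them against the usage section linked above. A plain list stands in for the fitted model (`iris.xgb` in the script above) to keep the snippet free of package dependencies.

```r
# Hypothetical helper: assemble the JPMML-R command line as a character vector.
# Assumption: the executable JAR accepts --rds-input and --pmml-output.
rds_to_pmml_command = function(jar_path, rds_path, pmml_path) {
  c("java", "-jar", jar_path, "--rds-input", rds_path, "--pmml-output", pmml_path)
}

# Stand-in object; in the script above this would be the fitted model iris.xgb.
model = list(note = "placeholder for a fitted model")
rds_path = file.path(tempdir(), "model.rds")
saveRDS(model, rds_path)

# Placeholder JAR path: use whatever your JPMML-R build actually produces.
cmd = rds_to_pmml_command("path/to/jpmml-r-executable.jar", rds_path, "model.pmml")
cat(paste(cmd, collapse = " "), "\n")

# To run the conversion for real (requires Java and the JAR):
# system2(cmd[1], args = cmd[-1])
```

Keeping the command as a character vector, rather than one pasted string, avoids quoting problems when paths contain spaces and maps directly onto `system2()`.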

vruusmann commented 2 years ago

Then, use the JPMML-R command-line application to convert the RDS to PMML

While doing so, please consider building a JPMML-R snapshot version off the latest GitHub HEAD revision, as explained here: https://github.com/jpmml/jpmml-r#installation

The latest JPMML-R release version is currently 1.5.0, which knows about XGBoost 1.5.X (JSON), but doesn't know much about XGBoost 1.6.X (Binary JSON).

ThomasCosyn commented 2 years ago

Thanks very much for your answer.

I tried using the latest snapshot but it didn't work. So I just changed my xgboost version to 1.5.0 and it worked fine. Is there documentation that lists which algorithm versions are supported?

Thanks again!

vruusmann commented 2 years ago

I tried using the latest snapshot but it didn't work.

Strange. Your original exception is raised because the JSON parser expects "text JSON" but finds "Universal Binary JSON (UBJ)" instead.
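
To illustrate the mismatch, here is a rough heuristic (not how JPMML-XGBoost actually detects formats) that guesses the format from the leading bytes of a serialized model. It assumes both formats open a top-level object with `{`, that text JSON follows it with a quoted key, and that UBJ follows it with a binary type marker. The `L` in the error path `$.L` is consistent with such a marker, though that reading is an assumption. The function name `guess_model_format` is hypothetical.

```r
# Hypothetical helper: guess the serialization format from the leading bytes.
guess_model_format = function(bytes) {
  stopifnot(is.raw(bytes))
  b = as.integer(bytes)
  # Both formats are assumed to open a top-level object with '{' (0x7B).
  if (length(b) < 2 || b[1] != 0x7B) return("unknown")
  # Skip JSON whitespace (space, tab, LF, CR) after the brace.
  i = 2
  while (i <= length(b) && b[i] %in% c(0x20, 0x09, 0x0A, 0x0D)) i = i + 1
  if (i > length(b)) return("unknown")
  # Text JSON continues with a quoted key ('"') or closes at once ('}').
  if (b[i] %in% c(0x22, 0x7D)) "text JSON" else "possibly UBJ"
}

guess_model_format(charToRaw('{"learner": {}}'))
# "text JSON"
guess_model_format(as.raw(c(0x7B, 0x4C, 0, 0, 0, 0, 0, 0, 0, 7)))
# "possibly UBJ"
```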

The GitHub head contains JPMML-XGBoost version 1.7.0, which added UBJ support: https://github.com/jpmml/jpmml-r/commit/9473c02b55862693875a66a1bface97f5c306a94

Looking into the above commit now. I did update the JAR version and re-ran all integration tests cleanly, but I didn't re-generate the integration tests from scratch with XGBoost 1.6.X.

Will do it, and close this issue when done.

Is there documentation that lists which algorithm versions are supported?

You need to inspect which version of the JPMML-XGBoost library is shipped with your R2PMML/JPMML-R version.

Then, move over to the JPMML-XGBoost repository, and check out its commit log.
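
One possible way to do the first step from R, sketched under assumptions: that the installed r2pmml package keeps its bundled JAR files in a `java` subdirectory, and that the JPMML-XGBoost JAR file name ends in `xgboost-<version>.jar`. Neither detail is confirmed in this thread, and `xgboost_jar_versions` is a hypothetical helper.

```r
# Hypothetical helper: pull version strings out of JAR file names that
# look like "...xgboost-<version>.jar" (the naming scheme is an assumption).
xgboost_jar_versions = function(jar_names) {
  pattern = "xgboost-([0-9]+(\\.[0-9]+)*)\\.jar$"
  hits = grep(pattern, jar_names, value = TRUE)
  sub(paste0("^.*", pattern), "\\1", hits)
}

# Assumed location of the bundled JARs; yields character(0) if the
# package is not installed or the directory does not exist.
jars = list.files(system.file("java", package = "r2pmml"))
print(xgboost_jar_versions(jars))
```

The resulting version can then be looked up in the JPMML-XGBoost commit log.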

There is no "human friendly documentation" right now. Maybe it should be added - the XGBoost framework has undergone massive model data format changes lately, so it would be nice to have a concise version compatibility matrix available.

Current state (off the top of my head):
1) XGBoost 1.6.X (UBJ model data format, enhanced categorical features support) - JPMML-XGBoost 1.7.X or newer.
2) XGBoost 1.5.X (one-hot-encoded categorical features) - JPMML-XGBoost 1.6.X or newer.
3) XGBoost 1.3.X and 1.4.X (JSON model data format) - JPMML-XGBoost 1.5.X and newer.

Please note that the latest JPMML-XGBoost 1.7.X version should be able to consume all earlier XGBoost model formats (anything between 0.6 and 1.6).
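
The compatibility notes above can be captured in a small lookup, sketched here as a hypothetical helper (`min_jpmml_xgboost` is not part of any package). It encodes only the three cases listed in this comment and returns `NA` for anything else, since the thread gives no minimum for other XGBoost versions.

```r
# Hypothetical helper: minimum JPMML-XGBoost release line for a given
# XGBoost version, using only the pairs stated in this thread.
min_jpmml_xgboost = function(xgb_version) {
  v = package_version(xgb_version)
  major_minor = paste(v$major, v$minor, sep = ".")
  switch(major_minor,
    "1.6" = "1.7",
    "1.5" = "1.6",
    "1.4" = ,          # falls through to the 1.3 entry
    "1.3" = "1.5",
    NA_character_      # not covered by the notes above
  )
}

min_jpmml_xgboost("1.6.0.1")
# "1.7"
min_jpmml_xgboost("1.4.1")
# "1.5"
```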

vruusmann commented 1 year ago

Just released R2PMML version 0.27.1 to CRAN, which supports the newer XGBoost versions 1.6.X and 1.7.X.