jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

XGBoost conversion issue #68

Closed Kawalierus closed 3 years ago

Kawalierus commented 3 years ago

I have tried to reproduce the XGBoost example (https://github.com/jpmml/r2pmml#package-xgboost). Unfortunately, I receive the following error message:

SEVERE: Failed to convert RDS to PMML
java.lang.ClassCastException: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:283)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
    at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.ClassCastException: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:283)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
    at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

I have noticed that in a similar topic (https://www.gitmemory.com/issue/jpmml/r2pmml/64/718139304) you suggested that the label should be converted to a factor. However, passing the label to the xgboost function as a factor causes this error:

[17:10:50] amalgamation/../src/objective/multiclass_obj.cu:120: SoftmaxMultiClassObj: label must be in [0, num_class).
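
The `label must be in [0, num_class)` message suggests that the multiclass objective wants zero-based integer class indices rather than a factor. As a minimal base-R sketch (no xgboost call is made, and the variable names are only illustrative), such labels can be derived from a factor like this:

```r
# Base R only; this does not train a model.
data(iris)

# A factor's integer codes start at 1 ...
codes = as.integer(iris$Species)

# ... so subtract 1 to obtain class indices in [0, num_class).
iris_y = codes - 1L
num_class = nlevels(iris$Species)

# Smallest and largest label; both should fall inside [0, num_class).
range(iris_y)
```

The original factor levels are still needed later if the class names are to be restored, so it is worth keeping `levels(iris$Species)` around.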

I initially tried to convert a Poisson count model and got an identical error (the prediction there is a frequency, so I do not see where strings would need to be converted to integers).

R version: 4.0.3
XGBoost version: 1.3.2.1
r2pmml version: 0.25.1

I would greatly appreciate support regarding this issue.

vruusmann commented 3 years ago

java.lang.ClassCastException: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:283)

This is line 283 of the XGBoostConverter source code: https://github.com/jpmml/jpmml-r/blob/1.4.2/src/main/java/org/jpmml/rexp/XGBoostConverter.java#L283

As you can see, the converter expects the name attribute of the Feature Map (aka FMap) object to be an R factor. You've been giving it an R string instead.
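
To see why a factor and a string vector end up as different types, it helps to look at how R stores them. The sketch below uses base R only; the link to the `RIntegerVector` and `RStringVector` classes is inferred from the class names in the stack trace, not taken from the converter's documentation.

```r
# The same feature names, once as a character vector and once as a factor.
name_chr = c("Sepal.Length", "Sepal.Width")
name_fct = as.factor(name_chr)

typeof(name_chr)  # "character"
typeof(name_fct)  # "integer": a factor is integer codes plus a "levels" attribute

# The labels themselves live in the levels attribute.
levels(name_fct)
```

A factor is therefore serialized as an integer vector with attributes, which would explain why the cast to an integer vector fails when a plain character vector arrives instead.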

To resolve this conversion error, simply change the type of the name attribute to R factor:

iris.fmap = as.fmap(iris.matrix)

# THIS!
iris.fmap$name = as.factor(iris.fmap$name)

https://github.com/jpmml/r2pmml/issues/64

These two issues are close relatives, because they both reflect improper typing of FMap attributes (name here, type there).

All FMap attributes must be R factors.

vruusmann commented 3 years ago

All FMap attributes must be R factors.

Leaving this issue open for now - the r2pmml::as.fmap utility function should always force-cast all FMap attributes to R factors before returning the result to the end user.

Kawalierus commented 3 years ago

Thank you very much for such a quick response.

Unfortunately, your advice has not resolved the issue in my case. Please see the code below:

library("xgboost")
library("r2pmml")

data(iris)

iris_X = iris[, 1:4]
iris_y = as.integer(iris[, 5]) - 1

iris.matrix = model.matrix(~ . - 1, data = iris_X)

iris.DMatrix = xgb.DMatrix(iris.matrix, label = iris_y)
iris.fmap = as.fmap(iris.matrix)
iris.fmap$name = as.factor(iris.fmap$name)
iris.xgb = xgboost(data = iris.DMatrix, missing = NULL, objective = "multi:softmax", num_class = 3, nrounds = 13)

r2pmml(iris.xgb, "iris_xgb.pmml", fmap = iris.fmap, response_name = "Species", response_levels = c("setosa", "versicolor", "virginica"), missing = NULL, ntreelimit = 7, compact = TRUE)

still results in the error:

SEVERE: Failed to convert RDS to PMML
java.lang.ClassCastException: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:284)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
    at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.ClassCastException: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:284)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
    at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

vruusmann commented 3 years ago

Exception in thread "main" java.lang.ClassCastException: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:284)

See - you fixed the type of the name attribute, and now the exception is happening one line later (284 instead of 283).

Please see my earlier resolution - "All FMap attributes must be R factors"

iris.fmap = as.fmap(iris.matrix)

iris.fmap$id = as.factor(iris.fmap$id)
iris.fmap$name = as.factor(iris.fmap$name)
iris.fmap$type = as.factor(iris.fmap$type)

vruusmann commented 3 years ago

My XGBoost example works fine with R 3.X.

Looks like R 4.X has botched matrix column types (were factors before, are strings now).
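
One plausible explanation, offered here as an assumption rather than something confirmed in this thread, is that R 4.0.0 changed the default of `stringsAsFactors` in `data.frame()` from `TRUE` to `FALSE`. The sketch below builds a stand-in data frame with the same column names as the feature map (`id`, `name`, `type`); it does not call `as.fmap`, and the `"q"` type code is only illustrative.

```r
feature_names = c("Sepal.Length", "Sepal.Width")

# Default behaviour: string columns are factors under R 3.x,
# but stay character under R >= 4.0.0.
fmap_default = data.frame(id = 0:1, name = feature_names, type = "q")

# Requesting factors explicitly gives the same result on both R generations.
fmap_factors = data.frame(id = 0:1, name = feature_names, type = "q",
	stringsAsFactors = TRUE)

# Compare the column classes of the two variants.
sapply(fmap_default, class)
sapply(fmap_factors, class)
```

Being explicit about `stringsAsFactors` makes a script behave the same way regardless of the R version it runs on.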

Kawalierus commented 3 years ago

Thank you very much again. Apologies for not noticing this at first glance. We are making steady progress as I have reached another type of error by applying all your corrections.

Code:

library("xgboost")
library("r2pmml")

data(iris)

iris_X = iris[, 1:4]
iris_y = as.integer(iris[, 5]) - 1

iris.matrix = model.matrix(~ . - 1, data = iris_X)

iris.DMatrix = xgb.DMatrix(iris.matrix, label = iris_y)
iris.fmap = as.fmap(iris.matrix)

iris.fmap$id = as.factor(iris.fmap$id)
iris.fmap$name = as.factor(iris.fmap$name)
iris.fmap$type = as.factor(iris.fmap$type)

iris.xgb = xgboost(data = iris.DMatrix, missing = NULL, objective = "multi:softmax", num_class = 3, nrounds = 13)

r2pmml(iris.xgb, "iris_xgb.pmml", fmap = iris.fmap, response_name = "Species", response_levels = c("setosa", "versicolor", "virginica"), missing = NULL, ntreelimit = 7, compact = TRUE)

Error:

SEVERE: Failed to convert RDS to PMML
java.lang.IllegalArgumentException
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:295)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
    at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.IllegalArgumentException
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:295)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
    at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
    at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

vruusmann commented 3 years ago

java.lang.IllegalArgumentException at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:295)

See for yourself: https://github.com/jpmml/jpmml-r/blob/1.4.2/src/main/java/org/jpmml/rexp/XGBoostConverter.java#L295

Looks like the id attribute must not be converted to an R factor; leave it as an R integer.

iris.fmap = as.fmap(iris.matrix)

iris.fmap$name = as.factor(iris.fmap$name)
iris.fmap$type = as.factor(iris.fmap$type)
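
The typing rules worked out above (id stays an integer, name and type become factors) can be collected in one place. The helper below is a hypothetical sketch, not part of r2pmml, and it is exercised on a stand-in data frame rather than on real `as.fmap` output.

```r
# Hypothetical helper: enforce the column types discussed in this thread.
normalize_fmap = function(fmap){
	# Going through as.character() also recovers the original values
	# if id was converted to a factor by mistake.
	fmap$id = as.integer(as.character(fmap$id))
	fmap$name = as.factor(fmap$name)
	fmap$type = as.factor(fmap$type)
	fmap
}

# Stand-in for a feature map whose string columns are plain character vectors.
fmap = data.frame(
	id = 0:3,
	name = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
	type = "q",
	stringsAsFactors = FALSE
)

fixed = normalize_fmap(fmap)
sapply(fixed, class)
```

With a real model, the result of `normalize_fmap(as.fmap(iris.matrix))` would be what gets passed as the `fmap` argument of `r2pmml`.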

Kawalierus commented 3 years ago

And we have reached another one:

SEVERE: Failed to convert RDS to PMML
java.lang.IllegalArgumentException: 1730313018.1919250021
    at org.jpmml.xgboost.Learner.load(Learner.java:82)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:93)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:57)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:45)
    at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:309)
    at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:242)
    at org.jpmml.rexp.XGBoostConverter.ensureLearner(XGBoostConverter.java:218)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:80)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.IllegalArgumentException: 1730313018.1919250021
    at org.jpmml.xgboost.Learner.load(Learner.java:82)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:93)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:57)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:45)
    at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:309)
    at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:242)
    at org.jpmml.rexp.XGBoostConverter.ensureLearner(XGBoostConverter.java:218)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:80)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

vruusmann commented 3 years ago

java.lang.IllegalArgumentException: 1730313018.1919250021

Duplicate of https://github.com/jpmml/jpmml-xgboost/issues/54

TLDR: XGBoost 1.3.X switched model saving data format from binary to JSON.

Two solutions:

vruusmann commented 3 years ago

@Kawalierus I've released R2PMML version 0.25.2 to GitHub, which is able to convert XGBoost 1.3(.3) models now.