jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

Improve the Java installation check #52

Closed: waiyujack closed this issue 5 years ago

waiyujack commented 5 years ago

Hi, I am getting an error when converting a logistic xgboost model to PMML. Could someone give me a hint on how to resolve this?

library('r2pmml')
library('xgboost')

#Data
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

#Define dgCMatrix
train <- agaricus.train
test <- agaricus.test

#Define xgb Dmatrix
dtrain <- xgb.DMatrix(data = train$data, label = train$label)
dtest <- xgb.DMatrix(data = test$data, label = test$label)

#Train xgboost model
xgbmodel <- xgboost(data = dtrain, max.depth = 2, eta = 1, nthread = 2, nrounds = 1, objective = "binary:logistic")

#Generates an XGBoost feature map
df_train<-as.data.frame(as.matrix(train$data))
xgbmodel.fmap = genFMap(df_train)

#Export as pmml
r2pmml(xgbmodel, "bstDMatrix.pmml", fmap = xgbmodel.fmap)

Once I run the above code, an error occurs.

Error in .convert(tempfile, file, converter, converter_classpath, verbose) : 
  127

My versions are: r2pmml 0.20.0, xgboost 0.71.2, R 3.5.1.

Many thanks!

vruusmann commented 5 years ago

Are you using Windows? If so, then Error 127 means that a required application is missing from your system (scroll down to the "ERROR_PROC_NOT_FOUND" error code): https://docs.microsoft.com/en-us/windows/desktop/debug/system-error-codes--0-499-

The r2pmml package depends on the Java executable (java.exe), which must be available on the system path. If you open a command prompt and type "java -version", what do you see?
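
The same check can also be run from inside the R session that calls r2pmml. The following is a minimal sketch using only base R (Sys.which and system2); it assumes nothing about r2pmml internals and only reports what R itself can see on the PATH.

```r
# Sys.which() searches the directories on the PATH, just as a child
# process launched from this R session would. It returns "" if nothing is found.
java_path <- unname(Sys.which("java"))

if (nzchar(java_path)) {
  cat("java found at:", java_path, "\n")
  # "java -version" writes to stderr, so capture both streams.
  version_info <- suppressWarnings(
    system2("java", "-version", stdout = TRUE, stderr = TRUE)
  )
  cat(version_info, sep = "\n")
} else {
  cat("java was not found on the PATH\n")
}
```

If the path is empty, or the version call prints an error instead of a version string, the Java installation is the likely culprit rather than the model or the r2pmml call.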

vruusmann commented 5 years ago

The r2pmml package should draw inspiration from the sklearn2pmml package: https://github.com/jpmml/sklearn2pmml/blob/master/sklearn2pmml/__init__.py#L230-L233

There are no Error 127 issues reported against that package, because its error message is clear and intuitive enough.
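
As a rough illustration of what such a check could look like on the R side, here is a sketch of a pre-flight helper. The function name require_java and its messages are hypothetical; this is not code from r2pmml or sklearn2pmml, only one way to turn a bare exit status into a readable error.

```r
# Hypothetical helper, not part of r2pmml: fail early with a readable
# message instead of letting the conversion exit with a bare status code.
require_java <- function(executable = "java") {
  path <- unname(Sys.which(executable))
  if (!nzchar(path)) {
    stop("Java is not installed, or the Java executable is not on the system path",
         call. = FALSE)
  }
  # A broken installation can still be on the PATH, so also check that
  # the executable actually runs and exits with status 0.
  status <- suppressWarnings(
    system2(path, "-version", stdout = FALSE, stderr = FALSE)
  )
  if (!identical(as.integer(status), 0L)) {
    stop("Java was found at ", path, " but 'java -version' failed with status ",
         status, call. = FALSE)
  }
  invisible(path)
}
```

Calling require_java() before r2pmml(...) would then stop with one of the two messages above whenever Java is missing or unusable.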

waiyujack commented 5 years ago

Thanks vruusmann! I guess my Java installation was corrupted; it worked after I reinstalled it.

waiyujack commented 5 years ago

I have a related follow-up question. Suppose I train xgboost with objective = "multi:softprob", num_class = 2, i.e.:

#Train xgboost model
xgbmodel <- xgboost(data = dtrain, max.depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "multi:softprob",num_class=2)

#Generates an XGBoost feature map
df_train<-as.data.frame(as.matrix(train$data))
xgbmodel.fmap = genFMap(df_train)

#Export as pmml
r2pmml(xgbmodel, "bstDMatrix.pmml", fmap = xgbmodel.fmap)

I have come across another error. I know it seems a bit silly, as this is equivalent to the logistic family, but I wanted to deploy the model to another piece of software which only supports "multi:softprob". I tried setting num_class = 3 and it worked again. You know the reason why it does not support 2 classes?

The error is like this:

Jan 27, 2019 8:16:17 PM org.jpmml.rexp.Main run
INFO: Parsing RDS..
Jan 27, 2019 8:16:17 PM org.jpmml.rexp.Main run
INFO: Parsed RDS in 21 ms.
Jan 27, 2019 8:16:17 PM org.jpmml.rexp.Main run
INFO: Initializing default Converter
Jan 27, 2019 8:16:17 PM org.jpmml.rexp.Main run
INFO: Initialized org.jpmml.rexp.XGBoostConverter
Jan 27, 2019 8:16:17 PM org.jpmml.rexp.Main run
INFO: Converting..
Jan 27, 2019 8:16:17 PM org.jpmml.rexp.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Multi-class classification requires three or more target categories
    at org.jpmml.xgboost.MultinomialLogisticRegression.<init>(MultinomialLogisticRegression.java:42)
    at org.jpmml.xgboost.Learner.load(Learner.java:88)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:53)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:45)
    at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:201)
    at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:143)
    at org.jpmml.rexp.XGBoostConverter.ensureLearner(XGBoostConverter.java:131)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:78)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:69)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.IllegalArgumentException: Multi-class classification requires three or more target categories
    at org.jpmml.xgboost.MultinomialLogisticRegression.<init>(MultinomialLogisticRegression.java:42)
    at org.jpmml.xgboost.Learner.load(Learner.java:88)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:53)
    at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:45)
    at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:201)
    at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:143)
    at org.jpmml.rexp.XGBoostConverter.ensureLearner(XGBoostConverter.java:131)
    at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:78)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:69)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)
Error in .convert(tempfile, file, converter, converter_classpath, verbose) : 
  1

vruusmann commented 5 years ago

You know the reason why it does not support 2 classes?

That could be a completely arbitrary restriction imposed by the JPMML-XGBoost library. Or perhaps older XGBoost versions didn't support this parameter combination (objective = "multi:softprob", num_class = 2), but newer ones (such as your 0.71.2) already do.

Internally, it's about how many "ensembles" of decision trees the XGBoost binary file contains. With objective = "binary:logistic" there's definitely just one "ensemble", whereas with objective = "multi:softprob", num_class=3 there are definitely three "ensembles" (one for each class).

Somebody needs to check if the XGBoost binary file that corresponds to objective = "multi:softprob", num_class=2 contains one or two "ensembles" of decision trees.

From the PMML conversion perspective, there's nothing too special about this parameter combination; it has simply been overlooked so far.
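
One way to run the check described above is to count the trees in a trained model and divide by the number of boosting rounds. The sketch below assumes the xgboost R package is installed and uses its xgb.model.dt.tree function; it inspects the in-memory model rather than parsing the binary file directly, on the assumption that the two hold the same trees.

```r
# Skipped entirely if xgboost is not installed.
if (requireNamespace("xgboost", quietly = TRUE)) {
  library(xgboost)
  data(agaricus.train, package = "xgboost")
  dtrain <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)

  nrounds <- 2
  params <- list(objective = "multi:softprob", num_class = 2,
                 max_depth = 2, eta = 1, nthread = 2)
  bst <- xgb.train(params = params, data = dtrain, nrounds = nrounds)

  # One row per tree node; the Tree column holds the tree index.
  tree_table <- xgb.model.dt.tree(model = bst)
  n_trees <- length(unique(tree_table$Tree))

  cat("trees in model:", n_trees, "\n")
  cat("trees per boosting round:", n_trees / nrounds, "\n")
}
```

One tree per round would point to a single "ensemble", as with "binary:logistic"; two trees per round would point to one "ensemble" per class.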

waiyujack commented 5 years ago

Thanks a lot vruusmann! I hope it won't be too hard to fix.

vruusmann commented 5 years ago

@waiyujack I've released JPMML-XGBoost version 1.3.5, which adds support for the objective = "multi:softprob", num_class = 2 parameter combination.

The library JAR file has been pushed to the Maven Central repository, and should become publicly available within a couple of hours (click "Download -> jar"): http://search.maven.org/classic/#search%7Cga%7C1%7Cjpmml-xgboost

It will take an unspecified amount of time before I update the official R2PMML package with it. However, if you want to start using this functionality ASAP, then go to the /inst/java subdirectory of your R2PMML installation, and replace jpmml-xgboost-1.3.4.jar with this new jpmml-xgboost-1.3.5.jar.
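
The JAR swap can be scripted from R. The sketch below is only an illustration: swap_jar is a hypothetical helper, not part of r2pmml, and it assumes the usual R layout in which the contents of inst/ sit at the root of an installed package, so that inst/java is found as "java" via system.file. The old JAR is copied to a backup directory before it is removed.

```r
# Hypothetical helper (not part of r2pmml): replace one JAR with another
# inside java_dir, keeping a backup copy of the old file elsewhere.
swap_jar <- function(java_dir, old_jar, new_jar_path, backup_dir = tempdir()) {
  old_path <- file.path(java_dir, old_jar)
  stopifnot(file.exists(old_path), file.exists(new_jar_path))
  # Back up the old JAR outside java_dir, then remove it so that only
  # one version of the library remains in the directory.
  file.copy(old_path, file.path(backup_dir, old_jar), overwrite = TRUE)
  file.remove(old_path)
  file.copy(new_jar_path, file.path(java_dir, basename(new_jar_path)),
            overwrite = TRUE)
}

# Locate the Java directory of the installed package.
# system.file() returns "" if r2pmml is not installed.
java_dir <- system.file("java", package = "r2pmml")
if (nzchar(java_dir)) {
  print(list.files(java_dir, pattern = "^jpmml-xgboost"))
}
```

With the new JAR downloaded, a call such as swap_jar(java_dir, "jpmml-xgboost-1.3.4.jar", "path/to/jpmml-xgboost-1.3.5.jar") would perform the replacement; the download path here is a placeholder.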