jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

Support for `mlr::WrappedModel` meta-model type #46

Closed i3Jesterhead closed 4 years ago

i3Jesterhead commented 5 years ago

Hi, I am trying to convert an XGBoost classification model from the mlr package in R to a PMML file.

library(mlr) 
learner = makeLearner("classif.xgboost", predict.type = "prob")

When trying to convert the trained model I get the following error message.

> r2pmml(XgbModel, "Outfile.pmml")
Sep 19, 2018 1:20:35 PM org.jpmml.rexp.Main run
INFORMATION: Initializing default Converter
Exception in thread "main" java.lang.IllegalArgumentException: No built-in converter for class [WrappedModel]
    at org.jpmml.rexp.ConverterFactory.newConverter(ConverterFactory.java:41)
    at org.jpmml.rexp.Main.run(Main.java:134)
    at org.jpmml.rexp.Main.main(Main.java:97)
Error in .convert(tempfile, file, converter, converter_classpath, verbose) : 
  1

Can you make any sense of the error message? The Java version should not be the problem, btw:

java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)  

Thanks in advance!

vruusmann commented 5 years ago

library(mlr)
learner = makeLearner("classif.xgboost", predict.type = "prob")

You're training XGBoost models using the mlr package (an abstraction layer). Apparently, the mlr::train() function returns an mlr-specific mlr::WrappedModel object, not a generic xgboost::xgb.Booster object.

The list of supported model classes is given in the README file of the JPMML-R library: https://github.com/jpmml/jpmml-r/blob/master/README.md#features

As you can see, the mlr package is not supported at the moment.

Maybe it's possible to extract the xgb.Booster object from the WrappedModel object, and pass it to the r2pmml::r2pmml() function directly. Then again, I haven't studied the internals of the mlr package yet, and could be overly optimistic here.

i3Jesterhead commented 5 years ago

Thank you for clearing that up! I will just use a generic xgb::xgb.Booster object then.

vruusmann commented 5 years ago

Reopening - I have the mlr package on my TODO list; this issue will help to increase its priority.

i3Jesterhead commented 5 years ago

I found a very convincing solution to the problem! With the function getLearnerModel(model, more.unwrap = TRUE) it's possible to extract the underlying xgb.Booster object from the WrappedModel. After that, converting to PMML was a piece of cake.
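For reference, the extraction step described above can be sketched like this (a minimal sketch; `my_data` and `my_target` are hypothetical placeholders for the training data, and the output file name is arbitrary):

```r
library(mlr)
library(r2pmml)

# Train an mlr learner as usual; mlr::train() returns a WrappedModel
learner <- makeLearner("classif.xgboost", predict.type = "prob")
task <- makeClassifTask(data = my_data, target = "my_target")  # hypothetical dataset
mlr_model <- train(learner, task)

# Unwrap the underlying xgb.Booster and convert it directly
booster <- getLearnerModel(mlr_model, more.unwrap = TRUE)
r2pmml(booster, "Outfile.pmml")
```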

hanzigs commented 5 years ago

The disadvantage of unwrapping with getLearnerModel(model, more.unwrap = TRUE)

is that, if we have predict.type set to probability, we won't get the good:bad probability percentages; we only get either 0 or 1.

Is there any way to convert the mlr wrapped models to PMML?

vruusmann commented 5 years ago

@apremgeorge You can keep your original mlr::WrappedModel object as-is. If you want to convert the enclosed model object to PMML, then you should extract it into a separate temporary variable (instead of re-assigning the original variable).
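In code, the suggestion above amounts to something like the following (a sketch; `rf_mod` is assumed to be a trained mlr::WrappedModel from the commenter's session, and the output file name is arbitrary):

```r
library(mlr)
library(r2pmml)

# Keep the original wrapped model intact for mlr-side predictions:
# rf_mod <- train(learner, task)

# Extract the enclosed model into a separate temporary variable
# instead of re-assigning rf_mod itself
tmp_model <- getLearnerModel(rf_mod, more.unwrap = TRUE)
r2pmml(tmp_model, "rf_mod.pmml")

# rf_mod is unchanged, so predict(rf_mod, ...) still returns the
# truth/prob.0/prob.1/response columns as before
```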

hanzigs commented 5 years ago

Thanks for the reply.

library(pmml)
rf_mod
rf_pmml <- pmml(model = rf_mod)

The above code produces an error:

Error in UseMethod("pmml") : no applicable method for 'pmml' applied to an object of class "c('FilterModel', 'BaseWrapperModel', 'WrappedModel')"

So I use getLearnerModel; this works, but the resulting randomForest object gives only response, not the truth, prob.0, prob.1, response columns as given by rf_mod:

randomForest <- getLearnerModel(rf_mod, more.unwrap = TRUE)
rf_pmml <- pmml(model = randomForest)

Thanks for any help

vruusmann commented 5 years ago

library(pmml)

@apremgeorge This issue tracker is about the r2pmml package, not the pmml package. Please re-submit your issue somewhere else.

hanzigs commented 5 years ago

The issue is "no applicable method for object of class "c('FilterModel', 'BaseWrapperModel', 'WrappedModel')"" with an mlr wrapper model.

hanzigs commented 5 years ago

Thank you

bmreiniger commented 4 years ago

This would indeed be nice to have. I would expect it to be straightforward in most cases, just extracting the mlr model's learner.model attribute. (Or, in mlr3, the model attribute.)

I wanted to point out one place where additional work would be needed. In mlr v2.16 or mlr3 with an xgboost binary classifier, in order to properly generate metrics for early stopping, the labels get switched before fitting the xgboost model: https://github.com/mlr-org/mlr/pull/2644 Extracting the underlying model and converting it with r2pmml then switches the final output probabilities.

E.g.,

library(mlr)
library(xgboost)

set.seed(314)

data("iris")
# make binary target
iris$Species <- as.integer(iris$Species)
iris$Species <- as.integer(abs(iris$Species - 2))

task <- makeClassifTask(data = iris, target = "Species")
xgb_learner <- makeLearner(
  'classif.xgboost',
  predict.type = 'prob',
  par.vals = list(
    objective = 'binary:logistic',
    eval_metric = 'auc',
    nrounds = 10
  )
)

mlr_model <- train(xgb_learner, task = task)

mlr_preds0 <- predictLearner(xgb_learner, mlr_model, iris[, names(iris) != 'Species'])
mlr_preds <- predict(mlr_model, task = task)

xgb_model <- mlr_model$learner.model
dmat <- xgb.DMatrix(data = as.matrix(iris[, names(iris) != 'Species']))
xgb_preds <- predict(xgb_model, dmat)

head(mlr_preds0)
head(mlr_preds$data)
head(xgb_preds)
# here the predictions are swapped.  That persists if you convert to pmml:
xgb_fmap <- r2pmml::genFMap(iris[, names(iris) != 'Species'])
r2pmml::r2pmml(xgb_model, fmap = xgb_fmap, './r2pmml-xgb-test')

vruusmann commented 4 years ago

I've been wondering how R people train XGBoost models (Python people have excellent Scikit-Learn wrapper classes). It seems the mlr(3) package is rather fashionable these days.

vruusmann commented 4 years ago

This issue was fixed in https://github.com/jpmml/jpmml-r/commit/496248c908a2e8f0c4d39f26348ca40c91bcc524

There's an updated R2PMML package version 0.24.1 available in GitHub.

@bmreiniger The MLR+XGBoost example that you shared in https://github.com/jpmml/r2pmml/issues/46#issuecomment-589829633 works now; pay attention to the invert_levels decoration:

mlr_model <- train(xgb_learner, task = task)

xgb_fmap <- r2pmml::genFMap(iris[, names(iris) != 'Species'])

r2pmml(mlr_model, "iris.pmml", fmap = xgb_fmap)
r2pmml(mlr_model, "iris-inverted.pmml", invert_levels = TRUE, fmap = xgb_fmap)

@bmreiniger The above example is about a dataset that contains only continuous features. How do you approach a mix of continuous plus categorical features in the MLR package? I'd like to expand the MLR+XGBoost integration, but it would be easier if there were some pointers about how it's normally done.

LSym2 commented 2 years ago

Hi @vruusmann,

I'm using the mlr package for fitting an xgboost model as described by i3Jesterhead above, but when I tried to apply the solution posted here I realized that the function genFMap is no longer available in the package.

Is there any equivalent solution that works without this function?

Thanks very much for your help!

vruusmann commented 2 years ago

@LSym2 The function r2pmml::genFMap has been refactored into r2pmml::as.fmap (generic function; specializations exist for data.frame and matrix cases): https://github.com/jpmml/r2pmml/blob/0.26.1/R/xgboost.R

More code examples here (not related to mlr, though): https://github.com/jpmml/jpmml-xgboost/blob/1.5.6/src/test/resources/xgboost.R
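Based on the refactoring described above, the earlier genFMap call can presumably be replaced like this (a sketch; the iris subset mirrors the earlier example in this thread, and the mlr_model variable and output file name are placeholders):

```r
library(r2pmml)

# Build the XGBoost feature map from a data.frame of predictor columns;
# r2pmml::as.fmap replaces the older r2pmml::genFMap
xgb_fmap <- as.fmap(iris[, names(iris) != "Species"])

# Pass it to the converter exactly as before:
# r2pmml(mlr_model, "iris.pmml", fmap = xgb_fmap)
```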

LSym2 commented 2 years ago

Hi @vruusmann,

Ok, thanks very much, it works now!