liamjoy closed this issue 4 years ago
In plain English, what is this model chain supposed to do? What is the function of LightGBMClassifier, and what is the function of IsotonicRegression?
Are you trying to "smooth" the prediction of the classifier?
I have looked into using StackingClassifier/StackingRegressor, but because LGBM is a classifier and Isotonic is a regressor, it does not allow it.
I assume you're referring to Scikit-Learn's stacking estimator classes, and that it is Scikit-Learn that prevents you from building such a model chain (not the SkLearn2PMML/JPMML-SkLearn stack).
I am also unable to use two models in a single pipeline as only one estimator is allowed
Possible workaround - the first estimator should be packaged as a transformer: https://github.com/jpmml/sklearn2pmml/issues/118
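For context, the idea in the linked issue can be sketched with a small Scikit-Learn wrapper. The class name here is hypothetical, and, importantly, a home-grown Python class like this works fine inside Scikit-Learn but will be rejected by the SkLearn2PMML converter unless a matching Java-side converter exists for it:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression

class EstimatorAsTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper that exposes an estimator's predictions
    as transform() output, so it can act as a mid-pipeline step."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None, **fit_params):
        self.estimator.fit(X, y, **fit_params)
        return self

    def transform(self, X):
        # Reshape to a single column so downstream steps see 2-D input
        return np.asarray(self.estimator.predict(X)).reshape(-1, 1)

# Demo with a stand-in estimator:
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
step = EstimatorAsTransformer(LinearRegression()).fit(X, y)
predictions = step.transform(X)  # column vector of predictions
```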
LGBMClassifier takes in around 100 features to predict a binary target class. The isotonic regression is used to calibrate the model predictions to match a different distribution. The output should be a probability of the target being 1, after prediction calibration.
Thank you, I will look into packaging the LGBMClassifier as a transformer.
The isotonic regression is used to calibrate the model predictions to match a different distribution.
This looks like a "decision engineering" problem - taking the prediction of a model, and then doing something extra with it.
In such a case LGBMClassifier is still the primary/final estimator of the pipeline, and the challenge is about applying IsotonicRegression to its predicted probability.
Decision engineering is not supported by Scikit-Learn pipelines. However, the sklearn2pmml.pipeline.PMMLPipeline class lets you specify three attributes (predict_transformer, predict_proba_transformer and apply_transformer) to accomplish it: https://github.com/jpmml/sklearn2pmml/blob/0.61.0/sklearn2pmml/pipeline/__init__.py#L47-L51
Suppose you want to manually correct the predicted probability of a binary classifier:
pipeline = PMMLPipeline(.., predict_proba_transformer = ExpressionTransformer("X[1] * 0.95 + 0.1"))
You should package IsotonicRegression as a transformer instead.
Hi Villu!
Could you suggest a way to package IsotonicRegression as a transformer, please? I tried ModelTransformer from https://github.com/jpmml/sklearn2pmml/issues/118, but bumped into:
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.predict_proba_transformer' has an unsupported value (Python class __main__.ModelTransformer)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:57)
at org.jpmml.sklearn.PyClassDict.getOptional(PyClassDict.java:67)
at sklearn2pmml.pipeline.PMMLPipeline.getTransformer(PMMLPipeline.java:441)
at sklearn2pmml.pipeline.PMMLPipeline.getPredictProbaTransformer(PMMLPipeline.java:433)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:101)
at org.jpmml.sklearn.Main.run(Main.java:145)
at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.Transformer
at java.lang.Class.cast(Unknown Source)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
... 7 more
And I'm also not sure how to represent isotonic regression as an expression for ExpressionTransformer. The only idea that came to my mind is iteratively building a string of "if else" clauses that implements interpolation between the x values of scipy.interpolate.interp1d, which IsotonicRegression in sklearn is based on. But it doesn't seem like a good solution to me.
Are there any other options to wrap IsotonicRegression in a transformer? Or maybe is there a better solution with ExpressionTransformer?
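For what it's worth, the interp1d observation above can be verified directly: a fitted IsotonicRegression is just a piecewise-linear function over its learned thresholds. A quick sanity check on toy data (the X_thresholds_/y_thresholds_ attributes assume Scikit-Learn 0.24+):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy calibration data: noisy, but roughly increasing
rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0.0, 1.0, 50))
y = np.clip(x + rng.normal(scale=0.1, size=50), 0.0, 1.0)

iso = IsotonicRegression(out_of_bounds="clip").fit(x, y)

# Within the training range, predictions coincide with plain linear
# interpolation over the fitted thresholds
manual = np.interp(x, iso.X_thresholds_, iso.y_thresholds_)
assert np.allclose(iso.predict(x), manual)
```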
java.lang.IllegalArgumentException: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.predict_proba_transformer' has an unsupported value (Python class __main__.ModelTransformer)
Looks like you're trying to develop a custom transformer. You've implemented the Python side, but you still haven't implemented the Java side, plus informing the SkLearn2PMML package about it all.
Lately it's been discussed here: https://github.com/jpmml/sklearn2pmml/issues/283
And I'm also not sure how to represent isotonic regression as an expression for ExpressionTransformer.
See the EstimatorTransformer class from the Scikit-Lego package (I decided to reuse an existing 3rd-party class instead of coming up with my own).
Something like this:
from sklearn2pmml.pipeline import PMMLPipeline
from sklego.meta import EstimatorTransformer

# A pre-fitted isotonic regression
isotonicRegression = ..
pipeline = PMMLPipeline(.., predict_proba_transformer = EstimatorTransformer(isotonicRegression))
Thanks for the quick response!
I found EstimatorTransformer in the supported packages list (https://github.com/jpmml/jpmml-sklearn) and tried to use it, but found two issues:
1. Attribute 'sklearn2pmml.pipeline.PMMLPipeline.predict_proba_transformer' has an unsupported value (Python class sklego.meta.estimator_transformer.EstimatorTransformer), probably an issue with the library version (I use sklearn2pmml==0.49.3).
2. ValueError: Isotonic regression input should be a 1d array, since the output of the model's predict_proba is a 2d array, but isotonic expects a 1d array.
Is there a workaround with using EstimatorTransformer? Or is building a custom transformer the only way?
it can be an issue with library version (I use sklearn2pmml==0.49.3)
Exactly - support for the sklego.meta.EstimatorTransformer transformation type was added in SkLearn2PMML version 0.73.0 (released ~3 days ago).
Can't use predict_proba_transform, since the output of the model predict_proba is 2d array, but isotonic expects 1d array.
Use a helper transformer to select a single column (e.g. the probability of class Z) out of the available ones:
pipeline = PMMLPipeline(..,
predict_proba_transformer = Pipeline([
("select_col", ExpressionTransformer("X[1]")),
("transform_col", IsotonicRegression())
])
)
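Outside of the PMML conversion itself, the same select-a-column-then-calibrate idea can be tried out with stock Scikit-Learn pieces. In this sketch, FunctionTransformer merely stands in for ExpressionTransformer("X[1]") (a lambda-based step like this is not PMML-convertible), and the probabilities and labels are made up:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Stand-in for the predict_proba output of a binary classifier
# (columns: [P(class 0), P(class 1)])
proba = np.array([[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]])
y_true = np.array([0, 0, 1])

# Select the positive-class column, then calibrate it isotonically
calibrate = Pipeline([
    ("select_col", FunctionTransformer(lambda X: X[:, 1])),
    ("transform_col", IsotonicRegression(out_of_bounds="clip")),
])
calibrate.fit(proba, y_true)
calibrated = calibrate.transform(proba)  # 1-D array of calibrated probabilities
```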
Is there a workaround with using EstimatorTransformer? Or the only way is building a custom transformer?
Honestly, just upgrade the SkLearn2PMML package to the latest version.
Hi, Villu!
I updated the sklearn2pmml library and found a new issue while building a PMMLPipeline. Code:
model = XGBClassifier( ... )
model.fit(x, y)
pipeline = PMMLPipeline([('classifier', model)])
Error:
53 self.apply_transformer = apply_transformer
54 # SkLearn 0.24+
---> 55 super(PMMLPipeline, self).__init__(steps = steps, memory = memory, verbose = verbose)
56
57 def __repr__(self):
TypeError: __init__() got an unexpected keyword argument 'verbose'
sklearn2pmml version:
0.73.0
0.60.0 and older work well, but EstimatorTransformer isn't supported there.
TypeError: __init__() got an unexpected keyword argument 'verbose'
The sklearn.pipeline.Pipeline constructor introduced the verbose parameter in Scikit-Learn 0.21.0: https://scikit-learn.org/0.21/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline
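For illustration, the parameter in question is just a per-step timing/logging flag. This minimal example (made-up data) runs on Scikit-Learn 0.21+ and raises the same TypeError on older versions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# verbose=True (Scikit-Learn 0.21+) prints the elapsed time of each
# step while fitting; older versions reject the keyword
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
], verbose=True)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
pipe.fit(X, y)
```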
Why would anyone use a pre-0.21 version in June 2021?
It works with updated libraries! Thank you for your suggestions, it helped a lot!
I have been able to do the above by creating separate PMMLs for both LGBMClassifier and IsotonicRegression, then copying the IsotonicRegression PMML into the LGBM PMML as a final Segment in the chained model. I have looked into using StackingClassifier/StackingRegressor, but because LGBM is a classifier and Isotonic is a regressor, it does not allow it. I am also unable to use two models in a single pipeline, as only one estimator is allowed. Is it possible to do this using a single pipeline or some other workaround?