liamjoy closed this issue 4 years ago
In plain English, what is this model chain supposed to do? What is the function of LightGBMClassifier, and what is the function of IsotonicRegression?
Are you trying to "smooth" the prediction of the classifier?
I have looked into using StackingClassifier/StackingRegressor, but because LGBM is a classifier and Isotonic is a regressor, it does not allow it.
I assume you're referring to Scikit-Learn's stacking estimator classes, and that it is Scikit-Learn that prevents you from building such a model chain (not the SkLearn2PMML/JPMML-SkLearn stack).
I am also unable to use two models in a single pipeline as only one estimator is allowed
Possible workaround - the first estimator should be packaged as a transformer: https://github.com/jpmml/sklearn2pmml/issues/118
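For context, the idea in the linked issue can be sketched with a small Scikit-Learn wrapper. The class name here is hypothetical, and, importantly, a home-grown Python class like this works fine inside Scikit-Learn but will be rejected by the SkLearn2PMML converter unless a matching Java-side converter exists for it:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression

class EstimatorAsTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper that exposes an estimator's predictions
    as transform() output, so it can act as a mid-pipeline step."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None, **fit_params):
        self.estimator.fit(X, y, **fit_params)
        return self

    def transform(self, X):
        # Reshape to a single column so downstream steps see 2-D input
        return np.asarray(self.estimator.predict(X)).reshape(-1, 1)

# Demo with a stand-in estimator:
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
step = EstimatorAsTransformer(LinearRegression()).fit(X, y)
predictions = step.transform(X)  # column vector of predictions
```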
LGBMClassifier takes in around 100 features to predict a binary target class. The isotonic regression is used to calibrate the model predictions to match a different distribution. The output should be a probability of the target being 1, after prediction calibration.
Thank you, I will look into packaging the LGBMClassifier as a transformer.
The isotonic regression is used to calibrate the model predictions to match a different distribution.
This looks like a "decision engineering" problem - taking the prediction of a model, and then doing something extra with it.
In such a case LGBMClassifier is still the primary/final estimator of the pipeline, and the challenge is about applying IsotonicRegression to its predicted probability.
Decision engineering is not supported by Scikit-Learn pipelines. However, the sklearn2pmml.pipeline.PMMLPipeline class lets you specify three attributes (predict_transformer, predict_proba_transformer and apply_transformer) to accomplish it: https://github.com/jpmml/sklearn2pmml/blob/0.61.0/sklearn2pmml/pipeline/__init__.py#L47-L51
Suppose you want to manually correct the predicted probability of a binary classifier:
pipeline = PMMLPipeline(.., predict_proba_transformer = ExpressionTransformer("X[1] * 0.95 + 0.1"))
You should package IsotonicRegression as a transformer instead.
Hi Villu!
Could you suggest a way to package IsotonicRegression as a transformer, please? I tried ModelTransformer from https://github.com/jpmml/sklearn2pmml/issues/118, but bumped into:
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.predict_proba_transformer' has an unsupported value (Python class __main__.ModelTransformer)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:57)
at org.jpmml.sklearn.PyClassDict.getOptional(PyClassDict.java:67)
at sklearn2pmml.pipeline.PMMLPipeline.getTransformer(PMMLPipeline.java:441)
at sklearn2pmml.pipeline.PMMLPipeline.getPredictProbaTransformer(PMMLPipeline.java:433)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:101)
at org.jpmml.sklearn.Main.run(Main.java:145)
at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.Transformer
at java.lang.Class.cast(Unknown Source)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
... 7 more
And I'm also not sure how to represent isotonic regression as an expression for ExpressionTransformer. The only idea that came to my mind is iteratively building a string of "if else" clauses that implements interpolation between the x values of scipy.interpolate.interp1d, which IsotonicRegression in sklearn is based on. But it doesn't seem like a good solution to me.
Are there any other options to wrap IsotonicRegression in a transformer? Or maybe is there a better solution with ExpressionTransformer?
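For what it's worth, the interp1d observation above can be verified directly: a fitted IsotonicRegression is just a piecewise-linear function over its learned thresholds. A quick sanity check on toy data (the X_thresholds_/y_thresholds_ attributes assume Scikit-Learn 0.24+):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy calibration data: noisy, but roughly increasing
rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0.0, 1.0, 50))
y = np.clip(x + rng.normal(scale=0.1, size=50), 0.0, 1.0)

iso = IsotonicRegression(out_of_bounds="clip").fit(x, y)

# Within the training range, predictions coincide with plain linear
# interpolation over the fitted thresholds
manual = np.interp(x, iso.X_thresholds_, iso.y_thresholds_)
assert np.allclose(iso.predict(x), manual)
```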
java.lang.IllegalArgumentException: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.predict_proba_transformer' has an unsupported value (Python class __main__.ModelTransformer)
Looks like you're trying to develop a custom transformer. You've implemented the Python side, but you still haven't implemented the Java side, plus informing the SkLearn2PMML package about it all.
Lately it's been discussed here: https://github.com/jpmml/sklearn2pmml/issues/283
And I'm also not sure how to represent isotonic regression as an expression for ExpressionTransformer.
See the EstimatorTransformer class from the Scikit-Lego package (I decided to reuse an existing 3rd-party class instead of coming up with my own).
Something like this:
from sklearn2pmml.pipeline import PMMLPipeline
from sklego.meta import EstimatorTransformer

# A pre-fitted isotonic regression
isotonicRegression = ..
pipeline = PMMLPipeline(.., predict_proba_transformer = EstimatorTransformer(isotonicRegression))
Thanks for the quick response!
I found EstimatorTransformer in the supported packages list (https://github.com/jpmml/jpmml-sklearn) and tried to use it, but found two issues:
1. Attribute 'sklearn2pmml.pipeline.PMMLPipeline.predict_proba_transformer' has an unsupported value (Python class sklego.meta.estimator_transformer.EstimatorTransformer), probably an issue with the library version (I use sklearn2pmml==0.49.3).
2. ValueError: Isotonic regression input should be a 1d array, since the output of the model's predict_proba is a 2d array, but isotonic expects a 1d array.
Is there a workaround with using EstimatorTransformer? Or is building a custom transformer the only way?
it can be an issue with library version (I use sklearn2pmml==0.49.3)
Exactly - support for the sklego.meta.EstimatorTransformer transformation type was added in SkLearn2PMML version 0.73.0 (released ~3 days ago).
Can't use predict_proba_transform, since the output of the model predict_proba is 2d array, but isotonic expects 1d array.
Use a helper transformer to select a single column (e.g. the probability of class Z) out of the available ones:
pipeline = PMMLPipeline(..,
predict_proba_transformer = Pipeline([
("select_col", ExpressionTransformer("X[1]")),
("transform_col", IsotonicRegression())
])
)
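Outside of the PMML conversion itself, the same select-a-column-then-calibrate idea can be tried out with stock Scikit-Learn pieces. In this sketch, FunctionTransformer merely stands in for ExpressionTransformer("X[1]") (a lambda-based step like this is not PMML-convertible), and the probabilities and labels are made up:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Stand-in for the predict_proba output of a binary classifier
# (columns: [P(class 0), P(class 1)])
proba = np.array([[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]])
y_true = np.array([0, 0, 1])

# Select the positive-class column, then calibrate it isotonically
calibrate = Pipeline([
    ("select_col", FunctionTransformer(lambda X: X[:, 1])),
    ("transform_col", IsotonicRegression(out_of_bounds="clip")),
])
calibrate.fit(proba, y_true)
calibrated = calibrate.transform(proba)  # 1-D array of calibrated probabilities
```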
Is there a workaround with using EstimatorTransformer? Or the only way is building a custom transformer?
Honestly, just upgrade the SkLearn2PMML package to the latest version.
Hi, Villu!
I updated the sklearn2pmml library and found a new issue while building a PMMLPipeline. Code:
model = XGBClassifier( ... )
model.fit(x, y)
pipeline = PMMLPipeline([('classifier', model)])
Error:
53 self.apply_transformer = apply_transformer
54 # SkLearn 0.24+
---> 55 super(PMMLPipeline, self).__init__(steps = steps, memory = memory, verbose = verbose)
56
57 def __repr__(self):
TypeError: __init__() got an unexpected keyword argument 'verbose'
sklearn2pmml version:
0.73.0
0.60.0 and older work well, but EstimatorTransformer isn't supported there.
TypeError: __init__() got an unexpected keyword argument 'verbose'
The sklearn.pipeline.Pipeline constructor introduced the verbose parameter in Scikit-Learn 0.21.0: https://scikit-learn.org/0.21/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline
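For illustration, the parameter in question is just a per-step timing/logging flag. This minimal example (made-up data) runs on Scikit-Learn 0.21+ and raises the same TypeError on older versions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# verbose=True (Scikit-Learn 0.21+) prints the elapsed time of each
# step while fitting; older versions reject the keyword
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
], verbose=True)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
pipe.fit(X, y)
```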
Why would anyone use a pre-0.21 version in June 2021?
It works with updated libraries! Thank you for your suggestions, it helped a lot!
I have been able to do the above by creating separate PMMLs for both LGBMClassifier and IsotonicRegression, then copying the IsotonicRegression PMML into the LGBM PMML as a final Segment in the chained model. I have looked into using StackingClassifier/StackingRegressor, but because LGBM is a classifier and Isotonic is a regressor, it does not allow it. I am also unable to use two models in a single pipeline, as only one estimator is allowed. Is it possible to do this using a single pipeline or some other workaround?