jpmml / sklearn2pmml

Python library for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Fail to create pmml when `expr` of `ExpressionTransformer` is a function #365

Closed: hjh1011 closed this issue 1 year ago.

hjh1011 commented 1 year ago

I am trying to post-process the output of a multi-class classification model (xgb.XGBClassifier) with a custom function.

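from sklearn2pmml import sklearn2pmml
from sklearn2pmml.decoration import Alias
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml.preprocessing import ExpressionTransformer

# X holds the class probabilities of one row of predict_proba output; return the index
# of the first class at which the remaining (not yet accumulated) probability drops below 0.65
def pred_LB_conf(X):
    remain_prob = 1.0
    for i in range(len(X)):
        remain_prob -= X[i]
        if remain_prob < 0.65:
            return i

# `steps` (not shown) ends with the xgb.XGBClassifier estimator
PMML_pp = PMMLPipeline(steps=steps,
                       predict_proba_transformer=Alias(ExpressionTransformer(pred_LB_conf), name='pred_LB', prefit=True))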

However, when I try to write it to an XML file with

sklearn2pmml(PMML_pp, "xgb_multiClass_test.xml", with_repr=True)

it fails with an error saying that the `expr` value is not supported. What should I look into now?

Dec 15, 2022 12:02:46 AM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Attribute 'sklearn2pmml.preprocessing.ExpressionTransformer.expr' has an unsupported value (Java class net.razorvine.pickle.objects.ClassDictConstructor)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:50)
    at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:79)
    at org.jpmml.sklearn.PyClassDict.getString(PyClassDict.java:139)
    at sklearn2pmml.preprocessing.ExpressionTransformer.getExpr(ExpressionTransformer.java:90)
    at sklearn2pmml.preprocessing.ExpressionTransformer.encodeFeatures(ExpressionTransformer.java:45)
    at sklearn2pmml.decoration.Alias.encodeFeatures(Alias.java:56)
    at sklearn2pmml.pipeline.PMMLPipeline.encodeOutput(PMMLPipeline.java:416)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:270)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDictConstructor to java.lang.String
    at java.lang.Class.cast(Class.java:3369)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:48)
    ... 9 more

vruusmann commented 1 year ago

Passing a Python function to ExpressionTransformer has been supported since SkLearn2PMML 0.88.1.

Examples: https://github.com/jpmml/sklearn2pmml/blob/0.88.1/sklearn2pmml/preprocessing/tests/__init__.py#L190-L193 and https://github.com/jpmml/sklearn2pmml/blob/0.88.1/sklearn2pmml/preprocessing/tests/__init__.py#L257-L264

It doesn't work for you because you're using an outdated SkLearn2PMML package version.

My rant: "FFS people, how hard can it be to UPGRADE YOUR SKLEARN2PMML PACKAGE VERSION before opening any issues on GitHub?"

> def pred_LB_conf(X):

However, you would need to rewrite your function definition, because the Python-to-PMML translator component does not support looping constructs (e.g. `for` statements).
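As an illustration of such a rewrite, here is a loop-free version for a hypothetical model with three target categories (the question does not say how many there are). It unrolls the `for` loop into an `if`/`elif`/`else` chain over the probabilities `X[0]`, `X[1]` and `X[2]`. This is a sketch that assumes the probabilities sum to 1.0; whether a given SkLearn2PMML version's translator accepts this exact form is best confirmed by attempting the conversion.

```python
def pred_LB_conf(X):
    # Loop-free rewrite for a hypothetical 3-class model: X[0], X[1], X[2]
    # are the predicted class probabilities (assumed to sum to 1.0).
    # Each branch checks the remaining probability after one more class,
    # exactly as the original for loop did on the corresponding iteration.
    if 1.0 - X[0] < 0.65:
        return 0
    elif 1.0 - X[0] - X[1] < 0.65:
        return 1
    else:
        # After the last class the remaining probability is ~0.0, so the
        # original loop always returns the last index at the latest.
        return 2
```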

hjh1011 commented 1 year ago

Thank you for the response. Is there an easy way to quickly validate whether a function is allowed for the python-to-pmml translator? Or is there any guidance to follow about which constructs are allowed and which are not?

vruusmann commented 1 year ago

> Is there an easy way to quickly validate whether a function is allowed for the python-to-pmml translator?

Write a Python expression and try converting it. If the conversion fails, look at which source code file and which line number are involved in raising the exception. Locate that piece of code in the JPMML-Python project, and see what's implemented around there.
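Before attempting a conversion, loops (the one restriction confirmed in this thread) can also be caught locally with Python's standard `ast` module. The `find_loops` helper below is hypothetical, not part of SkLearn2PMML or JPMML-Python: it only flags looping constructs (treating comprehensions as loops too, which is an assumption) and cannot replace the actual translator grammar, which may reject other constructs as well.

```python
import ast

# Python looping constructs; comprehensions are included on the assumption
# that the translator rejects them too, since they loop implicitly
LOOP_NODES = (ast.For, ast.AsyncFor, ast.While,
              ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp)


def find_loops(source):
    """Return sorted (line number, construct name) pairs for each loop in source.

    Hypothetical local pre-check; it is NOT the JPMML-Python grammar and
    only detects loops, so an empty result does not guarantee conversion.
    """
    tree = ast.parse(source)
    return sorted((node.lineno, type(node).__name__)
                  for node in ast.walk(tree)
                  if isinstance(node, LOOP_NODES))


# The function from the original question (line 1 of the string is empty)
user_source = """
def pred_LB_conf(X):
    remain_prob = 1.0
    for i in range(len(X)):
        remain_prob -= X[i]
        if remain_prob < 0.65:
            return i
"""

print(find_loops(user_source))  # [(4, 'For')]
```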

vruusmann commented 1 year ago

> Is there an easy way to quickly validate whether a function is allowed for the python-to-pmml translator?

The JavaCC grammar for building the Python expression translator is here (linking to the latest version, 1.1.9): https://github.com/jpmml/jpmml-python/blob/1.1.9/pmml-python/src/main/javacc/expression.jj

Now, since looping constructs are not allowed, you're pretty much limited to three statements:

vruusmann commented 1 year ago

@hjh1011 Just thinking out loud - it seems to me that you're trying to emulate an ordinal target value with your pred_LB_conf function.
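For reference, the loop in `pred_LB_conf` reduces to a single formula, which shows why it reads as an ordinal target. Writing p_k for `X[k]` (categories taken in the order of the probability vector), the remaining probability after index i is 1 minus the cumulative sum up to i, so the function returns the first index at which the cumulative probability exceeds 0.35:

```latex
\hat{y} = \min\left\{ i : 1 - \sum_{k=0}^{i} p_k < 0.65 \right\}
        = \min\left\{ i : \sum_{k=0}^{i} p_k > 0.35 \right\}
```

Assuming the probabilities sum to 1 (and ignoring floating-point rounding), the set is never empty, because the full cumulative sum is 1 > 0.35; the last category therefore acts as the fallback.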

What's the approximate number of target categories there (i.e. `len(X)`)? If it's fairly low, then it may be feasible to craft an if-(elif)*-else statement outright (possibly generating this piece of Python code programmatically). If the number of target categories is higher, then it will be better to use PMML's built-in "cumulative link" functionality.
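A minimal sketch of generating that if-(elif)*-else statement programmatically, assuming the probabilities sum to 1.0. `make_pred_LB_conf_source` is a hypothetical helper: it unrolls the original loop into one branch per category and keeps the same left-to-right subtractions, so the generated function returns the same index as the loop version.

```python
def make_pred_LB_conf_source(n_classes, threshold=0.65, name="pred_LB_conf"):
    """Return loop-free Python source equivalent to the original pred_LB_conf.

    Hypothetical helper: one if/elif branch per category, assuming the
    probabilities X[0] .. X[n_classes - 1] sum to 1.0.
    """
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    lines = [f"def {name}(X):"]
    remain = "1.0"
    for i in range(n_classes - 1):
        # Same left-to-right subtraction order as the original loop
        remain += f" - X[{i}]"
        keyword = "if" if i == 0 else "elif"
        lines.append(f"    {keyword} {remain} < {threshold}:")
        lines.append(f"        return {i}")
    # After the last category the remaining probability is ~0.0, so the
    # original loop always returns the last index at the latest.
    if n_classes == 1:
        lines.append("    return 0")
    else:
        lines.append("    else:")
        lines.append(f"        return {n_classes - 1}")
    return "\n".join(lines) + "\n"


print(make_pred_LB_conf_source(3))
# def pred_LB_conf(X):
#     if 1.0 - X[0] < 0.65:
#         return 0
#     elif 1.0 - X[0] - X[1] < 0.65:
#         return 1
#     else:
#         return 2
```

The generated text can be saved to a regular `.py` module and the function imported from there, so that it stays defined as ordinary source code.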

For example, the RegressionModel element comes with pretty comprehensive support for ordinal targets (see the "Valid combinations" subsection): https://dmg.org/pmml/v4-4-1/Regression.html

So, your use case would be addressed by building a two-estimator chain, where the initial XGBoost estimator step computes probabilities, and the final regression step then converts this probability distribution to an ordinal category value.

vruusmann commented 1 year ago

> your use case would be addressed by building a two-estimator chain, where the initial XGBoost estimator step computes probabilities, and the final regression step then converts this probability distribution to an ordinal category value.

If you open an XGBoost PMML file in a text editor, you can see that it already contains a RegressionModel element.

Now, to emulate an ordinal target, you could simply tweak the configuration of the existing RegressionModel element; there is no need to define another one.
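To see which RegressionModel element(s) the converter produced before tweaking anything, the PMML file can be inspected with Python's standard `xml.etree.ElementTree`. This is a sketch, not part of SkLearn2PMML: `describe_regression_models` is a hypothetical helper that only reads the document and reports each RegressionModel's attributes (such as `normalizationMethod`) and its number of RegressionTable children. Which attributes to change for an ordinal target is covered by the Regression specification linked above.

```python
import xml.etree.ElementTree as ET


def describe_regression_models(root):
    """Summarize every RegressionModel element below a parsed PMML root.

    Returns one dict per RegressionModel with its XML attributes
    (e.g. functionName, normalizationMethod) plus the number of
    RegressionTable child elements.
    """
    summaries = []
    # "{*}" matches any namespace (Python 3.8+), so the PMML schema
    # version declared in the file does not matter
    for model in root.iterfind(".//{*}RegressionModel"):
        summary = dict(model.attrib)
        summary["RegressionTable count"] = len(model.findall("{*}RegressionTable"))
        summaries.append(summary)
    return summaries


# Usage (file name taken from the original question):
# root = ET.parse("xgb_multiClass_test.xml").getroot()
# for summary in describe_regression_models(root):
#     print(summary)
```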