jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Is there a way to output a custom score instead of the default probability #179

Closed liuhuanshuo closed 1 year ago

liuhuanshuo commented 1 year ago

I can now use sklearn2pmml to generate PMML files.

But I have a new question: is there a way to make the PMML file output a score instead of the probabilities of classes 0 and 1?

To be clear, when I evaluate the PMML file with the following code, it returns probabilities:

evaluator.evaluateAll(x_oot_1)

   y  probability(0)  probability(1)
0  0        0.755183        0.244817
1  0        0.702629        0.297371
2  0        0.628108        0.371892
3  0        0.868231        0.131769
4  0        0.875624        0.124376

Is there any way to change probability(0) and probability(1) into a score with custom rules, such as 100 * probability(0) + 500 * probability(1)?

I wonder whether this requires adding a step after the clf step of the pipeline. Does sklearn2pmml support such functionality?

pipeline_test = PMMLPipeline(
    steps=[("mapper", mapper),
           ("classifier", clf_1)])
vruusmann commented 1 year ago

Whether sklearn2pmml supports such functionality

The Scikit-Learn framework doesn't support the idea of "post-process the prediction of the final estimator step".

However, the SkLearn2PMML package allows you to do so, as explained here: https://openscoring.io/blog/2022/05/06/sklearn_prediction_postprocessing/

Is there any way to change probability(0) and probability(1) into a score with custom rules, such as 100 * probability(0) + 500 * probability(1)?

from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml.preprocessing import ExpressionTransformer

# X is the predicted probabilities matrix, as returned by `pipeline.predict_proba(X)`
# The first column is probability(0), the second column is probability(1)
custom_score_transformer = ExpressionTransformer("(100 * X[0]) + (500 * X[1])")

pipeline = PMMLPipeline(..., predict_proba_transformer = custom_score_transformer)
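
To make the row-wise meaning of the expression concrete, here is a minimal pure-Python sketch that evaluates the same expression string once per row, with X bound to that row's (probability(0), probability(1)) pair. The helper name evaluate_rows is hypothetical, and this is only a stand-in for illustration, not how ExpressionTransformer is implemented. The input values are the first three rows of the evaluator output quoted in the question.

```python
# Illustrative stand-in for row-wise expression evaluation.
# This is NOT the sklearn2pmml implementation; `evaluate_rows` is a hypothetical helper.
EXPRESSION = "(100 * X[0]) + (500 * X[1])"


def evaluate_rows(expression, rows):
    """Evaluate `expression` once per row, with the name X bound to that row."""
    code = compile(expression, "<expression>", "eval")
    return [eval(code, {"__builtins__": {}}, {"X": row}) for row in rows]


# First three (probability(0), probability(1)) pairs from the question.
probabilities = [
    (0.755183, 0.244817),
    (0.702629, 0.297371),
    (0.628108, 0.371892),
]

scores = evaluate_rows(EXPRESSION, probabilities)
for (p0, p1), score in zip(probabilities, scores):
    print(f"p0={p0:.6f} p1={p1:.6f} score={score:.4f}")
```

Because the two probabilities sum to 1, this particular rule is equivalent to 100 + 400 * probability(1), so the score always falls between 100 and 500.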
liuhuanshuo commented 1 year ago

I tried to do what you said

custom_score_transformer = ExpressionTransformer("(100 * X[0]) + (500 * X[1])")

mapper = DataFrameMapper(mapper_encode, input_df=True)

pipeline_final = PMMLPipeline(
    steps=[("mapper", mapper),
           ("classifier", clf)],
    predict_proba_transformer=custom_score_transformer)

pipeline_final.predict_proba_transformer(x_oot_1)

Unfortunately, this raises an error at the pipeline stage:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-781-8b73dad55740> in <module>
----> 1 pipeline_final.predict_proba_transformer(x_oot_1)

TypeError: 'NoneType' object is not callable

If I use pipeline_final.predict_proba_transform(x_oot_1) instead, I get the same result as with pipeline_final.predict_proba(x_oot_1):

>>> pipeline_final.predict_proba_transform(x_oot_1)
array([[0.75036584, 0.24963416],
       [0.6775218 , 0.3224782 ],
       [0.64144063, 0.35855937],
       ...,
       [0.86458361, 0.13541639],
       [0.92818167, 0.07181833],
       [0.96084351, 0.03915649]])

>>> pipeline_final.predict_proba(x_oot_1)
array([[0.75036584, 0.24963416],
       [0.6775218 , 0.3224782 ],
       [0.64144063, 0.35855937],
       ...,
       [0.86458361, 0.13541639],
       [0.92818167, 0.07181833],
       [0.96084351, 0.03915649]])

Doesn't seem to be working?

vruusmann commented 1 year ago

pipeline_final.predict_proba_transformer(x_oot_1)

TypeError: 'NoneType' object is not callable

I have no idea how the PMMLPipeline.predict_proba_transformer attribute can be None in this location. Did you re-assign the pipeline_final object in some other line of code?

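The value of this attribute should be an ExpressionTransformer instance there. That is a TransformerMixin instance, which is not callable either, but calling it would produce a different error message, one that names ExpressionTransformer rather than NoneType.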
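
The difference between the two error messages can be checked with plain Python. In this small sketch, FakeTransformer and call_error are hypothetical names used only for illustration; FakeTransformer stands in for any object that has a transform method but no __call__ method.

```python
class FakeTransformer:
    """Hypothetical stand-in for a transformer object that is not callable."""

    def transform(self, X):
        return X


def call_error(obj):
    """Call `obj` like a function and return the TypeError message, if any."""
    try:
        obj([[0.75, 0.25]])
    except TypeError as error:
        return str(error)
    return None


# Calling an attribute whose value is None:
print(call_error(None))
# Calling an attribute whose value is a (non-callable) transformer object:
print(call_error(FakeTransformer()))
```

The first message names NoneType and the second names the transformer class, so the message in the traceback shows which value the attribute actually held at the time of the call.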

pipeline_final.predict_proba_transform(x_oot_1)

WTF is x_oot_1? Is it already some pipeline prediction?

You are supposed to invoke the PMMLPipeline.predict_proba_transform(...) method with the input data matrix X. The PMMLPipeline will then perform the data transformations and the probability extraction automatically: https://github.com/jpmml/sklearn2pmml/blob/0.87.0/sklearn2pmml/pipeline/__init__.py#L109-L114

If I use pipeline_final.predict_proba_transform(x_oot_1) instead, I get the same result as with pipeline_final.predict_proba(x_oot_1)

This can only happen if the PMMLPipeline.predict_proba_transformer attribute is not initialized (i.e. is None).
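
The fallback described here can be sketched in plain Python. SketchPipeline, ScoreTransformer and fake_model are hypothetical stand-ins, not the real PMMLPipeline code. In particular, appending the score to each probability row is an assumption about the output layout and should be checked against the linked source.

```python
class SketchPipeline:
    """Hypothetical, simplified stand-in; not the real PMMLPipeline."""

    def __init__(self, predict_proba, predict_proba_transformer=None):
        self._predict_proba = predict_proba
        self.predict_proba_transformer = predict_proba_transformer

    def predict_proba(self, X):
        return self._predict_proba(X)

    def predict_proba_transform(self, X):
        y_proba = self.predict_proba(X)
        # With no transformer set, the result is just the plain probabilities.
        if self.predict_proba_transformer is None:
            return y_proba
        scores = self.predict_proba_transformer.transform(y_proba)
        # Assumed layout: the score is appended to each probability row.
        return [list(row) + [score] for row, score in zip(y_proba, scores)]


class ScoreTransformer:
    """Hypothetical transformer applying 100 * p0 + 500 * p1 to each row."""

    def transform(self, y_proba):
        return [100 * p0 + 500 * p1 for p0, p1 in y_proba]


def fake_model(X):
    """Hypothetical model that returns fixed probabilities for every row."""
    return [[0.75, 0.25] for _ in X]


X = [[1.0], [2.0]]
without_transformer = SketchPipeline(fake_model)
with_transformer = SketchPipeline(fake_model, ScoreTransformer())

print(without_transformer.predict_proba_transform(X))
print(with_transformer.predict_proba_transform(X))
```

In this sketch, identical output from predict_proba and predict_proba_transform is the signature of a transformer attribute that is None, which matches the symptom reported above.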

Doesn't seem to be working?

Your code is broken, not mine.