iamDecode / sklearn-pmml-model

A library to parse and convert PMML models into Scikit-learn estimators.
BSD 2-Clause "Simplified" License

PMMLLogisticRegression does not work with predict_proba (support CutTransformer) #47

Open KeenCat opened 1 year ago

KeenCat commented 1 year ago

Description

Hello, thank you for this amazing library; it is faster than pypmml! My logistic model is ('model', LogisticRegression(random_state=0, solver='liblinear')), so I hope to use PMMLLogisticRegression to get the probability of target '1'. I expected predict_proba(xx) to return a list of probability values between 0 and 1. However, the returned values are class labels like [0, 1, 0, 0, 1].

How can I get the predicted probabilities from the logistic regression?

Thank you.

Steps/Code to Reproduce

from sklearn_pmml_model.linear_model import PMMLLogisticRegression

# Load the logistic regression model from its PMML file
model = PMMLLogisticRegression(pmml="./pmml/blahblah.pmml")

# `test` holds the feature data to score (e.g. a pandas DataFrame)
model.predict_proba(test)

Expected Results

[0.324234, 0.235365, 0.86786655, 0.435345, 0.3463654]

Actual Results

array([[1., 0.], [0., 1.], [0., 1.], [0., 1.], [1., 0.]])
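For reference, scikit-learn's predict_proba returns one column per class (ordered as in classes_) rather than a flat list, so the probability of target '1' is the matching column. The sketch below uses a plain scikit-learn LogisticRegression on synthetic data, not the reporter's PMML file, so the data and shapes are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): 3 features, binary target 0/1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Same estimator settings as in the report.
model = LogisticRegression(random_state=0, solver="liblinear").fit(X, y)

proba = model.predict_proba(X[:5])   # shape (5, 2): one column per class in model.classes_
col = list(model.classes_).index(1)  # locate the column that belongs to target '1'
proba_of_1 = proba[:, col]           # 1-D array of probabilities in [0, 1]
print(proba_of_1)
```

Looking the column up through classes_ avoids hard-coding [:, 1] if the labels are ever ordered differently.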

Versions

Linux-4.19.157-1.20201118.el7.x86_64-x86_64-with-centos-7.8.2003-Core
Python 3.6.15 | packaged by conda-forge | (default, Dec 3 2021, 18:49:41) [GCC 9.4.0]
NumPy 1.19.4
SciPy 1.5.4
Scikit-Learn 0.23.2
sklearn-pmml-model 1.0.1

iamDecode commented 1 year ago

Ah, that seems odd, thanks for reporting! Would you be able to share the code you used to generate the PMML, or the PMML file itself? That will make it easier for me to debug.

KeenCat commented 1 year ago

Sorry for the late reply. I made a test PMML file from our data; it is attached as issue_pmml_file.txt. Thank you.

KeenCat commented 1 year ago

@iamDecode, I found the cause: sklearn-pmml-model does not support CutTransformer.
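One plausible mechanism for the symptom, not confirmed in this thread: if the cut step is skipped, coefficients that were fitted on bin indices get multiplied by the raw values instead, the linear score becomes very large, and the logistic function saturates at (almost exactly) 0 or 1. The bin edges, coefficient and intercept below are made up purely for illustration, and pandas.cut stands in for the cut step.

```python
import numpy as np
import pandas as pd

# Hypothetical bin edges and model parameters, for illustration only.
bins = [0, 30, 60, 120]
coef, intercept = 1.5, -2.0

ages = pd.Series([25.0, 45.0, 70.0])

# What the pipeline expects: the cut step maps each raw value to a bin index (0, 1, 2).
binned = pd.cut(ages, bins=bins, labels=False)

def sigmoid(z):
    """Logistic function used by logistic regression."""
    return 1.0 / (1.0 + np.exp(-z))

p_binned = sigmoid(intercept + coef * binned.to_numpy())  # moderate probabilities
p_raw = sigmoid(intercept + coef * ages.to_numpy())       # cut skipped: saturates near 1.0
print(p_binned, p_raw)
```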

iamDecode commented 1 year ago

Yes, it seems like it. This library currently aims to support only PMML files that describe the model itself. If you want to use data transformations, you can apply them to the data first and then use the transformed data as the training or test data.
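A minimal sketch of that workaround, assuming the cut step can be reproduced with pandas.cut and that its bin edges are known (for example, taken from the transformation section of the PMML file or from the training code). The column name `age`, the bin edges, the toy data and the helper `apply_cuts` are all hypothetical, and a plain scikit-learn LogisticRegression stands in for the PMML model so the example runs without a PMML file.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical bin edges per column; must match the ones used at training time.
CUTS = {"age": [0, 30, 60, 120]}

def apply_cuts(df, cuts):
    """Hypothetical helper: replace each listed column with its integer bin index."""
    out = df.copy()
    for column, bins in cuts.items():
        out[column] = pd.cut(out[column], bins=bins, labels=False)
    return out

# Toy training data (assumption), transformed before fitting.
train = pd.DataFrame({"age": [22, 28, 35, 41, 58, 63, 75, 90],
                      "y":   [0, 0, 0, 1, 0, 1, 1, 1]})
X_train = apply_cuts(train[["age"]], CUTS)
model = LogisticRegression(random_state=0, solver="liblinear").fit(X_train, train["y"])

# Apply the same transformation to the test data before scoring.
test = pd.DataFrame({"age": [25, 50, 80]})
X_test = apply_cuts(test, CUTS)
# With a model-only PMML file, the same transformed frame would instead be passed to
# PMMLLogisticRegression(pmml="./pmml/blahblah.pmml").predict_proba(X_test).
proba_of_1 = model.predict_proba(X_test)[:, list(model.classes_).index(1)]
print(proba_of_1)
```

Keeping the transformation in one helper makes it easy to apply identically at training and scoring time, which is what the model-only PMML relies on.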

I will keep this issue open as a reminder that we should consider supporting (cut) transformers in the future.