jpmml / sklearn2pmml

Python library for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Support for the `OneHotEncoder.infrequent_categories_` attribute #386

Closed: BillyBonaros closed this issue 1 year ago

BillyBonaros commented 1 year ago

Hello,

First of all, I would like to express my appreciation for the great job done with sklearn2pmml. It has been an incredibly helpful library.

I encountered an issue when using the `min_frequency` option of `OneHotEncoder`: the fitted pipeline fails to export to a PMML file.

The error is: `RuntimeError: The SkLearn2PMML application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams`

I am using `OneHotEncoder` inside a `DataFrameMapper`:

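```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn_pandas import DataFrameMapper
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# train_balanced is the training DataFrame (not shown); its variant column is the target
mapper_ohe = DataFrameMapper([
    (['geo_country'], [SimpleImputer(strategy='most_frequent'), OneHotEncoder(handle_unknown='ignore', drop='if_binary')]),
    # min_frequency (Scikit-Learn 1.1+) groups rare regions into a single "infrequent" category
    (['geo_region'], [SimpleImputer(strategy='most_frequent'), OneHotEncoder(handle_unknown='ignore', drop='if_binary', min_frequency=50)]),
], df_out=True)

clf = LogisticRegression(random_state=5)
sk_pipe = PMMLPipeline([("ohe", mapper_ohe), ("model", clf)])

sk_pipe.fit(train_balanced, train_balanced.variant)

# This is where the error occurs
sklearn2pmml(sk_pipe, "example_model_copy.pmml", with_repr=True)
```
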
vruusmann commented 1 year ago

The `infrequent_categories_` attribute was added in Scikit-Learn version 1.1.

Sometimes one just doesn't have time to stay up to date with everything.