jpmml / sklearn2pmml

Python library for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class sklearn.preprocessing._label.LabelEncoder) #197

Closed itzikjan closed 4 years ago

itzikjan commented 4 years ago

Hi,

We have been using this package in production for a long time with Python 2.7, with the following code:

import xgboost as xgb
from sklearn.preprocessing import LabelEncoder
from sklearn_pandas import DataFrameMapper
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.decoration import CategoricalDomain
from sklearn2pmml.pipeline import PMMLPipeline

params2 = {
    'n_estimators': 100,
    'learning_rate': 0.5,
    'seed': 0,
    'subsample': 0.8,
    'n_jobs': 50,
    'colsample_bytree': 0.8,
    'objective': 'binary:logistic',
    'max_depth': 10,
    'min_child_weight': 300,
    'gamma': 2,
    'max_delta_step': 6,
}

estimator = xgb.XGBClassifier(**params2)

# Numeric columns pass through unchanged; object/bool columns get a
# CategoricalDomain decorator followed by a LabelEncoder.
mapper = DataFrameMapper(
    [(i, None) if j != 'object' and j != 'bool' else
     (i, [CategoricalDomain(missing_value_treatment="as_value",
                            invalid_value_treatment="as_missing",
                            missing_value_replacement=train_x[i].value_counts().idxmax(),
                            invalid_value_replacement=train_x[i].value_counts().idxmax()),
          LabelEncoder()])
     for i, j in zip(train_x.columns.values, train_x.dtypes.values)],
    input_df=True, df_out=True)

rf_pipeline = PMMLPipeline([("mapper", mapper), ("classifier", estimator)])
rf_pipeline.fit(train_x, train_y)
sklearn2pmml(rf_pipeline, pmml_model_name, with_repr=True)

pip3 freeze

ai-model-infra==0.1
awscli==1.16.300
beautifulsoup4==4.7.1
boto==2.49.0
boto3==1.10.36
botocore==1.13.36
certifi==2019.11.28
chardet==3.0.4
colorama==0.4.1
Cython==0.29.14
datadog==0.32.0
decorator==4.4.1
docutils==0.15.2
fsspec==0.6.1
idna==2.8
jmespath==0.9.3
joblib==0.14.1
lxml==4.3.0
mysqlclient==1.3.14
nltk==3.4
nose==1.3.4
numpy==1.17.4
ortools==7.4.7247
pandas==0.25.3
pandasql==0.7.3
protobuf==3.11.1
py-dateutil==2.2
pyarrow==0.13.0
pyasn1==0.4.8
python-dateutil==2.8.0
python36-sagemaker-pyspark==1.2.1
pytz==2018.9
PyYAML==3.11
requests==2.22.0
rsa==3.4.2
s3fs==0.4.0
s3transfer==0.2.1
scikit-learn==0.22
scipy==1.3.3
singledispatch==3.4.0.3
six==1.12.0
sklearn==0.0
sklearn-pandas==1.8.0
sklearn2pmml==0.51.0
soupsieve==1.6.2
SQLAlchemy==1.3.11
urllib3==1.25.7
windmill==1.6
xgboost==0.90

We are moving to Python 3.6 and are getting the following error (versions 0.47.1 and 0.51.0):

  1. Standard output is empty
     Standard error:

Dec 11, 2019 1:17:18 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Dec 11, 2019 1:17:18 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 132 ms.
Dec 11, 2019 1:17:18 PM org.jpmml.sklearn.Main run
INFO: Converting..
Dec 11, 2019 1:17:18 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class sklearn.preprocessing._label.LabelEncoder)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
    at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:57)
    at sklearn.LabelEncoderClassifier.getLabelEncoder(LabelEncoderClassifier.java:40)
    at sklearn.LabelEncoderClassifier.getClasses(LabelEncoderClassifier.java:34)
    at sklearn.ClassifierUtil.getClasses(ClassifierUtil.java:32)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:128)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.preprocessing.LabelEncoder
    at java.lang.Class.cast(Class.java:3369)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
    ... 7 more

Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class sklearn.preprocessing._label.LabelEncoder)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
    at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:57)
    at sklearn.LabelEncoderClassifier.getLabelEncoder(LabelEncoderClassifier.java:40)
    at sklearn.LabelEncoderClassifier.getClasses(LabelEncoderClassifier.java:34)
    at sklearn.ClassifierUtil.getClasses(ClassifierUtil.java:32)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:128)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.preprocessing.LabelEncoder
    at java.lang.Class.cast(Class.java:3369)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
    ... 7 more

  2. In other versions we also got the following error: ('Invalid value treatment {0} does not support invalid_value_replacement attribute', 'as_missing')

vruusmann commented 4 years ago

TL;DR: During the Scikit-Learn upgrade from 0.21.X to 0.22.X, many modules were renamed (typically by prepending an underscore character to the module name). For example, sklearn.preprocessing.label.LabelEncoder became sklearn.preprocessing._label.LabelEncoder.
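The effect of the rename on the converter can be pictured as a lookup keyed on the fully qualified Python class name recorded in the pickle file. The sketch below is a simplified Python model of that idea, not JPMML-SkLearn's actual code: the `resolve` helper and both registry dicts are hypothetical, and only the class names are taken from the error message and stack trace in this thread.

```python
# Generic type that appears in the stack trace when a pickled class is not recognized.
FALLBACK = "net.razorvine.pickle.objects.ClassDict"

# Hypothetical registry that only knows the pre-0.22 module name.
OLD_REGISTRY = {
    "sklearn.preprocessing.label.LabelEncoder": "sklearn.preprocessing.LabelEncoder",
}

# Hypothetical registry that knows both the old and the renamed module.
NEW_REGISTRY = dict(OLD_REGISTRY)
NEW_REGISTRY["sklearn.preprocessing._label.LabelEncoder"] = "sklearn.preprocessing.LabelEncoder"


def resolve(registry, python_class):
    """Return the converter class for a pickled Python class, or the generic fallback."""
    return registry.get(python_class, FALLBACK)


# Class name written into the pickle by Scikit-Learn 0.22.X.
pickled = "sklearn.preprocessing._label.LabelEncoder"

old_result = resolve(OLD_REGISTRY, pickled)  # misses, so the generic fallback is returned
new_result = resolve(NEW_REGISTRY, pickled)  # hits the entry for the renamed module

print(old_result)
print(new_result)
```

In this model, the miss against the old registry corresponds to the `ClassCastException` in the stack trace: a generic object arrives where a `LabelEncoder` was expected.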

If you're using Scikit-Learn 0.22.X (or newer), then you need to upgrade to SkLearn2PMML version 0.51.X (or newer). For example, SkLearn2PMML version 0.51.0, which is based on JPMML-SkLearn version 1.5.25, knows both the label and _label modules: https://github.com/jpmml/jpmml-sklearn/blob/1.5.25/src/main/resources/META-INF/sklearn2pmml.properties#L121-L122

Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class sklearn.preprocessing._label.LabelEncoder)

Please upgrade to SkLearn2PMML version 0.51.0 (or newer).
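The version requirement can be checked before attempting a conversion. This is a minimal sketch that encodes only the rule stated in this reply (Scikit-Learn 0.22.X or newer needs SkLearn2PMML 0.51.X or newer). The helper names `major_minor` and `versions_compatible` are made up for illustration and are not part of either library.

```python
import re


def major_minor(version):
    """Extract (major, minor) as integers from a version string such as '0.22' or '0.51.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("Unrecognized version string: " + version)
    return int(match.group(1)), int(match.group(2))


def versions_compatible(sklearn_version, sklearn2pmml_version):
    """Apply the rule from this thread: Scikit-Learn 0.22+ needs SkLearn2PMML 0.51+."""
    if major_minor(sklearn_version) >= (0, 22):
        return major_minor(sklearn2pmml_version) >= (0, 51)
    # The thread states no requirement for older Scikit-Learn versions.
    return True


print(versions_compatible("0.22", "0.51.0"))
print(versions_compatible("0.22", "0.47.1"))
```

The version strings can be taken from `pip3 freeze` output, as in the listing posted in the question.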

'Invalid value treatment {0} does not support invalid_value_replacement attribute', 'as_missing'

This is a legitimate complaint. Older SkLearn2PMML versions did not check for conflicting domain attribute values, whereas newer ones do.

Please update your Python source code. Specifically, remove any domain attribute values that you're not 100% sure about.
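For the domain configuration in the question, one way to follow this advice is to stop passing `invalid_value_replacement` when `invalid_value_treatment` is "as_missing". The sketch below only builds the keyword arguments, so it runs without SkLearn2PMML installed. The helper `categorical_domain_kwargs` is hypothetical. It assumes that invalid values treated as missing are then covered by the missing-value replacement, which should be verified against the generated PMML.

```python
def categorical_domain_kwargs(most_frequent):
    """Build CategoricalDomain keyword arguments without the conflicting attribute."""
    return {
        "missing_value_treatment": "as_value",
        "missing_value_replacement": most_frequent,
        # "as_missing" hands invalid values over to the missing-value handling,
        # so invalid_value_replacement is deliberately left out.
        "invalid_value_treatment": "as_missing",
    }


# "red" stands in for train_x[i].value_counts().idxmax() from the question.
kwargs = categorical_domain_kwargs("red")
print(kwargs)

# Intended use, assuming sklearn2pmml is installed:
# from sklearn2pmml.decoration import CategoricalDomain
# domain = CategoricalDomain(**kwargs)
```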