Closed itzikjan closed 4 years ago
TL;DR: During the Scikit-Learn upgrade from version 0.21.X to 0.22.X, many modules were renamed (typically by prepending an underscore character to the module name). For example, sklearn.preprocessing.label.LabelEncoder became sklearn.preprocessing._label.LabelEncoder.
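To make the renaming pattern concrete, here is a minimal sketch that maps a 0.22-style class path back to its 0.21-style spelling. The helper name `legacy_module_path` is hypothetical and exists only for illustration. It is not how JPMML-SkLearn resolves class names, and it assumes the only change is a single leading underscore on the module that holds the class.

```python
def legacy_module_path(qualified_name: str) -> str:
    """Return the pre-0.22 spelling of a fully qualified class name.

    Hypothetical helper: it strips one leading underscore from the
    module component that directly contains the class.
    """
    parts = qualified_name.split(".")
    # parts[-1] is the class name; parts[-2] is the module that was renamed.
    if len(parts) >= 2 and parts[-2].startswith("_"):
        parts[-2] = parts[-2][1:]
    return ".".join(parts)


new_name = "sklearn.preprocessing._label.LabelEncoder"
print(legacy_module_path(new_name))
```

A name that is already in the old style passes through unchanged, so the helper can be applied to either spelling.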
If you're using Scikit-Learn 0.22.X (or newer), then you need to upgrade to SkLearn2PMML version 0.51.X (or newer). For example, SkLearn2PMML version 0.51.0, which is based on JPMML-SkLearn version 1.5.25, knows both the label and _label modules:
https://github.com/jpmml/jpmml-sklearn/blob/1.5.25/src/main/resources/META-INF/sklearn2pmml.properties#L121-L122
Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class sklearn.preprocessing._label.LabelEncoder)
Please upgrade to SkLearn2PMML version 0.51.0 (or newer).
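The version rule above can be expressed as a small check. This is only a sketch: the function names are hypothetical, the thresholds (Scikit-Learn 0.22, SkLearn2PMML 0.51) are taken from this reply, and the version strings are assumed to be plain dotted numbers such as those in a pip listing.

```python
def version_tuple(version: str) -> tuple:
    """Turn a dotted version such as '0.22.1' into (0, 22, 1).

    Non-numeric pieces (for example 'post1') are skipped.
    """
    return tuple(int(piece) for piece in version.split(".") if piece.isdigit())


def needs_sklearn2pmml_upgrade(sklearn_version: str, sklearn2pmml_version: str) -> bool:
    """True when Scikit-Learn is 0.22 or newer but SkLearn2PMML is older than 0.51."""
    return (
        version_tuple(sklearn_version) >= (0, 22)
        and version_tuple(sklearn2pmml_version) < (0, 51)
    )


print(needs_sklearn2pmml_upgrade("0.22", "0.47.1"))
print(needs_sklearn2pmml_upgrade("0.22", "0.51.0"))
```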
'Invalid value treatment {0} does not support invalid_value_replacement attribute', 'as_missing'
This is a legitimate complaint. Older SkLearn2PMML versions did not check for conflicting domain attribute values, whereas newer ones do.
Please update your Python source code. Specifically, remove any domain attribute values that you're not 100% sure about.
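As one way to avoid the conflict, the domain keyword arguments can be built so that invalid_value_replacement is left out whenever the treatment is "as_missing". This is a sketch, not the library's own validation: `domain_kwargs` is a hypothetical helper, and the only rule it encodes is the one stated in the error message above. Other treatment values may have their own restrictions, which are not covered here.

```python
def domain_kwargs(missing_value_treatment, invalid_value_treatment,
                  missing_value_replacement=None, invalid_value_replacement=None):
    """Build keyword arguments for a domain decorator without the known conflict.

    Hypothetical helper: it omits invalid_value_replacement when the
    invalid value treatment is "as_missing", per the error message.
    """
    kwargs = {
        "missing_value_treatment": missing_value_treatment,
        "invalid_value_treatment": invalid_value_treatment,
    }
    if missing_value_replacement is not None:
        kwargs["missing_value_replacement"] = missing_value_replacement
    # "as_missing" does not accept a replacement value, so drop it in that case.
    if invalid_value_replacement is not None and invalid_value_treatment != "as_missing":
        kwargs["invalid_value_replacement"] = invalid_value_replacement
    return kwargs


kwargs = domain_kwargs("as_value", "as_missing",
                       missing_value_replacement="a",
                       invalid_value_replacement="a")
print(sorted(kwargs))
```

The resulting dictionary could then be passed along as CategoricalDomain(**kwargs). Dropping the attribute silently is a design choice; raising an error instead would surface the conflict earlier.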
Hi,
We have been using this package in production for a long time with Python 2.7, with the following code:
import xgboost as xgb
from sklearn.preprocessing import LabelEncoder
from sklearn_pandas import DataFrameMapper
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.decoration import CategoricalDomain
from sklearn2pmml.pipeline import PMMLPipeline

params2 = {
    'n_estimators': 100,
    'learning_rate': 0.5,
    'seed': 0,
    'subsample': 0.8,
    'n_jobs': 50,
    'colsample_bytree': 0.8,
    'objective': 'binary:logistic',
    'max_depth': 10,
    'min_child_weight': 300,
    'gamma': 2,
    'max_delta_step': 6,
}
estimator = xgb.XGBClassifier(**params2)

# Numeric columns pass through unchanged; object and bool columns get a
# CategoricalDomain decorator followed by a LabelEncoder.
mapper = DataFrameMapper(
    [
        (i, None) if j != 'object' and j != 'bool' else (i, [
            CategoricalDomain(
                missing_value_treatment="as_value",
                invalid_value_treatment="as_missing",
                missing_value_replacement=train_x[i].value_counts().idxmax(),
                invalid_value_replacement=train_x[i].value_counts().idxmax()),
            LabelEncoder()])
        for i, j in zip(train_x.columns.values, train_x.dtypes.values)
    ],
    input_df=True,
    df_out=True)

rf_pipeline = PMMLPipeline([("mapper", mapper), ("classifier", estimator)])
rf_pipeline.fit(train_x, train_y)
sklearn2pmml(rf_pipeline, pmml_model_name, with_repr=True)
ai-model-infra==0.1
awscli==1.16.300
beautifulsoup4==4.7.1
boto==2.49.0
boto3==1.10.36
botocore==1.13.36
certifi==2019.11.28
chardet==3.0.4
colorama==0.4.1
Cython==0.29.14
datadog==0.32.0
decorator==4.4.1
docutils==0.15.2
fsspec==0.6.1
idna==2.8
jmespath==0.9.3
joblib==0.14.1
lxml==4.3.0
mysqlclient==1.3.14
nltk==3.4
nose==1.3.4
numpy==1.17.4
ortools==7.4.7247
pandas==0.25.3
pandasql==0.7.3
protobuf==3.11.1
py-dateutil==2.2
pyarrow==0.13.0
pyasn1==0.4.8
python-dateutil==2.8.0
python36-sagemaker-pyspark==1.2.1
pytz==2018.9
PyYAML==3.11
requests==2.22.0
rsa==3.4.2
s3fs==0.4.0
s3transfer==0.2.1
scikit-learn==0.22
scipy==1.3.3
singledispatch==3.4.0.3
six==1.12.0
sklearn==0.0
sklearn-pandas==1.8.0
sklearn2pmml==0.51.0
soupsieve==1.6.2
SQLAlchemy==1.3.11
urllib3==1.25.7
windmill==1.6
xgboost==0.90
We are moving to Python 3.6, and we are getting the following error (versions: 0.47.1 and 0.51.0):
Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class sklearn.preprocessing._label.LabelEncoder)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
    at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:57)
    at sklearn.LabelEncoderClassifier.getLabelEncoder(LabelEncoderClassifier.java:40)
    at sklearn.LabelEncoderClassifier.getClasses(LabelEncoderClassifier.java:34)
    at sklearn.ClassifierUtil.getClasses(ClassifierUtil.java:32)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:128)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.preprocessing.LabelEncoder
    at java.lang.Class.cast(Class.java:3369)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
    ... 7 more