jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class xgboost.compat.XGBoostLabelEncoder) #129

Closed: mags3003 closed this issue 4 years ago

mags3003 commented 4 years ago

I want to save my XGBoost model as PMML using sklearn2pmml. I'm using Python 3.7.3 with scikit-learn 0.20.3, sklearn2pmml 0.53.0 and XGBoost 1.0.0. My data is mostly binary, with just 3 continuous columns. I'm running my notebook in Databricks and converting my Spark DataFrame to a pandas DataFrame. Code snippet below:

import xgboost as xgb

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler

from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml.decoration import ContinuousDomain

# pdf is the pandas DataFrame converted from the Spark DataFrame;
# continuous_features and numericCols are lists of column names defined earlier in the notebook
X = pdf[continuous_features + numericCols]
y = pdf["Label"]

mapper = DataFrameMapper(
  # continuous columns: declare the domain, then standardize
  [([cont_column], [ContinuousDomain(), StandardScaler()]) for cont_column in continuous_features] +
  # remaining (mostly binary) columns: passed through without transformation
  [(numericCols, None)]
)

clf = xgb.XGBClassifier(objective='multi:softprob', eval_metric='auc', num_class=2,
                        n_jobs=6, max_delta_step=1, min_child_weight=14, gamma=1.5, subsample=0.8,
                        colsample_bytree=0.5, max_depth=10, learning_rate=0.1)

pipeline = PMMLPipeline([
  ("mapper", mapper),
  ("estimator", clf)
])

# fit on a 1-D label array
pipeline.fit(X, y.values.ravel())

sklearn2pmml(pipeline, "xgb_V1.pmml", with_repr=True)

The pipeline fits the data and produces scores and predictions via pipeline.score(X, y) and pipeline.predict(X), but when I try to export it to PMML, I get the following error:

Standard output is empty
Standard error:
Feb 21, 2020 1:53:30 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Feb 21, 2020 1:53:30 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 47 ms.
Feb 21, 2020 1:53:30 PM org.jpmml.sklearn.Main run
INFO: Converting..
Feb 21, 2020 1:53:30 PM sklearn2pmml.pipeline.PMMLPipeline initTargetFields
WARNING: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.target_fields' is not set. Assuming y as the name of the target field
Feb 21, 2020 1:53:30 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class xgboost.compat.XGBoostLabelEncoder)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
    at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:82)
    at sklearn.LabelEncoderClassifier.getLabelEncoder(LabelEncoderClassifier.java:40)
    at sklearn.LabelEncoderClassifier.getClasses(LabelEncoderClassifier.java:34)
    at sklearn.ClassifierUtil.getClasses(ClassifierUtil.java:32)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:133)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.preprocessing.LabelEncoder
    at java.lang.Class.cast(Class.java:3369)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
    ... 7 more


Any idea what's causing this issue? Thanks

vruusmann commented 4 years ago

> Any idea what's causing this issue?

There is a valid technical answer here: https://stackoverflow.com/a/60341839

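In brief, XGBoost 1.0.0 appears to introduce several breaking changes. First, XGBClassifier no longer stores its label encoder as a standard sklearn.preprocessing.LabelEncoder object, but as a custom xgboost.compat.XGBoostLabelEncoder object. The converter expects the _le attribute to be a LabelEncoder, which is why the stack trace above ends in a ClassCastException. Second, XGBoost 1.0.0 has altered the "reserved bytes" area of the XGBoost binary file format, so the JPMML-XGBoost library would refuse to process the booster even if the label encoder problem were worked around.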
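To catch the first problem before calling sklearn2pmml, one could inspect the class of the fitted estimator's _le attribute, since that class reference is what ends up in the pickle the converter reads. The sketch below is a hypothetical helper, not part of sklearn2pmml or XGBoost. It assumes the attribute name _le and the class name xgboost.compat.XGBoostLabelEncoder exactly as they appear in the error message, and it only compares fully qualified class names.

```python
def qualified_class_name(obj):
    """Return 'module.QualName' for obj's class, roughly how pickle references it."""
    cls = type(obj)
    return "{}.{}".format(cls.__module__, cls.__qualname__)


# Assumption: class name copied from the error message in this thread.
UNSUPPORTED_LABEL_ENCODER = "xgboost.compat.XGBoostLabelEncoder"


def find_unsupported_label_encoder(estimator, attr="_le"):
    """Hypothetical pre-export check.

    Returns the qualified class name of estimator.<attr> when it is XGBoost's
    own label encoder (which the converter cannot cast to LabelEncoder),
    otherwise None (including when the attribute is missing).
    """
    encoder = getattr(estimator, attr, None)
    if encoder is None:
        return None
    name = qualified_class_name(encoder)
    return name if name == UNSUPPORTED_LABEL_ENCODER else None
```

With the pipeline from the question, the check would be applied to the fitted final step, e.g. find_unsupported_label_encoder(pipeline.named_steps["estimator"]); a non-None result means the export is expected to fail in the way shown above.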

The solution is to downgrade from XGBoost 1.0.0 to a 0.9.X version, then re-fit the pipeline under that version before exporting it again.
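A notebook could guard the export with a simple version check. This is a minimal sketch under one loud assumption: any XGBoost version at or above 1.0.0 is treated as unsupported, which reflects only the situation described in this thread, not later JPMML releases. The helper names are hypothetical; the only library attribute used is xgboost.__version__.

```python
import warnings


def parse_version_prefix(version):
    """Parse the leading numeric components of a version string.

    '0.90' -> (0, 90); '1.0.0' -> (1, 0, 0); '1.0.0rc1' -> (1, 0, 0).
    Parsing stops at the first component with a non-numeric suffix or no digits.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # e.g. '0rc1': keep the 0, ignore the pre-release suffix
    if not parts:
        raise ValueError("unparseable version: {!r}".format(version))
    return tuple(parts)


def is_pre_1_0(version):
    """True for XGBoost versions older than 1.0.0 (assumption: these convert)."""
    return parse_version_prefix(version) < (1,)


try:
    import xgboost
except ImportError:
    xgboost = None

if xgboost is not None and not is_pre_1_0(xgboost.__version__):
    # Warn rather than fail, so the rest of the notebook keeps running.
    warnings.warn("XGBoost {} is installed; this thread reports that 1.0.0 "
                  "cannot be converted. Consider a 0.9.X version."
                  .format(xgboost.__version__))
```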