jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

(Python class sklearn.preprocessing.data.Normalizer) is not a supported Transformer #128

Closed: siriJR closed this issue 4 years ago.

siriJR commented 4 years ago

Hello, my problem is as follows:

Failed to convert
java.lang.IllegalArgumentException: The value object (Python class sklearn.preprocessing.data.Normalizer) is not a supported Transformer
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
    at com.google.common.collect.Lists$TransformingRandomAccessList$1.transform(Lists.java:612)
    at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
    at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:72)
    at sklearn.Initializer.encodeFeatures(Initializer.java:41)
    at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
    at sklearn.Composite.encodeFeatures(Composite.java:129)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:208)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.Transformer
    at java.lang.Class.cast(Class.java:3369)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
    ... 9 more


RuntimeError                              Traceback (most recent call last)
<ipython-input-18-de7aeb2a07c7> in <module>
----> 1 sklearn2pmml(pipeline, "./GBDT+LR3.pmml")

/mnt/lujiren/.pylib/3/sklearn2pmml/__init__.py in sklearn2pmml(pipeline, pmml, user_classpath, with_repr, with_jar, debug, java_encoding)
    263                                 print("Standard error is empty")
    264                 if retcode:
--> 265                         raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")
    266         finally:
    267                 if debug:

RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams
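Reading the Java trace: `CastFunction.apply` fails because the unpickled `Normalizer` arrives on the Java side as a generic `net.razorvine.pickle.objects.ClassDict`, which cannot be cast to `sklearn.Transformer`. In other words, this JPMML-SkLearn version has no converter for that class, and the Python-side `RuntimeError` only reports that the Java process exited with an error. A hypothetical pre-flight check can flag such steps before launching the JVM. `KNOWN_UNSUPPORTED` and `find_unsupported_steps` are names invented for this sketch, and the list is hand-maintained (seeded only with the class from this issue), not an official inventory:

```python
# Hypothetical pre-flight check for DataFrameMapper-style feature definitions.
# KNOWN_UNSUPPORTED is an assumption you maintain yourself; it is NOT an
# official list of classes rejected by JPMML-SkLearn.
KNOWN_UNSUPPORTED = {"Normalizer"}


def find_unsupported_steps(features, unsupported=KNOWN_UNSUPPORTED):
    """Return (columns, class name) pairs for every listed step class.

    `features` follows the DataFrameMapper layout: a list of tuples
    (columns, transformers[, options]), where `transformers` is a single
    object, a list/tuple of objects, or None.
    """
    problems = []
    for feature in features:
        columns, transformers, *_ = feature  # ignore an optional options dict
        if transformers is None:
            continue
        if isinstance(transformers, (list, tuple)):
            steps = transformers
        else:
            steps = [transformers]
        for step in steps:
            name = type(step).__name__
            if name in unsupported:
                problems.append((columns, name))
    return problems
```

With sklearn_pandas, the definitions are the mapper's `features` attribute, so a call such as `find_unsupported_steps(mapper.features)` would list every `Normalizer` step before `sklearn2pmml` is invoked.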

And my code:

#%% 
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import LabelBinarizer,StandardScaler,Normalizer,OneHotEncoder
from sklearn2pmml.decoration import CategoricalDomain, ContinuousDomain
from sklearn2pmml.ensemble import GBDTLRClassifier
from sklearn2pmml.pipeline import PMMLPipeline

def make_fit_gbdtlr(gbdt, lr, cat_columns, cont_columns, label_column, df_data):
    mapper = DataFrameMapper(
        [([cat_column], [CategoricalDomain(missing_value_replacement="DEFAULT",
                                           invalid_value_treatment="as_missing",
                                           missing_value_treatment="as_median"),
                         OneHotEncoder()])
         for cat_column in cat_columns] +
        [([cont_column], [ContinuousDomain(), Normalizer()])
         for cont_column in cont_columns]
    )
    classifier = GBDTLRClassifier(gbdt, lr)
    pipeline = PMMLPipeline([ ("mapper", mapper),("classifier", classifier)])
    pipeline.fit(df_data[cat_columns + cont_columns], df_data[label_column])
    return pipeline
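Separately from the conversion error, note that scikit-learn's `Normalizer` rescales each sample (row) to unit norm, not each feature (column). Because the mapper above feeds it one continuous column at a time, every non-zero value becomes +1 or -1. The pure-Python sketch below reproduces that row-wise L2 scaling on a made-up single-column input; `l2_normalize_rows` is a name written for this illustration, not scikit-learn's implementation:

```python
import math


def l2_normalize_rows(rows):
    """Scale each row to unit L2 norm; all-zero rows are left unchanged.

    Same row-wise behaviour as scikit-learn's Normalizer(norm="l2"):
    it works per sample (row), not per feature (column).
    """
    result = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        result.append([v / norm for v in row] if norm > 0 else list(row))
    return result


# One continuous column, the shape DataFrameMapper passes for ([cont_column], ...)
single_column = [[3.0], [-0.5], [0.0], [120.0]]
print(l2_normalize_rows(single_column))
# [[1.0], [-1.0], [0.0], [1.0]]
```

All magnitude information in the column is lost, so even where `Normalizer` is convertible, it is rarely what is wanted on single-column inputs.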

#%%

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn2pmml import sklearn2pmml
gbdt = GradientBoostingClassifier(n_estimators=300, max_depth=5)
lr = LogisticRegression(max_iter=100)
# col_cate, col_num, col_label and df_train_1 are defined in earlier notebook cells (not shown)
pipeline = make_fit_gbdtlr(gbdt, lr, col_cate, col_num, col_label, df_train_1)

When I run the conversion:

sklearn2pmml(pipeline, "./GBDT+LR3.pmml")

And my package versions:

import sklearn, sklearn.externals.joblib, sklearn_pandas, sklearn2pmml
print(sklearn.__version__)                   # 0.20.0
print(sklearn.externals.joblib.__version__)  # 0.12.5
print(sklearn_pandas.__version__)            # 1.8.0
print(sklearn2pmml.__version__)              # 0.52.1

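The version snippet above imports `sklearn.externals.joblib`, which scikit-learn deprecated in 0.21 and removed in 0.23, so it fails on newer installs. A sketch of a version report that avoids importing the packages at all, assuming Python 3.8+ (for `importlib.metadata`) and the PyPI distribution names listed below. Note that `joblib` here is the standalone distribution, which may differ from the copy vendored inside older scikit-learn releases:

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as published on PyPI (not the import names)
for name, ver in package_versions(
    ["scikit-learn", "joblib", "sklearn-pandas", "sklearn2pmml"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```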
vruusmann commented 4 years ago

Exact duplicate of https://github.com/jpmml/jpmml-sklearn/issues/64

Workaround: there is no need to normalize continuous values when using decision tree-based learning methods, so the Normalizer() step can simply be removed from the mapper.
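Why the workaround holds: a tree split depends only on the ordering of a feature's values, so any strictly increasing rescaling (standardization, min-max scaling) produces the same partitions. The pure-Python sketch below finds the best single threshold split by counting misclassified samples. That criterion is a simplification chosen for brevity, and `best_threshold_split` is a hypothetical helper; scikit-learn's `GradientBoostingClassifier` uses a different split criterion, but the ordering argument is the same.

```python
def best_threshold_split(xs, ys):
    """Return the left-side index set of the best single-threshold split.

    Exhaustive search over cut points between sorted distinct values; each
    side predicts its majority label (binary 0/1) and the candidate with the
    fewest misclassifications wins (first one on ties).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_left, best_errors = None, None
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # no threshold separates equal values
        left, right = order[:k], order[k:]
        errors = 0
        for side in (left, right):
            ones = sum(ys[i] for i in side)
            errors += min(ones, len(side) - ones)
        if best_errors is None or errors < best_errors:
            best_left, best_errors = frozenset(left), errors
    return best_left


raw = [5.0, 1.0, 12.0, 7.0, 3.0, 20.0]
labels = [0, 0, 1, 1, 0, 1]

# Standardize the feature: a strictly increasing transform (std > 0)
mean = sum(raw) / len(raw)
std = (sum((v - mean) ** 2 for v in raw) / len(raw)) ** 0.5
scaled = [(v - mean) / std for v in raw]

print(best_threshold_split(raw, labels) == best_threshold_split(scaled, labels))  # True
```

Because the sorted order of `raw` and `scaled` is identical, every candidate split and its error count are identical too, so the scaling step contributes nothing to a tree-based model.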