jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Fix Imputer #46

Closed sabba closed 7 years ago

sabba commented 7 years ago

When calculating the number of features, an exception is raised because the code accesses a misspelled "statistics" attribute (the "_" suffix is missing; the fitted attribute is "statistics_").

A sklearn pipeline like this raises the exception during conversion to PMML:

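from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import Imputer
from sklearn2pmml import PMMLPipeline, sklearn2pmml

# X, y: training features and labels, loaded elsewhere
pipeline = PMMLPipeline([
    ("imputer", Imputer(missing_values="NaN", axis=0, copy=False)),
    ("classifier", GradientBoostingClassifier())
])
pipeline.fit(X, y)
sklearn2pmml(pipeline, "GBC.pmml", with_repr=True)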
java.lang.IllegalArgumentException: The value of the sklearn.preprocessing.imputation.Imputer.statistics attribute (null) is not a supported array type
        at org.jpmml.sklearn.ClassDictUtil.getShape(ClassDictUtil.java:99)
        at org.jpmml.sklearn.ClassDictUtil.getShape(ClassDictUtil.java:76)
        at sklearn.preprocessing.Imputer.getStatisticsShape(Imputer.java:83)
        at sklearn.preprocessing.Imputer.getNumberOfFeatures(Imputer.java:39)
        at sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:85)
        at sklearn2pmml.PMMLPipeline.encodePMML(PMMLPipeline.java:122)
        at org.jpmml.sklearn.Main.run(Main.java:144)
        at org.jpmml.sklearn.Main.main(Main.java:93)
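
The failure mode in the trace above can be sketched in a few lines of plain Python. This is not the jpmml-sklearn code: `fitted_state`, `get_shape`, and `number_of_features` are hypothetical names, and the numbers are made-up placeholder values. The sketch only assumes what the trace suggests, namely that looking up an attribute name that does not exist in the unpickled estimator state yields a null value, which then fails the array-type check.

```python
# Hypothetical stand-in for the unpickled state of a fitted Imputer.
# The values are illustrative placeholders, not real data.
fitted_state = {
    "missing_values": "NaN",
    "strategy": "mean",
    "statistics_": [5.8, 3.0, 3.7],  # fitted attributes end with "_"
}


def get_shape(state, name):
    """Return the shape of a 1-D array attribute, or raise if it is not an array."""
    value = state.get(name)  # a misspelled name yields None, not a KeyError
    if not isinstance(value, (list, tuple)):
        raise ValueError(
            "The value of the {} attribute ({}) is not a supported array type".format(
                name, value
            )
        )
    return (len(value),)


def number_of_features(state, name="statistics_"):
    """Derive the number of features from the length of the statistics array."""
    return get_shape(state, name)[0]
```

With the correct name, `number_of_features(fitted_state)` returns 3. With the misspelled name, `number_of_features(fitted_state, "statistics")` raises a `ValueError` whose message reports a null value, mirroring the `(null)` in the Java exception.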