jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

convert a TPOT pipeline to PMML format: Standard output is empty? #147

Closed MONTYYUAN closed 4 years ago

MONTYYUAN commented 4 years ago

I am trying to convert a TPOT pipeline to PMML format. The pipeline is:

PMMLPipeline(steps=[('selectpercentile', SelectorProxy(selector=SelectPercentile(percentile=74))), ('mlpclassifier', MLPClassifier(learning_rate_init=0.01, random_state=13))])

The conversion fails with the following output:

Standard output is empty
Standard error:
03, 2020 11:37:50 org.jpmml.sklearn.Main run
INFO: Parsing PKL..
03, 2020 11:37:50 org.jpmml.sklearn.Main run
INFO: Parsed PKL in 169 ms.
03, 2020 11:37:50 org.jpmml.sklearn.Main run
INFO: Converting..
03, 2020 11:37:50 org.jpmml.sklearn.Main run
: Failed to convert
java.lang.IllegalArgumentException: The transformer object of the first step (Python class sklearn2pmml.SelectorProxy) does not specify feature type information
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:220)
    at org.jpmml.sklearn.Main.run(Main.java:228)
    at org.jpmml.sklearn.Main.main(Main.java:148)
Caused by: java.lang.UnsupportedOperationException
    at sklearn.MultiTransformer.getOpType(MultiTransformer.java:33)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:205)
    ... 2 more

Exception in thread "main" java.lang.IllegalArgumentException: The transformer object of the first step (Python class sklearn2pmml.SelectorProxy) does not specify feature type information
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:220)
    at org.jpmml.sklearn.Main.run(Main.java:228)
    at org.jpmml.sklearn.Main.main(Main.java:148)
Caused by: java.lang.UnsupportedOperationException
    at sklearn.MultiTransformer.getOpType(MultiTransformer.java:33)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:205)
    ... 2 more

What's wrong, and how should I solve this problem?

vruusmann commented 4 years ago

> java.lang.IllegalArgumentException: The transformer object of the first step (Python class sklearn2pmml.SelectorProxy) does not specify feature type information

> What's wrong, and how should I solve this problem?

The pipeline should not start with a transformer (in this case, a selector) that does not specify the number and types of incoming features.

The PMML converter needs to know how many/which features are "coming in".

To fix the problem, use a domain decorator (as the first step of the pipeline) to make this information available: https://openscoring.io/blog/2020/02/23/sklearn_feature_specification_pmml/