jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Generate Multiple target pmml failed #48

Closed NicoleLee918 closed 7 years ago

NicoleLee918 commented 7 years ago

Hi, I want to generate PMML that includes multiple targets. With RandomForestRegressor, the generated PMML appears to have only a single target. With MultiOutputRegressor, the conversion fails with the error "Failed to convert java.lang.IllegalArgumentException".

RandomForestRegressor example. Here is the code:

    from sklearn import datasets
    from sklearn.datasets.base import Bunch
    import csv
    import numpy as np
    from time import time
    import pandas as pd
    import scipy

    caseName = "6MultipleOutputRandomForestRegressor_conti"
    df = pd.read_csv("/D/AC/5.0/ScoringWithScikitLearn/Tests/data/employ_salary.csv", sep=",")
    test_X = df.iloc[:, 4:7]
    test_y = df[['average_montly_hours', 'satisfaction_level']]

    from sklearn2pmml import sklearn2pmml
    from sklearn2pmml import PMMLPipeline
    from sklearn_pandas import DataFrameMapper
    from sklearn2pmml.decoration import ContinuousDomain
    from sklearn.preprocessing import Imputer
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.multioutput import MultiOutputRegressor

    max_depth = 30
    pipeline = PMMLPipeline([
        ("mapper", DataFrameMapper([
            (list(test_X.columns.values), [ContinuousDomain(), Imputer()])
        ])),
        ("regression", RandomForestRegressor(max_depth=max_depth, random_state=0)),
    ])

    pipeline.fit(test_X, test_y)
    sklearn2pmml(pipeline, "/D/AC/5.0/ScoringWithScikitLearn/Tests/out/" + caseName + "_PyPMML.xml", with_repr=True)

Expected behavior: multiple target fields in the MiningSchema of the PMML.

Current behavior: only one target field in the MiningSchema of the PMML, like this,

MultiOutputRegressor example. Here is the code (similar to the RandomForestRegressor example, only the model differs):

    ...
    pipeline = PMMLPipeline([
        ("mapper", DataFrameMapper([
            (list(test_X.columns.values), [ContinuousDomain(), Imputer()])
        ])),
        ("regression", MultiOutputRegressor(RandomForestRegressor(max_depth=max_depth, random_state=0))),
    ])
    ...

Here is the error:

    Aug 24, 2017 10:40:18 AM org.jpmml.sklearn.Main run
    INFO: Parsing PKL..
    Aug 24, 2017 10:40:18 AM org.jpmml.sklearn.Main run
    INFO: Parsed PKL in 56 ms.
    Aug 24, 2017 10:40:18 AM org.jpmml.sklearn.Main run
    INFO: Converting..
    Aug 24, 2017 10:40:18 AM org.jpmml.sklearn.Main run
    SEVERE: Failed to convert
    java.lang.IllegalArgumentException
        at sklearn2pmml.PMMLPipeline.encodePMML(PMMLPipeline.java:74)
        at org.jpmml.sklearn.Main.run(Main.java:144)
        at org.jpmml.sklearn.Main.main(Main.java:93)

    Exception in thread "main" java.lang.IllegalArgumentException
        at sklearn2pmml.PMMLPipeline.encodePMML(PMMLPipeline.java:74)
        at org.jpmml.sklearn.Main.run(Main.java:144)
        at org.jpmml.sklearn.Main.main(Main.java:93)
    Traceback (most recent call last):
      File "6MultipleOutputReg_conti.py", line 40, in <module>
        sklearn2pmml(pipeline, "/D/AC/5.0/ScoringWithScikitLearn/Tests/out/"+caseName+"_PyPMML.xml", with_repr = True)
      File "/Users/lihuaw/.local/lib/python2.7/site-packages/sklearn2pmml/__init__.py", line 142, in sklearn2pmml
        raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java process should have printed more information about the failure into its standard output and/or error streams")
    RuntimeError: The JPMML-SkLearn conversion application has failed. The Java process should have printed more information about the failure into its standard output and/or error streams

Note: I also tried KNeighborsClassifier, KNeighborsRegressor, and MLPRegressor; all of them hit the same error with MultiOutputRegressor. I guess multiple-target PMML is not yet supported by sklearn2pmml. Am I right? Is there any plan to support this? Please correct me if I'm wrong. Thanks a lot.
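To make the "only one target field" observation easy to reproduce, here is a minimal sketch that counts the target fields declared in the top-level MiningSchema of a PMML document. The `SAMPLE_PMML` string and the `target_fields` helper are hypothetical and written only for illustration; the sample is a hand-made fragment, not the file generated in this report. The sketch assumes targets are marked with `usageType="target"` (or the older `"predicted"`).

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment for illustration only.
# It is NOT the output produced in this issue.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <MiningModel functionName="regression">
    <MiningSchema>
      <MiningField name="y" usageType="target"/>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
    </MiningSchema>
  </MiningModel>
</PMML>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def target_fields(pmml_text):
    """Return the names of target fields in the first MiningSchema found."""
    root = ET.fromstring(pmml_text)
    for elem in root.iter():
        if local_name(elem.tag) == "MiningSchema":
            return [
                field.get("name")
                for field in elem
                if local_name(field.tag) == "MiningField"
                and field.get("usageType") in ("target", "predicted")
            ]
    return []


print(target_fields(SAMPLE_PMML))
```

Running the same helper on the text of a generated file would show whether one target or several were written to the schema.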

vruusmann commented 7 years ago

Closing as duplicate of https://github.com/jpmml/sklearn2pmml/issues/54
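A possible workaround, offered here as an assumption and not confirmed anywhere in this thread, is to train one single-target model per target column, so that each model could be wrapped in its own PMMLPipeline and converted separately. The sketch below shows only the per-target fitting step with scikit-learn. The data is synthetic, the `models` and `predictions` names are hypothetical, and the column names are borrowed from the report.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (an assumption; the report uses employ_salary.csv).
rng = np.random.RandomState(0)
X = rng.rand(50, 3)
Y = np.column_stack([X.sum(axis=1), X[:, 0] - X[:, 1]])
target_names = ["average_montly_hours", "satisfaction_level"]

# Fit one single-target regressor per target column.
models = {}
for i, name in enumerate(target_names):
    model = RandomForestRegressor(n_estimators=10, max_depth=30, random_state=0)
    model.fit(X, Y[:, i])
    models[name] = model

# Each model predicts a 1-D array for its own target.
predictions = {name: model.predict(X) for name, model in models.items()}
for name, values in predictions.items():
    print(name, values.shape)
```

The trade-off is that this yields one model file per target instead of a single multi-target PMML document, so scoring code has to load and evaluate each file.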