jpmml / sklearn2pmml

Python library for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

returned non-zero exit status 1 when using GridSearchCV #41

Closed vivekk0903 closed 7 years ago

vivekk0903 commented 7 years ago

I am getting the "returned non-zero exit status 1" error with the new sklearn2pmml version 0.17 when using it with GridSearchCV.

Version info

('python: ', '2.7.6')
('sklearn: ', '0.18.1')
('sklearn.externals.joblib:', '0.10.3')
('pandas: ', u'0.19.2')
('sklearn_pandas: ', '1.3.0')
('sklearn2pmml: ', '0.17.0')

Code to reproduce

1) Working correctly:

from sklearn.datasets import load_boston
boston_data = load_boston()
X = boston_data.data
y = boston_data.target

from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV
from sklearn2pmml import PMMLPipeline
from sklearn2pmml import sklearn2pmml

knn_pipe = PMMLPipeline([
("regressor", KNeighborsRegressor())
])

knn_pipe.fit(X,y)
sklearn2pmml(knn_pipe, ".../SimpleFit.pmml", with_repr = True, debug = True)

2) Throwing error:

from sklearn.datasets import load_boston
boston_data = load_boston()
X = boston_data.data
y = boston_data.target

from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV
from sklearn2pmml import PMMLPipeline
from sklearn2pmml import sklearn2pmml

knn_pipe = PMMLPipeline([
("regressor", KNeighborsRegressor())
])

param_grid = {
    "regressor__n_neighbors": [3, 2, 10],
    "regressor__weights": ["uniform", "distance"],
    "regressor__algorithm": ["auto", "ball_tree", "kd_tree"],
}
cv = GridSearchCV(knn_pipe, param_grid=param_grid)
cv.fit(X, y)

Using the following line gives "TypeError: The pipeline object is not an instance of PMMLPipeline", which is understandable, since cv is a GridSearchCV object rather than a PMMLPipeline.

sklearn2pmml(cv, ".../GridSearchFit.pmml", with_repr = True, debug = True)

So I tried passing cv.best_estimator_ instead, but that throws the "returned non-zero exit status 1" error.

sklearn2pmml(cv.best_estimator_, ".../GridSearchFit.pmml", with_repr = True, debug = True)

Stack trace of error:

('python: ', '2.7.6')
('sklearn: ', '0.18.1')
('sklearn.externals.joblib:', '0.10.3')
('pandas: ', u'0.19.2')
('sklearn_pandas: ', '1.3.0')
('sklearn2pmml: ', '0.17.0')
java -cp /usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/slf4j-api-1.7.22.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-schema-1.3.4.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-model-metro-1.3.4.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pyrolite-4.16.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-agent-1.3.4.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jcommander-1.48.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-sklearn-1.2.6.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/guava-19.0.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/slf4j-jdk14-1.7.22.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-converter-1.2.1.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/istack-commons-runtime-2.21.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jaxb-runtime-2.2.11.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/serpent-1.16.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-lightgbm-1.0.2.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jaxb-core-2.2.11.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-xgboost-1.1.5.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-model-1.3.4.jar org.jpmml.sklearn.Main --pkl-pipeline-input /tmp/pipeline-yd1bTD.pkl.z --repr-pipeline PMMLPipeline(steps=[('regressor', KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski',
          metric_params=None, n_jobs=1, n_neighbors=10, p=2,
          weights='distance'))]) --pmml-output /home/.../GridSearchFit.pmml
('Preserved joblib dump file(s): ', '/tmp/pipeline-yd1bTD.pkl.z')
Traceback (most recent call last):

  File "<ipython-input-12-b7a0923021e7>", line 1, in <module>
    sklearn2pmml(cv.best_estimator_, "/home/.../GridSearchFit.pmml", with_repr = True, debug = True)

  File "/usr/local/lib/python2.7/dist-packages/sklearn2pmml/__init__.py", line 132, in sklearn2pmml
    subprocess.check_call(cmd)

  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)

CalledProcessError: Command '['java', '-cp', '/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/slf4j-api-1.7.22.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-schema-1.3.4.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-model-metro-1.3.4.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pyrolite-4.16.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-agent-1.3.4.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jcommander-1.48.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-sklearn-1.2.6.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/guava-19.0.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/slf4j-jdk14-1.7.22.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-converter-1.2.1.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/istack-commons-runtime-2.21.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jaxb-runtime-2.2.11.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/serpent-1.16.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-lightgbm-1.0.2.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jaxb-core-2.2.11.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-xgboost-1.1.5.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-model-1.3.4.jar', 'org.jpmml.sklearn.Main', '--pkl-pipeline-input', '/tmp/pipeline-yd1bTD.pkl.z', '--repr-pipeline', "PMMLPipeline(steps=[('regressor', KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski',\n          metric_params=None, n_jobs=1, n_neighbors=10, p=2,\n          weights='distance'))])", '--pmml-output', '/home/.../GridSearchFit.pmml']' returned non-zero exit status 1
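
A note on reading this trace: the CalledProcessError message only carries the command and its exit code, so the actual Java-side failure is not visible in it. One way to surface it is to re-run the printed command yourself and capture its stderr. The following is a minimal sketch of that idea, assuming Python 3.7+ (the report above uses 2.7). The helper name run_and_report is hypothetical and not part of sklearn2pmml, and the failing command is a stand-in for the real `java ... org.jpmml.sklearn.Main ...` invocation.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (returncode, stdout, stderr) instead of raising.

    Hypothetical helper for illustration; not part of sklearn2pmml.
    """
    # capture_output=True collects both streams; text=True decodes them to str.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Stand-in for the failing converter command: a child process that writes
# a message to stderr and exits with status 1.
failing_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('converter failed\\n'); sys.exit(1)",
]

code, out, err = run_and_report(failing_cmd)
if code != 0:
    # With the real command, the Java exception would be expected here.
    print("exit status:", code)
    print("stderr:", err.strip())
```

With the real command, substitute the argument list printed in the debug output above for failing_cmd. The joblib dump it refers to is preserved when debug = True, as the "Preserved joblib dump file(s)" line shows.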

Here is the saved pickle file for this error. I renamed it from Grid_pipeline-yd1bTD.pkl.z to Grid_pipeline-yd1bTD.pkl.zip so that I could upload it here: Grid_pipeline-yd1bTD.pkl.zip

vruusmann commented 7 years ago

SkLearn2PMML is a lightweight Python wrapper for the JPMML-SkLearn library. This issue is about a deeper technical matter, so it will be analyzed here: https://github.com/jpmml/jpmml-sklearn/issues/42