vivekk0903 closed this issue 2 years ago.
Sorry, I meant to post this on the sklearn2pmml page, but posted it here by mistake. Sorry again.
('sklearn2pmml: ', '0.17.0')
That's a fairly outdated version.
Please upgrade to latest version of SkLearn2PMML, which is 0.20.3 at the moment.
cv = GridSearchCV(knn_pipe, param_grid=param_grid)
sklearn2pmml(cv.best_estimator_, "GridSearchFit.pmml", with_repr = True)
The sklearn2pmml() function requires the first argument to be an instance of sklearn2pmml.PMMLPipeline. After fitting a GridSearchCV meta-model, you should construct a dummy PMMLPipeline like this:
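from sklearn.model_selection import GridSearchCV
from sklearn2pmml import PMMLPipeline, sklearn2pmml

cv = GridSearchCV(...)
cv.fit(X, y)
# Wrap the best estimator as the single step of a new PMMLPipeline
pipeline = PMMLPipeline([
    ("best_estimator", cv.best_estimator_)
])
# Additionally, set feature and label names
pipeline.active_fields = X.columns.values
pipeline.target_field = y.name
sklearn2pmml(pipeline, ...)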
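The active_fields and target_field values above are taken directly from the pandas objects used for fitting. A minimal sketch, assuming X is a DataFrame and y is a named Series (the column and target names here are made up for illustration); when these attributes are left unset, the converter falls back to generic names such as x1, x2, ... and y, as the conversion log further down shows.

```python
import numpy as np
import pandas as pd

# Hypothetical training data; the column and target names are illustrative only
X = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["sepal_length", "sepal_width"])
y = pd.Series([0.5, 1.5, 2.5], name="petal_length")

# The values the snippet above assigns to the PMMLPipeline attributes
active_fields = X.columns.values  # numpy array of the DataFrame's column names
target_field = y.name             # the Series' name attribute

print(list(active_fields), target_field)
```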
I tried converting the attached Pickle file with the JPMML-SkLearn command-line application, and got the following result:
$ java -jar target/converter-executable-1.3-SNAPSHOT.jar --pkl-input Grid_pipeline-yd1bTD.pkl.z --pmml-output Grid_pipeline.pmml
juuni 20, 2017 10:18:39 AM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
juuni 20, 2017 10:18:40 AM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 77 ms.
juuni 20, 2017 10:18:40 AM org.jpmml.sklearn.Main run
INFO: Converting..
juuni 20, 2017 10:18:40 AM sklearn2pmml.PMMLPipeline encodePMML
WARNING: The 'target_field' attribute is not set. Assuming y as the name of the target field
juuni 20, 2017 10:18:40 AM sklearn2pmml.PMMLPipeline initFeatures
WARNING: The 'active_fields' attribute is not set. Assuming [x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13] as the names of active fields
juuni 20, 2017 10:18:40 AM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: distance
at sklearn.neighbors.KNeighborsUtil.encodeNeighbors(KNeighborsUtil.java:127)
at sklearn.neighbors.KNeighborsRegressor.encodeModel(KNeighborsRegressor.java:56)
at sklearn.neighbors.KNeighborsRegressor.encodeModel(KNeighborsRegressor.java:31)
at sklearn.Estimator.encodeModel(Estimator.java:46)
at sklearn2pmml.PMMLPipeline.encodePMML(PMMLPipeline.java:136)
at org.jpmml.sklearn.Main.run(Main.java:144)
at org.jpmml.sklearn.Main.main(Main.java:93)
Exception in thread "main" java.lang.IllegalArgumentException: distance
at sklearn.neighbors.KNeighborsUtil.encodeNeighbors(KNeighborsUtil.java:127)
at sklearn.neighbors.KNeighborsRegressor.encodeModel(KNeighborsRegressor.java:56)
at sklearn.neighbors.KNeighborsRegressor.encodeModel(KNeighborsRegressor.java:31)
at sklearn.Estimator.encodeModel(Estimator.java:46)
at sklearn2pmml.PMMLPipeline.encodePMML(PMMLPipeline.java:136)
at org.jpmml.sklearn.Main.run(Main.java:144)
at org.jpmml.sklearn.Main.main(Main.java:93)
In brief, the problem is that your KNeighborsRegressor model object uses the distance weight function, which is currently not supported. You should fall back to the uniform weight function (this is SkLearn's default):
regressor = KNeighborsRegressor(..., weights = "uniform")
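Applied to the grid search from the original report, this means leaving "distance" out of the weights values being searched. A minimal sketch, assuming a scaler + KNN pipeline on synthetic data (the step names, n_neighbors candidates and data are hypothetical). scikit-learn's plain Pipeline stands in for PMMLPipeline, which subclasses it, so the sketch runs without the sklearn2pmml package or Java. It also shows why sklearn2pmml(cv, ...) is rejected while cv.best_estimator_ is accepted: only the latter is a pipeline.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (hypothetical)
rng = np.random.RandomState(0)
X = rng.rand(60, 3)
y = X.sum(axis=1)

knn_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsRegressor()),
])

# Only "uniform" is searched; "distance" is deliberately left out of the grid
param_grid = {
    "knn__n_neighbors": [3, 5, 7],
    "knn__weights": ["uniform"],
}
cv = GridSearchCV(knn_pipe, param_grid=param_grid, cv=3)
cv.fit(X, y)

best = cv.best_estimator_  # a Pipeline; the GridSearchCV object itself is not
print(best.named_steps["knn"].weights, isinstance(cv, Pipeline), isinstance(best, Pipeline))
```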
The distance weight function can be represented in PMML for the most part. IIRC, there only needs to be a special handler for the zero-distance case.
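To make the zero-distance case concrete: inverse-distance weighting gives each of the k nearest neighbours the weight 1/d, which is undefined when the query point coincides with a training point. A minimal NumPy sketch of the prediction arithmetic only (not the JPMML-SkLearn or PMML implementation), assuming the rule scikit-learn applies in that case: neighbours at distance zero get weight 1 and all other neighbours get weight 0.

```python
import numpy as np

def distance_weights(dist):
    """Inverse-distance weights for one query's k neighbour distances.

    If any neighbour is at distance zero, 1/d is undefined; in that case the
    zero-distance neighbours get weight 1 and all others weight 0.
    """
    dist = np.asarray(dist, dtype=float)
    zero = dist == 0.0
    if zero.any():
        return zero.astype(float)
    return 1.0 / dist

def knn_distance_predict(neighbor_dist, neighbor_y):
    """Distance-weighted mean of the k neighbours' target values."""
    w = distance_weights(neighbor_dist)
    return float(np.dot(w, np.asarray(neighbor_y, dtype=float)) / w.sum())
```

For example, distances [1, 2] with targets [10, 20] give weights [1, 0.5] and a prediction of 40/3 (about 13.33), while distances [0, 1, 2] with targets [5, 10, 20] simply return 5.0.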
OK, I have done as you suggested.
1) Updated to the newest available version.
I had actually upgraded before posting the issue, using the command pip install --user git+https://github.com/jpmml/sklearn2pmml.git, but I overlooked the '--user' option and was testing as a different user.
Now the error message has become a bit clearer.
('python: ', '2.7.6')
('sklearn: ', '0.18.1')
('sklearn.externals.joblib:', '0.10.3')
('pandas: ', u'0.19.2')
('sklearn_pandas: ', '1.3.0')
('sklearn2pmml: ', '0.20.3')
java -cp /usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/guava-20.0.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/slf4j-api-1.7.22.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-schema-1.3.4.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-model-metro-1.3.4.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pyrolite-4.16.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-agent-1.3.4.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/slf4j-api-1.7.25.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-schema-1.3.6.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-converter-1.2.3.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-sklearn-1.3.3.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pyrolite-4.19.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/slf4j-jdk14-1.7.25.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jcommander-1.48.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-sklearn-1.2.6.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/guava-19.0.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-agent-1.3.6.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-lightgbm-1.0.7.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/slf4j-jdk14-1.7.22.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-converter-1.2.1.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/istack-commons-runtime-2.21.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jaxb-runtime-2.2.11.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/serpent-1.18.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/serpent-1.16.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-model-1.3.6.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-lightgbm-1.0.2.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-xgboost-1.1.7.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jaxb-core-2.2.11.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/jpmml-xgboost-1.1.5.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-model-1.3.4.jar:/usr/local/lib/python2.7/dist-packages/sklearn2pmml/resources/pmml-model-metro-1.3.6.jar org.jpmml.sklearn.Main --pkl-pipeline-input /tmp/pipeline-QbjihM.pkl.z --pmml-output /home/local/EZDI/vivek.k/GridSearchFit.pmml
('Preserved joblib dump file(s): ', '/tmp/pipeline-QbjihM.pkl.z')
Traceback (most recent call last):
File "<ipython-input-4-ab9a7c1ff136>", line 7, in <module>
sklearn2pmml(cv.best_estimator_, "/home/local/EZDI/vivek.k/GridSearchFit.pmml", with_repr = True, debug = True)
File "/usr/local/lib/python2.7/dist-packages/sklearn2pmml/__init__.py", line 142, in sklearn2pmml
raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java process should have printed more information about the failure into its standard output and/or error streams")
RuntimeError: The JPMML-SkLearn conversion application has failed. The Java process should have printed more information about the failure into its standard output and/or error streams
2) Why is there a need to wrap the cv.best_estimator_ again inside a PMMLPipeline?
The cv.best_estimator_ is already an instance of PMMLPipeline: when I check type(cv.best_estimator_), it returns sklearn2pmml.PMMLPipeline. So I don't think it needs to be wrapped again, because when I removed "distance" from the grid-search parameters, passing cv.best_estimator_ directly to the sklearn2pmml command worked without any errors.
Why is there a need to wrap the cv.best_estimator_ again inside a PMMLPipeline?
Very interesting - I didn't know that GridSearchCV can take a pipeline as the first argument. I was assuming that you were using a "raw" estimator as the first argument, and wanted to know how to wrap the cv.best_estimator_ to make it acceptable for the sklearn2pmml() function.
I was probably misled by this StackOverflow thread: https://stackoverflow.com/questions/44643123
Indeed, if the cv.best_estimator_ is already an sklearn2pmml.PMMLPipeline, then there's no need to re-wrap it.
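The "re-wrap only when needed" rule can be expressed as a small helper. This is a hypothetical helper, not part of sklearn2pmml: in real use you would pass sklearn2pmml.PMMLPipeline as pipeline_cls, whereas the usage lines below substitute scikit-learn's Pipeline so the sketch runs without sklearn2pmml installed.

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline

def ensure_pipeline(estimator, pipeline_cls):
    """Hypothetical helper: return estimator unchanged if it is already an
    instance of pipeline_cls, otherwise wrap it as the single step of a new one."""
    if isinstance(estimator, pipeline_cls):
        return estimator
    return pipeline_cls([("best_estimator", estimator)])

# Usage, with sklearn's Pipeline standing in for PMMLPipeline
already = Pipeline([("knn", KNeighborsRegressor())])
same = ensure_pipeline(already, Pipeline)                   # returned as-is
wrapped = ensure_pipeline(KNeighborsRegressor(), Pipeline)  # newly wrapped
```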
It was me who suggested using cv.best_estimator_ directly in the sklearn2pmml command, before I tested it myself and found this issue.
The only answer on that page was written after I posted this issue, and it uses your recommendation to wrap it again. So I don't know who is following whom here. :p
So the only issue that remains here is to add support for the "distance" weight function.
I am getting the "returned non-zero exit status 1" error with the new sklearn2pmml version 0.17 when using it with GridSearchCV.
Version info
('python: ', '2.7.6')
('sklearn: ', '0.18.1')
('sklearn.externals.joblib:', '0.10.3')
('pandas: ', u'0.19.2')
('sklearn_pandas: ', '1.3.0')
('sklearn2pmml: ', '0.17.0')
Code to reproduce
1) Working correctly:
2) Throwing error:
Using the following line gives "TypeError: The pipeline object is not an instance of PMMLPipeline", which is understandable.
sklearn2pmml(cv, ".../GridSearchFit.pmml", with_repr = True, debug = True)
So I tried using cv.best_estimator_ instead, but it throws the "returned non-zero exit status 1" error.
sklearn2pmml(cv.best_estimator_, ".../GridSearchFit.pmml", with_repr = True, debug = True)
Stack trace of error:
Here is the pickle file saved for this error. I renamed it from Grid_pipeline-yd1bTD.pkl.z to Grid_pipeline-yd1bTD.pkl.zip to be able to upload it here: Grid_pipeline-yd1bTD.pkl.zip