FYI: you can quote blocks of code by using triple backticks.
Regarding this issue, it's impossible to analyze or fix it without access to the problematic pickle file. It's probably an OS-specific or pickle-library-specific problem.
The JPMML-SkLearn library depends on the latest Pyrolite library version, and if Pyrolite is unable to parse a Pickle file, then there's nothing that I can do about it.
Is this the same situation?
```python
from sklearn import datasets
from sklearn import tree
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
import numpy

iris = datasets.load_iris()
X = iris.data
Y = iris.target

model = tree.DecisionTreeClassifier()
pipeline = PMMLPipeline([
    ('DecisionTreeClassifier', model)
])
# Field names for the PMML document; target_fields is a 1-D array of names
pipeline.active_fields = numpy.array(iris.feature_names)
pipeline.target_fields = numpy.array(['Species'])
pipeline.fit(X, Y)

sklearn2pmml(pipeline, 'DecisionTreeClassifier.pmml')
```
```
SEVERE: Failed to parse PKL
net.razorvine.pickle.InvalidOpcodeException: invalid pickle opcode: 254
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:355)
    at org.jpmml.sklearn.PickleUtil$1.dispatch(PickleUtil.java:77)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:122)
    at org.jpmml.sklearn.PickleUtil.unpickle(PickleUtil.java:98)
    at org.jpmml.sklearn.Main.run(Main.java:104)
    at org.jpmml.sklearn.Main.main(Main.java:94)
```
The first exception complains about pickle opcode 219, whereas the second one complains about 254. Even though the opcodes differ, I suspect that both refer to the same problem: your Pickle and/or Python setup is broken in some way, and the `sklearn.externals.joblib.dump()` function is generating broken Pickle files.
Can you unpickle this same file in Python?
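To make the opcode numbers concrete, here is a minimal sketch that uses only the standard-library `pickletools` module: it builds the table of opcodes known to the running Python and walks a byte stream to find the first byte that cannot start a pickle instruction. Neither 219 (`0xDB`) nor 254 (`0xFE`) appears in that table, so an `InvalidOpcodeException` means the reader landed on a byte that is not a pickle opcode at all. The names `KNOWN_OPCODES` and `first_unknown_opcode` are hypothetical. One caveat, stated as an assumption: joblib may write numpy array buffers directly into the file outside the pickle opcode stream, so on joblib output this walker can stop early even for a healthy file; treat it as a rough check for plain `pickle` output.

```python
import io
import pickle
import pickletools

# Every opcode byte known to this Python's pickletools, mapped to its name.
# OpcodeInfo.code is a one-character str, e.g. '\x80' for PROTO.
KNOWN_OPCODES = {ord(op.code): op.name for op in pickletools.opcodes}


def first_unknown_opcode(data: bytes):
    """Walk a pickle stream opcode by opcode.

    Returns None if the stream parses through STOP, or a (position, byte)
    tuple for the first byte that is not a known opcode. Other malformed
    streams (e.g. truncated arguments) re-raise the ValueError.
    """
    stream = io.BytesIO(data)
    next_pos = 0  # offset where the next opcode is expected to start
    try:
        for _opcode, _arg, _pos in pickletools.genops(stream):
            next_pos = stream.tell()  # just past this opcode and its argument
    except ValueError:
        if next_pos < len(data) and data[next_pos] not in KNOWN_OPCODES:
            return next_pos, data[next_pos]
        raise
    return None


if __name__ == "__main__":
    print(219 in KNOWN_OPCODES, 254 in KNOWN_OPCODES)  # False False
    good = pickle.dumps({"a": [1, 2, 3]}, protocol=2)
    print(first_unknown_opcode(good))  # None
    # Splice an invalid byte in right after the 2-byte PROTO header
    print(first_unknown_opcode(good[:2] + bytes([219]) + good[2:]))  # (2, 219)
```

The position is a 0-based byte offset into the stream, which makes it easy to compare against a hex dump of the suspicious file.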
Hi @vruusmann,
I can use joblib dump/load in Python.
```python
from sklearn import datasets
from sklearn import tree
from sklearn2pmml.pipeline import PMMLPipeline
# Bundled joblib, as shipped with scikit-learn 0.20; later releases use `import joblib`
from sklearn.externals import joblib
import numpy

iris = datasets.load_iris()
X = iris.data
Y = iris.target

pipeline = PMMLPipeline([
    ("DecisionTreeClassifier", tree.DecisionTreeClassifier())
])
pipeline.active_fields = numpy.array(iris.feature_names)
pipeline.target_fields = numpy.array(['Species'])
pipeline.fit(X, Y)

# Round trip: dump the fitted pipeline, load it back and predict
dumpFile = "DecisionTreeClassifier-estimator.joblib"
joblib.dump(pipeline, dumpFile)
model2 = joblib.load(dumpFile)
model2.predict(X)
```
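A successful round trip shows that Python can read its own dump; it says nothing about which pickle protocol the file declares, which any other reader (such as Pyrolite on the Java side) also has to understand. A minimal stdlib sketch, with the hypothetical helper name `pickle_protocol`, that reads the protocol number from the leading `PROTO` opcode (byte `0x80`), which protocols 2 and above write first:

```python
import pickle


def pickle_protocol(header: bytes):
    """Return the protocol number declared by a leading PROTO opcode, or None.

    Protocols 2 and above begin with PROTO (0x80) followed by a one-byte
    protocol number; protocols 0 and 1 have no such header.
    """
    if len(header) >= 2 and header[0] == 0x80:
        return header[1]
    return None


if __name__ == "__main__":
    # Compare the headers written for a few protocols on a small object
    sample = {"species": ["setosa", "versicolor", "virginica"]}
    for proto in (0, 2, pickle.HIGHEST_PROTOCOL):
        print(proto, pickle_protocol(pickle.dumps(sample, protocol=proto)))
```

For a file on disk, something like `with open(dumpFile, "rb") as fh: print(pickle_protocol(fh.read(2)))` would do. This assumes an uncompressed dump; a file written with joblib's `compress` option starts with the compressor's own header instead.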
Just FYI: the same error happens with `VotingClassifier`, as with `KNeighborsClassifier`. The process complained about `invalid pickle opcode: 219`.
```python
from sklearn import datasets
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
import numpy

iris = datasets.load_iris()
X = iris.data
Y = iris.target

clf1 = LogisticRegression(solver='lbfgs', multi_class='ovr', random_state=0)
clf2 = GaussianNB()
clf3 = KNeighborsClassifier(n_neighbors=7)
model = VotingClassifier(estimators=[('lr', clf1), ('gnb', clf2), ('knn', clf3)], voting='hard')

target = 'Species'
pipeline = PMMLPipeline([
    ("VotingClassifier", model)
])
pipeline.active_fields = numpy.array(iris.feature_names)
pipeline.target_fields = numpy.array([target])
# No need to fit clf1..clf3 separately: VotingClassifier.fit() fits clones of them
pipeline.fit(X, Y)

sklearn2pmml(pipeline, 'VotingClassifier.pmml')
```
```
Standard output is empty
Traceback (most recent call last):
  File "VotingClassifier.py", line 43, in <module>
Standard error:
Apr 08, 2019 10:24:15 AM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Apr 08, 2019 10:24:15 AM org.jpmml.sklearn.Main run
SEVERE: Failed to parse PKL
net.razorvine.pickle.InvalidOpcodeException: invalid pickle opcode: 219
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:355)
    at org.jpmml.sklearn.PickleUtil$1.dispatch(PickleUtil.java:77)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:122)
    at org.jpmml.sklearn.PickleUtil.unpickle(PickleUtil.java:98)
    at org.jpmml.sklearn.Main.run(Main.java:104)
    at org.jpmml.sklearn.Main.main(Main.java:94)
Exception in thread "main" net.razorvine.pickle.InvalidOpcodeException: invalid pickle opcode: 219
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:355)
    at org.jpmml.sklearn.PickleUtil$1.dispatch(PickleUtil.java:77)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:122)
    at org.jpmml.sklearn.PickleUtil.unpickle(PickleUtil.java:98)
    at org.jpmml.sklearn.Main.run(Main.java:104)
    at org.jpmml.sklearn.Main.main(Main.java:94)
    sklearn2pmml(pipeline, relativeOutputPath + model_type + '.pmml')
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn2pmml\__init__.py", line 252, in sklearn2pmml
    raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")
RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams
```
If you can't successfully complete the simplest exercise (training a decision tree classifier on the iris dataset), then there's no point in trying anything more complicated.
Anyway, I maintain my original position that there's something wrong with the way your Pickle/Scikit-Learn/Python/OS/Architecture setup is saving pickle files (they are corrupt, as indicated by the `net.razorvine.pickle.InvalidOpcodeException` exception type).
If it were a global problem, there would be a hundred high-priority issues raised in this issue tracker right now. But there's only this one.
Hi @vruusmann,
I would just like to clarify that the problem does not occur with plain Python 3 joblib dump and load. By the way, thanks for your help.
The procedure below works fine:
```python
joblib.dump(pipeline, dumpFile)
model2 = joblib.load(dumpFile)
```
Hi @vruusmann,
After uninstalling Python 3.7.2 (downloaded from https://www.python.org/downloads/) and installing Anaconda 4.6.11 (with Python 3.7.3), the PMML file could be generated correctly.
Sorry about my previous comment. #146 is not an issue on my end. I will also verify #148 and close it if it is caused by the same situation.
Thank you!
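Because the fix here came from switching Python distributions rather than changing the script, it can help to attach interpreter details, not just package versions, to a report like this. A minimal stdlib sketch, assuming Python 3.8+ for `importlib.metadata` (on 3.7, `pip freeze` is the simpler route for the package list); the function name `environment_report` is hypothetical:

```python
import platform
import struct
import sys
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def environment_report(packages):
    """Collect interpreter details plus installed versions of the given distributions."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        # Pointer size in bits: 32 on a 32-bit build, 64 on a 64-bit build
        "bits": struct.calcsize("P") * 8,
        "executable": sys.executable,
        "os": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    # Distribution names of the kind that matter for this thread
    wanted = ["numpy", "scikit-learn", "sklearn2pmml", "joblib"]
    for key, value in environment_report(wanted).items():
        print(f"{key}: {value}")
```

The `bits` entry distinguishes 32-bit from 64-bit builds. The traceback above shows an install path ending in `Python37-32`, the folder name a 32-bit python.org installer typically uses, but this thread does not establish whether bitness played any role.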
Hi,
I got an exception while trying to export PMML from `KNeighborsClassifier`.
Versions:
- Java 1.8.0_191-b12
- Python 3.7.2
- PIP packages: lxml 4.3.3, numpy 1.16.2, pandas 0.24.2, patsy 0.5.1, pip 19.0.3, python-dateutil 2.8.0, pytz 2018.9, scikit-learn 0.20.3, scipy 1.2.1, setuptools 40.9.0, six 1.12.0, sklearn 0.0, sklearn-pandas 1.8.0, sklearn2pmml 0.44.0, statsmodels 0.9.0

Scripts and error: