Closed palaiya closed 7 years ago
The JPMML-SkLearn library only deals with Scikit-Learn's "core" model and transformation types. You're trying to use it to convert a Scikit-NeuralNetwork model object. Despite the common name prefix, Scikit-Learn and Scikit-NeuralNetwork are not related in any way.
Why did you choose the sknn.mlp.Classifier model type? How is it better than the sklearn.neural_network.multilayer_perceptron.MLPClassifier model type for your use case?
The good news is that the JPMML-SkLearn library can be extended to support arbitrary Python-based machine learning frameworks (as demonstrated by integrating 3rd-party XGBoost and LightGBM libraries). Better yet, starting from JPMML-SkLearn version 1.2.0, there's a dedicated Java API for implementing and registering custom converters with the JPMML-SkLearn runtime.
@vruusmann I have also tried with Scikit-Learn's "core" model, using the code below, and now I get the following error. Could you please help me with this?
import pickle
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the training data: the first column is the label, the rest are features.
data = np.loadtxt("TrainLSDataset.csv", delimiter=',')
x = data[:, 1:]
y = data[:, 0]
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5,), random_state=1, max_iter=100)
clf.fit(X_train, y_train)
# Save the fitted model with the default pickle protocol, then reload it.
filename = 'finalized_model.pkl'
with open(filename, 'wb') as f:
    pickle.dump(clf, f)
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
result = loaded_model.score(X_test, y_test)
print(loaded_model)
print(result)
Error :
Jan 31, 2017 2:47:21 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Jan 31, 2017 2:47:21 PM org.jpmml.sklearn.Main run
SEVERE: Failed to parse PKL
net.razorvine.pickle.PickleException: invalid escape sequence in string
at net.razorvine.pickle.PickleUtils.decode_escaped(PickleUtils.java:344)
at net.razorvine.pickle.Unpickler.load_string(Unpickler.java:448)
at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:179)
at org.jpmml.sklearn.PickleUtil$1.dispatch(PickleUtil.java:136)
at net.razorvine.pickle.Unpickler.load(Unpickler.java:100)
at org.jpmml.sklearn.PickleUtil.unpickle(PickleUtil.java:157)
at org.jpmml.sklearn.Main.run(Main.java:111)
at org.jpmml.sklearn.Main.main(Main.java:99)
Exception in thread "main" net.razorvine.pickle.PickleException: invalid escape sequence in string
at net.razorvine.pickle.PickleUtils.decode_escaped(PickleUtils.java:344)
at net.razorvine.pickle.Unpickler.load_string(Unpickler.java:448)
at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:179)
at org.jpmml.sklearn.PickleUtil$1.dispatch(PickleUtil.java:136)
at net.razorvine.pickle.Unpickler.load(Unpickler.java:100)
at org.jpmml.sklearn.PickleUtil.unpickle(PickleUtil.java:157)
at org.jpmml.sklearn.Main.run(Main.java:111)
at org.jpmml.sklearn.Main.main(Main.java:99)
The JPMML-SkLearn library depends on the Pyrolite library for most of its low-level unpickling functionality. Your PickleException originates from the Pyrolite library (method PickleUtils#decode_escaped(...)), which means that your MLPClassifier object contains strangely encoded strings.
What are your class label values (attribute MLPClassifier.classes_)? Does this PickleException go away if you replace them (at least temporarily) with integer labels "0", "1", .., "n"? Can you share your finalized_model.pkl file with me so that I could debug it locally?
Anyway, here's a working example: https://github.com/jpmml/jpmml-sklearn/blob/master/src/test/resources/main.py#L234 https://github.com/jpmml/jpmml-sklearn/blob/master/src/test/resources/main.py#L206
My labels contain only 1s and 0s. I am attaching my finalized_model.pkl file anyway (renamed with a .txt extension).
I also tried replacing pickle.dump with joblib.dump, but after that it gives the following error:
Error:
Jan 31, 2017 4:49:30 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Jan 31, 2017 4:49:30 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 87 ms.
Exception in thread "main" java.lang.IllegalArgumentException: The object (Python class sklearn.neural_network.multilayer_perceptron.MLPClassifier) is not a PMMLPipeline
at org.jpmml.sklearn.Main.run(Main.java:122)
at org.jpmml.sklearn.Main.main(Main.java:99)
By JPMML-SkLearn conventions, the "entry point" Python object must be an instance of sklearn2pmml.PMMLPipeline.
Simply wrap your MLPClassifier object like this:
from sklearn2pmml import PMMLPipeline
pipeline = PMMLPipeline([
('clf', clf)
])
And after installing the sklearn2pmml package, you can use the utility method sklearn2pmml.sklearn2pmml to perform the conversion:
from sklearn2pmml import sklearn2pmml
sklearn2pmml(pipeline, "pipeline.pmml")
Now I am getting this strange error. I am on Ubuntu 14.04.
Error:
Traceback (most recent call last):
File "LeadScore.py", line 50, in <module>
sklearn2pmml(pipeline, "pipeline.pmml")
File "/home/naresh/Desktop/Work/Spark-CassandraWork/MachineLearning/sklearn2pmml.py", line 120, in sklearn2pmml
cmd = ["java", "-cp", os.pathsep.join(_package_classpath() + user_classpath), "org.jpmml.sklearn.Main"]
File "/home/naresh/Desktop/Work/Spark-CassandraWork/MachineLearning/sklearn2pmml.py", line 75, in _package_classpath
resources = pkg_resources.resource_listdir("sklearn2pmml.resources", "")
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 953, in resource_listdir
return get_provider(package_or_requirement).resource_listdir(
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 227, in get_provider
__import__(moduleOrReq)
ImportError: No module named resources
Ubuntu 14 is fairly outdated. Could you paste the output of sklearn2pmml(.., debug = True) here so that I could see the complete configuration of your Python runtime?
Also, could you re-try with Python 3.4+?
('python: ', '2.7.6')
('sklearn: ', '0.18.1')
('sklearn.externals.joblib:', '0.10.3')
('pandas: ', u'0.19.1')
('sklearn_pandas: ', '1.3.0')
('sklearn2pmml: ', '0.16.0')
Traceback (most recent call last):
File "LeadScore.py", line 50, in <module>
sklearn2pmml(pipeline, "pipeline.pmml",debug = True)
File "/home/naresh/Desktop/Work/Spark-CassandraWork/MachineLearning/sklearn2pmml.py", line 120, in sklearn2pmml
cmd = ["java", "-cp", os.pathsep.join(_package_classpath() + user_classpath), "org.jpmml.sklearn.Main"]
File "/home/naresh/Desktop/Work/Spark-CassandraWork/MachineLearning/sklearn2pmml.py", line 75, in _package_classpath
resources = pkg_resources.resource_listdir("sklearn2pmml.resources", "")
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 953, in resource_listdir
return get_provider(package_or_requirement).resource_listdir(
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 227, in get_provider
__import__(moduleOrReq)
ImportError: No module named resources
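The file paths in the traceback above show sklearn2pmml.py being loaded from the project directory. A minimal sketch for checking where Python resolves a module from, and whether it is a package that can contain submodules such as "resources", is given below. It assumes Python 3.4+ (importlib.util.find_spec is not available on 2.7), and the helper name describe_module is hypothetical, not part of sklearn2pmml.

```python
import importlib.util


def describe_module(name):
    """Report where a top-level module would be loaded from.

    Returns None when the module cannot be found, otherwise a dict with
    the file it resolves to and whether it is a package (only packages
    can contain submodules).
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return {
        "origin": spec.origin,
        "is_package": spec.submodule_search_locations is not None,
    }


# Standard-library stand-ins: "json" is a package, "pickle" is a plain module.
print(describe_module("json"))
print(describe_module("pickle"))
```

Calling describe_module("sklearn2pmml") in the affected environment would show which file wins the import. If the origin is a single .py file outside site-packages and is_package is False, that would be consistent with the "No module named resources" error, although this thread does not confirm it as the cause.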
You should update your Python version!
The first PickleException is caused by the fact that your Python version creates pickle files using an outdated and very inefficient protocol version: https://github.com/irmen/Pyrolite/issues/51#issuecomment-276471871 The second ImportError is caused by the fact that your Python version does not know how to properly load package resources.
I've tested the sklearn2pmml package with Python versions 2.7.9 through 2.7.11, and they work without problems.
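On Python 2, pickle.dump defaults to protocol 0, the text-based format involved in the first error. A minimal sketch of how to tell which protocol a pickle declares, and how to request a binary protocol explicitly, follows. The helper name pickle_protocol and the stand-in dictionary are illustrative assumptions; a fitted model would be dumped the same way.

```python
import io
import pickle


def pickle_protocol(data):
    """Return the protocol number declared in a pickle byte string.

    Protocols 2 and later start with the PROTO opcode (0x80) followed by
    the protocol number. Protocols 0 and 1 carry no such header, so None
    is returned for them.
    """
    if len(data) >= 2 and data[:1] == b"\x80":
        return data[1]
    return None


# Stand-in for a fitted model; any picklable object behaves the same way.
model = {"classes_": [0, 1], "name": "stand-in"}

# Protocol 0 is the legacy text format.
legacy = pickle.dumps(model, protocol=0)

# Request a binary protocol explicitly instead of relying on the default.
buffer = io.BytesIO()
pickle.dump(model, buffer, protocol=2)
binary = buffer.getvalue()

print(pickle_protocol(legacy))
print(pickle_protocol(binary))
```

Passing protocol=2 keeps the file readable by both Python 2 and Python 3. The joblib.dump attempt earlier in this thread was parsed without a PickleException, which suggests that the parser copes with binary pickles.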
@vruusmann I have updated my Python version to 2.7.9 as you suggested, but I am still getting the same error.
After that I also tried Python version 3.4.5 and got a similar error. Kindly look at it:
@vruusmann just to be fair, the fact that this protocol level 0 pickle file was passed through Pyrolite did expose the bug about the unrecognised escape character. For that I'm happy because this is now fixed in the next release of Pyrolite
I am using scikit-neuralnetwork, a Python wrapper around the scikit-learn Multilayer Perceptron, to train a neural network and save it to a file. Now I want to expose it in production to predict in real time, and I was thinking of using Java/Golang for better concurrency than Python. Hence my question: how do I read a model that was written using this Python library, or the above wrapper? The code below is what I am using for training the model; the last three lines are what I want to port to Java/Golang to expose it in production.
I tried using the following command to convert the model written by the above code to PMML, but it gives the following error. Could you please tell me what I am doing wrong here?
Also, can you share any link where a model is trained using Python and then used in Java/Golang to score or predict?
java -jar target/converter-executable-1.2-SNAPSHOT.jar --pkl-input finalized_model.pkl --pmml-output finalized_model.pmml
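For scripted conversions, the same command line can be assembled in Python and passed to subprocess. This is only a sketch: the function names build_converter_command and run_converter are hypothetical, the JAR path and flags are copied from the command above, and the call still requires Java and the converter JAR to be present.

```python
import subprocess


def build_converter_command(jar_path, pkl_path, pmml_path):
    """Assemble the converter command line as an argument list."""
    return [
        "java", "-jar", jar_path,
        "--pkl-input", pkl_path,
        "--pmml-output", pmml_path,
    ]


def run_converter(jar_path, pkl_path, pmml_path):
    """Run the converter and return its exit code (needs Java and the JAR)."""
    return subprocess.call(build_converter_command(jar_path, pkl_path, pmml_path))


command = build_converter_command(
    "target/converter-executable-1.2-SNAPSHOT.jar",
    "finalized_model.pkl",
    "finalized_model.pmml",
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.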