jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Support for scikit-neuralnetwork model types #27

Closed palaiya closed 7 years ago

palaiya commented 7 years ago

I am using scikit-neuralnetwork, the Python wrapper that exposes a multilayer perceptron through a Scikit-Learn-style interface, to train a neural network and save it to a file. Now I want to deploy it to production for real-time prediction, and I was thinking of using Java or Go for better concurrency than Python. My question is: how do I use this library to read a model that was written from Python via the wrapper above? Below is the code I use to train the model; the last three lines are what I want to port to Java/Go for production.

import pickle
import numpy as np
import pandas as pd
from sknn.mlp import Classifier, Layer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

f = open("TrainLSDataset.csv")
data = np.loadtxt(f,delimiter = ',')

x = data[:, 1:]
y = data[:, 0]
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

nn = Classifier(
    layers=[                    
        Layer("Rectifier", units=5),
        Layer("Softmax")],
    learning_rate=0.001,
    n_iter=100)

nn.fit(X_train, y_train)
filename = 'finalized_model.pkl'
pickle.dump(nn, open(filename, 'wb'))

# The code below is what I want to port to Go for production:
loaded_model = pickle.load(open(filename, 'rb'))
result = loaded_model.score(X_test, y_test)
y_pred = loaded_model.predict(X_test)

I tried the following command to convert the model produced by the code above to PMML, but it fails with the error below. Could you please tell me what I am doing wrong?

Also, can you share a link to an example where a model is trained in Python and then used from Java or Go for scoring or prediction?

java -jar target/converter-executable-1.2-SNAPSHOT.jar --pkl-input finalized_model.pkl --pmml-output finalized_model.pmml

SEVERE: Failed to parse PKL
net.razorvine.pickle.PickleException: failed to reconstruct()
    at net.razorvine.pickle.objects.Reconstructor.construct(Reconstructor.java:22)
    at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:708)
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:176)
    at org.jpmml.sklearn.PickleUtil$1.dispatch(PickleUtil.java:136)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:100)
    at org.jpmml.sklearn.PickleUtil.unpickle(PickleUtil.java:157)
    at org.jpmml.sklearn.Main.run(Main.java:111)
    at org.jpmml.sklearn.Main.main(Main.java:99)
Caused by: java.lang.NoSuchMethodException: net.razorvine.pickle.objects.ClassDictConstructor.reconstruct(java.lang.Object, java.lang.Object)
    at java.lang.Class.getMethod(Class.java:1786)
    at net.razorvine.pickle.objects.Reconstructor.construct(Reconstructor.java:19)
    ... 7 more

Exception in thread "main" net.razorvine.pickle.PickleException: failed to reconstruct()
    at net.razorvine.pickle.objects.Reconstructor.construct(Reconstructor.java:22)
    at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:708)
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:176)
    at org.jpmml.sklearn.PickleUtil$1.dispatch(PickleUtil.java:136)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:100)
    at org.jpmml.sklearn.PickleUtil.unpickle(PickleUtil.java:157)
    at org.jpmml.sklearn.Main.run(Main.java:111)
    at org.jpmml.sklearn.Main.main(Main.java:99)
Caused by: java.lang.NoSuchMethodException: net.razorvine.pickle.objects.ClassDictConstructor.reconstruct(java.lang.Object, java.lang.Object)
    at java.lang.Class.getMethod(Class.java:1786)
    at net.razorvine.pickle.objects.Reconstructor.construct(Reconstructor.java:19)
    ... 7 more

vruusmann commented 7 years ago

The JPMML-SkLearn library only deals with Scikit-Learn's "core" model and transformation types. You're trying to use it to convert a scikit-neuralnetwork's model object. Despite the common name prefix, Scikit-Learn and Scikit-NeuralNetwork are not related in any way.

Why did you choose the sknn.mlp.Classifier model type? How is it better than the sklearn.neural_network.multilayer_perceptron.MLPClassifier model type for your use case?

The good news is that the JPMML-SkLearn library can be extended to support arbitrary Python-based machine learning frameworks (as demonstrated by integrating 3rd-party XGBoost and LightGBM libraries). Better yet, starting from JPMML-SkLearn version 1.2.0, there's a dedicated Java API for implementing and registering custom converters with the JPMML-SkLearn runtime.

palaiya commented 7 years ago

@vruusmann I have also tried a Scikit-Learn "core" model, using the code below, and now I get the following error. Could you please help me with this?

import pickle
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

f = open("TrainLSDataset.csv")
data = np.loadtxt(f,delimiter = ',')

x = data[:, 1:]
y = data[:, 0]
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5,), random_state=1, max_iter=100)

clf.fit(X_train, y_train)

filename = 'finalized_model.pkl'
pickle.dump(clf, open(filename, 'wb'))

loaded_model = pickle.load(open(filename, 'rb'))
result = loaded_model.score(X_test, y_test)

print(loaded_model)
print(result)

Error :

Jan 31, 2017 2:47:21 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Jan 31, 2017 2:47:21 PM org.jpmml.sklearn.Main run
SEVERE: Failed to parse PKL
net.razorvine.pickle.PickleException: invalid escape sequence in string
    at net.razorvine.pickle.PickleUtils.decode_escaped(PickleUtils.java:344)
    at net.razorvine.pickle.Unpickler.load_string(Unpickler.java:448)
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:179)
    at org.jpmml.sklearn.PickleUtil$1.dispatch(PickleUtil.java:136)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:100)
    at org.jpmml.sklearn.PickleUtil.unpickle(PickleUtil.java:157)
    at org.jpmml.sklearn.Main.run(Main.java:111)
    at org.jpmml.sklearn.Main.main(Main.java:99)

Exception in thread "main" net.razorvine.pickle.PickleException: invalid escape sequence in string
    at net.razorvine.pickle.PickleUtils.decode_escaped(PickleUtils.java:344)
    at net.razorvine.pickle.Unpickler.load_string(Unpickler.java:448)
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:179)
    at org.jpmml.sklearn.PickleUtil$1.dispatch(PickleUtil.java:136)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:100)
    at org.jpmml.sklearn.PickleUtil.unpickle(PickleUtil.java:157)
    at org.jpmml.sklearn.Main.run(Main.java:111)
    at org.jpmml.sklearn.Main.main(Main.java:99)

vruusmann commented 7 years ago

The JPMML-SkLearn library depends on the Pyrolite library for most of its low-level unpickling functionality. Your PickleException originates from the Pyrolite library (method PickleUtils#decode_escaped(...)), which means that your MLPClassifier object contains strangely encoded strings.

What are your class label values (attribute MLPClassifier.classes_)? Does this PickleException go away if you replace them (at least temporarily) with integer labels "0", "1", .., "n"? Can you share your finalized_model.pkl file with me so that I could debug it locally?

Anyway, here's a working example: https://github.com/jpmml/jpmml-sklearn/blob/master/src/test/resources/main.py#L234 https://github.com/jpmml/jpmml-sklearn/blob/master/src/test/resources/main.py#L206

palaiya commented 7 years ago

My labels contain only 1s and 0s. I am attaching my finalized_model.pkl file anyway (renamed with a .txt extension).

I also tried replacing pickle.dump with joblib.dump, but then I get the following error:

Error:

Jan 31, 2017 4:49:30 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Jan 31, 2017 4:49:30 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 87 ms.
Exception in thread "main" java.lang.IllegalArgumentException: The object (Python class sklearn.neural_network.multilayer_perceptron.MLPClassifier) is not a PMMLPipeline
    at org.jpmml.sklearn.Main.run(Main.java:122)
    at org.jpmml.sklearn.Main.main(Main.java:99)

finalized_model.txt

vruusmann commented 7 years ago

By JPMML-SkLearn conventions, the "entry point" Python object must be an instance of sklearn2pmml.PMMLPipeline.

Simply wrap your MLPClassifier object like this:

from sklearn2pmml import PMMLPipeline

pipeline = PMMLPipeline([
  ('clf', clf)
])

And after installing the sklearn2pmml package, you can use the utility method sklearn2pmml.sklearn2pmml to perform the conversion:

from sklearn2pmml import sklearn2pmml

sklearn2pmml(pipeline, "pipeline.pmml")
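
Since the conversion is carried out by a Java application that the package launches as a subprocess, it can help to confirm first that a java executable is on the PATH. A minimal sketch, assuming Python 3.3+ (for shutil.which); the helper name java_on_path is made up for illustration and is not part of sklearn2pmml:

```python
import shutil

def java_on_path():
    """Return the full path of the `java` executable, or None if it is not on PATH."""
    return shutil.which("java")

if __name__ == "__main__":
    location = java_on_path()
    if location is None:
        print("No java executable found on PATH; install a Java runtime before converting.")
    else:
        print("Using java at: " + location)
```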
palaiya commented 7 years ago

Now I am getting this strange error, despite being on Ubuntu 14.04:

Error:

Traceback (most recent call last):
  File "LeadScore.py", line 50, in <module>
    sklearn2pmml(pipeline, "pipeline.pmml")
  File "/home/naresh/Desktop/Work/Spark-CassandraWork/MachineLearning/sklearn2pmml.py", line 120, in sklearn2pmml
    cmd = ["java", "-cp", os.pathsep.join(_package_classpath() + user_classpath), "org.jpmml.sklearn.Main"]
  File "/home/naresh/Desktop/Work/Spark-CassandraWork/MachineLearning/sklearn2pmml.py", line 75, in _package_classpath
    resources = pkg_resources.resource_listdir("sklearn2pmml.resources", "")
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 953, in resource_listdir
    return get_provider(package_or_requirement).resource_listdir(
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 227, in get_provider
    __import__(moduleOrReq)
ImportError: No module named resources

vruusmann commented 7 years ago

Ubuntu 14 is fairly outdated. Could you paste the output of sklearn2pmml(.., debug = True) here so that I can see the complete configuration of your Python runtime?

Also, could you re-try with Python 3.4+?
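
If the debug output cannot be produced for some reason, roughly the same information can be collected by hand. This is only a sketch for Python 3: the function name is illustrative, and the default module list is an assumption about which packages are relevant, not the exact list that sklearn2pmml reports.

```python
import importlib
import platform

def runtime_versions(modules=("sklearn", "pandas", "sklearn_pandas", "sklearn2pmml")):
    """Map "python" and each module name to a version string (None if not importable)."""
    versions = {"python": platform.python_version()}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed, or not importable from this interpreter
        else:
            # Most packages expose __version__; fall back to a marker if one does not.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions

if __name__ == "__main__":
    for name, version in runtime_versions().items():
        print("{}: {}".format(name, version))
```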

palaiya commented 7 years ago
('python: ', '2.7.6')
('sklearn: ', '0.18.1')
('sklearn.externals.joblib:', '0.10.3')
('pandas: ', u'0.19.1')
('sklearn_pandas: ', '1.3.0')
('sklearn2pmml: ', '0.16.0')
Traceback (most recent call last):
  File "LeadScore.py", line 50, in <module>
    sklearn2pmml(pipeline, "pipeline.pmml",debug = True)
  File "/home/naresh/Desktop/Work/Spark-CassandraWork/MachineLearning/sklearn2pmml.py", line 120, in sklearn2pmml
    cmd = ["java", "-cp", os.pathsep.join(_package_classpath() + user_classpath), "org.jpmml.sklearn.Main"]
  File "/home/naresh/Desktop/Work/Spark-CassandraWork/MachineLearning/sklearn2pmml.py", line 75, in _package_classpath
    resources = pkg_resources.resource_listdir("sklearn2pmml.resources", "")
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 953, in resource_listdir
    return get_provider(package_or_requirement).resource_listdir(
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 227, in get_provider
    __import__(moduleOrReq)
ImportError: No module named resources

vruusmann commented 7 years ago

You should update your Python version!

The first error, the PickleException, is caused by the fact that your Python version creates pickle files using an outdated, very inefficient protocol version: https://github.com/irmen/Pyrolite/issues/51#issuecomment-276471871 The second error, the ImportError, is caused by the fact that your Python version does not know how to properly load package resources.
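
To check which protocol a pickle uses, and to write one with a newer protocol explicitly, something like the following should work on both Python 2.7 and Python 3. It is a sketch: the helper name declared_protocol is made up, and it relies on the fact that pickles written with protocol 2 or later begin with a PROTO opcode (byte 0x80) followed by the version number, while protocols 0 and 1 have no such header. Whether switching to protocol 2 avoids this particular Pyrolite error is an assumption here.

```python
import pickle

def declared_protocol(data):
    """Return the protocol number declared at the start of a pickle, or None for protocols 0/1."""
    # Protocol 2+ pickles start with the PROTO opcode (0x80) followed by one version byte.
    if len(data) >= 2 and data[0:1] == b"\x80":
        return ord(data[1:2])
    return None

model = {"weights": [0.1, 0.2], "label": "demo"}  # stand-in for a fitted estimator

legacy = pickle.dumps(model, protocol=0)  # ASCII protocol, the default on Python 2
binary = pickle.dumps(model, protocol=2)  # binary protocol, readable by Python 2.3+ and 3

print(declared_protocol(legacy))  # None
print(declared_protocol(binary))  # 2
```

With a real model, the same protocol argument can be passed when writing the file, as in pickle.dump(clf, f, protocol=2).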

I've tested the sklearn2pmml package with Python versions 2.7.9 through 2.7.11, and they work without problems.
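
It may also be worth checking what the name sklearn2pmml resolves to on the failing machine. The tracebacks above show it being loaded from a sklearn2pmml.py file in the working directory, and a plain module file cannot contain a resources subpackage. Whether that is the actual cause is an assumption, not something confirmed in this thread. A sketch for Python 3.4+ (describe_import is an illustrative name):

```python
import importlib.util

def describe_import(name):
    """Report where a top-level module name would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the name is not importable at all
    return {
        "origin": spec.origin,  # usually a file path; may be "built-in" or None
        # Packages carry a search path for submodules; plain .py modules do not.
        "is_package": spec.submodule_search_locations is not None,
    }

if __name__ == "__main__":
    print(describe_import("sklearn2pmml"))
```

If is_package comes back False and origin points into the project directory, renaming or removing that local file and installing the package properly would be the first thing to try.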

palaiya commented 7 years ago

@vruusmann I have updated my Python version to 2.7.9 as you suggested, but I am still getting the same error.

After that I also tried Python 3.4.5 and got a similar error. Kindly take a look:

[Screenshots attached: "screenshot from 2017-02-01 12 12 04" and "screenshot from 2017-02-01 16 48 27"]

irmen commented 7 years ago

@vruusmann just to be fair, passing this protocol level 0 pickle file through Pyrolite did expose the bug with the unrecognised escape character. I'm happy about that, because it is now fixed for the next release of Pyrolite.