kb3wmh closed this issue 6 years ago
Is this a complete code example?
You have EstimatorProxy(mlp) in your script, but I can't find the definition of mlp anywhere. The title of this issue suggests that it should be of type MLPRegressor, but the exception message suggests PMMLPipeline instead.
You should be able to make the conversion work by simplifying the PMML pipeline: drop EstimatorProxy. It's not needed here, because class MLPRegressor doesn't contain any non-persistent attributes.
This should work without problems:
pipeline = PMMLPipeline([
    ("scaler", StandardScaler()),
    ("regressor", MLPRegressor())
])
pipeline.fit(X_train, y_train)
This still doesn't work:
from sklearn.neural_network import MLPRegressor
from sklearn2pmml import EstimatorProxy
from sklearn2pmml import PMMLPipeline, sklearn2pmml
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.externals import joblib
if __name__ == "__main__":
    mlp_regressor = MLPRegressor()  # I've tried this with and without this line
    scaler = StandardScaler()  # Also this one
    mlp_regressor = joblib.load("mlp.pkl")  # MLP model previously exported to pkl file
    scaler = joblib.load("scaler.pkl")  # StandardScaler()
    pipeline = PMMLPipeline([
        ("scaler", scaler),
        ("regressor", mlp_regressor)
    ])
    sklearn2pmml(pipeline, "pipeline_test.pmml", debug=True)
I have saved the scaler and MLP regressor to pickle files so that I don't have to retrain the model and can more easily apply new data. This works, I can load the model back in, and fit to the pipeline, and get the same results. But I keep getting the java errors when I try to convert these models to a PMML.
It works if I don't use pickle files--which I can work around, if need be. But if you have an idea of what is going on, I'd be very grateful.
I'm very much a noob, so thank you so much for replying to me.
But I keep getting the java errors when I try to convert these models to a PMML.
What are those exceptions? They must be something else than the one shown above.
Training a Scikit-Learn pipeline, and saving its components to Pickle files:
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y = True)
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
scaler = StandardScaler()
classifier = MLPClassifier()
pipeline = Pipeline([
    ("scaler", scaler),
    ("classifier", classifier)
])
pipeline.fit(X, y)
from sklearn.externals import joblib
joblib.dump(scaler, "scaler.pkl")
joblib.dump(classifier, "classifier.pkl")
Loading components from Pickle files, and converting to PMML data format:
from sklearn.externals import joblib
scaler2 = joblib.load("scaler.pkl")
classifier2 = joblib.load("classifier.pkl")
from sklearn2pmml import PMMLPipeline
import numpy
pmml_pipeline = PMMLPipeline([
    ("scaler2", scaler2),
    ("classifier2", classifier2)
])
pmml_pipeline.active_fields = numpy.asarray(["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"])
pmml_pipeline.target_fields = numpy.asarray(["Species"])
from sklearn2pmml import sklearn2pmml
sklearn2pmml(pmml_pipeline, "iris.pmml", with_repr = True)
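Before handing reloaded components to the converter, it can help to confirm that the Pickle round trip itself is lossless. The following is a minimal sketch, not part of the original reply. It assumes the standalone joblib package (sklearn.externals.joblib was removed in scikit-learn 0.23), and the temporary directory and the max_iter/random_state settings are illustrative choices only.

```python
import os
import tempfile

import joblib  # standalone package; assumed here in place of sklearn.externals.joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Train the pipeline; Pipeline fits the step objects in place (no cloning),
# so `scaler` and `classifier` are fitted after this call.
scaler = StandardScaler()
classifier = MLPClassifier(max_iter=300, random_state=0)
pipeline = Pipeline([
    ("scaler", scaler),
    ("classifier", classifier),
])
pipeline.fit(X, y)

# Dump each fitted component to its own file, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    scaler_path = os.path.join(tmp_dir, "scaler.pkl")
    classifier_path = os.path.join(tmp_dir, "classifier.pkl")
    joblib.dump(scaler, scaler_path)
    joblib.dump(classifier, classifier_path)
    scaler2 = joblib.load(scaler_path)
    classifier2 = joblib.load(classifier_path)

# Apply the reloaded steps by hand and compare with the original pipeline.
original_predictions = pipeline.predict(X)
reloaded_predictions = classifier2.predict(scaler2.transform(X))
same_predictions = bool(np.array_equal(original_predictions, reloaded_predictions))
print("Round trip preserves predictions:", same_predictions)
```

If the predictions match, the reloaded objects are in the same fitted state as the originals, so any remaining conversion failure is more likely to come from the converter side than from pickling.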
Hi @vruusmann ,
I realize that I get an error if I do it like below:
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y = True)
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
scaler = StandardScaler()
pipeline = Pipeline([
    ("scaler", scaler)
])
pipeline.fit(X, y)
from sklearn2pmml import PMMLPipeline
pmml_pipeline = PMMLPipeline([
    ("scaler", scaler)
])
from sklearn2pmml import sklearn2pmml
sklearn2pmml(pmml_pipeline, "iris.pmml", with_repr = True)
Do you know why? Do I always have to include the MLPClassifier to the PMMLPipeline?
What is the correct way to do it if I only need the StandardScaler?
@hardianlawi What kind of error are you getting? I believe it's the same as here: https://github.com/jpmml/sklearn2pmml/issues/78
Do I always have to include the MLPClassifier to the PMMLPipeline?
A pipeline is defined as a sequence of transformers, followed by an estimator. If the pipeline does not contain the final estimator step, then it is under-specified.
What is the correct way to do it if I only need the StandardScaler?
Terminate your pipeline with a dummy estimator class such as sklearn.dummy.DummyClassifier or sklearn.dummy.DummyRegressor.
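As an illustration of that advice, here is a minimal sketch of a scaler-only pipeline terminated with a dummy estimator. It uses plain sklearn.pipeline.Pipeline so that it runs without Java. The assumption (not verified here) is that the same step list can be passed to PMMLPipeline for conversion. The step names and the has_final_estimator check are illustrative, not part of either library's API.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Transformer first, estimator last. The dummy estimator ignores the features
# and always predicts the mean of y, so fitting it costs next to nothing.
steps = [
    ("scaler", StandardScaler()),
    ("estimator", DummyRegressor()),
]
pipeline = Pipeline(steps)  # assumption: PMMLPipeline(steps) would be built the same way
pipeline.fit(X, y)

# Illustrative check: the last step exposes predict(), i.e. it is an estimator.
final_step = pipeline.steps[-1][1]
has_final_estimator = hasattr(final_step, "predict")

# The fitted scaler is still reachable on its own on the Python side.
scaled = pipeline.named_steps["scaler"].transform(X)
print(has_final_estimator, scaled.shape)
```

The dummy step exists only to make the pipeline fully specified; the scaling step is unchanged by its presence.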
@vruusmann Thanks for your reply.
I get the errors below when trying to run the code:
Standard output is empty
Standard error:
Mar 06, 2018 5:57:28 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Mar 06, 2018 5:57:28 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 34 ms.
Mar 06, 2018 5:57:28 PM org.jpmml.sklearn.Main run
INFO: Converting..
Mar 06, 2018 5:57:28 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Tuple contains an unsupported value (Python class sklearn.preprocessing.data.StandardScaler)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
    at org.jpmml.sklearn.TupleUtil.extractElement(TupleUtil.java:48)
    at sklearn2pmml.PMMLPipeline.getEstimator(PMMLPipeline.java:369)
    at sklearn2pmml.PMMLPipeline.encodePMML(PMMLPipeline.java:85)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast sklearn.preprocessing.StandardScaler to sklearn.Estimator
    at java.lang.Class.cast(Class.java:3369)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
    ... 5 more
Exception in thread "main" java.lang.IllegalArgumentException: Tuple contains an unsupported value (Python class sklearn.preprocessing.data.StandardScaler)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
    at org.jpmml.sklearn.TupleUtil.extractElement(TupleUtil.java:48)
    at sklearn2pmml.PMMLPipeline.getEstimator(PMMLPipeline.java:369)
    at sklearn2pmml.PMMLPipeline.encodePMML(PMMLPipeline.java:85)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast sklearn.preprocessing.StandardScaler to sklearn.Estimator
    at java.lang.Class.cast(Class.java:3369)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
    ... 5 more
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-34e2323412a5> in <module>()
21 from sklearn2pmml import sklearn2pmml
22
---> 23 sklearn2pmml(pmml_pipeline, "iris.pmml", with_repr = True)
/usr/local/lib/python3.5/dist-packages/sklearn2pmml/__init__.py in sklearn2pmml(pipeline, pmml, user_classpath, with_repr, debug)
304 print("Standard error is empty")
305 if retcode:
--> 306 raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")
307 finally:
308 if debug:
RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams
Terminate your pipeline with a dummy estimator class such as sklearn.dummy.DummyClassifier or sklearn.dummy.DummyRegressor
Won't this be adding additional inference step when I use the pipeline in Java? I have trained my model in tensorflow using Python and I am only using StandardScaler to preprocess my data before making any inference using the tensorflow model. Let me know if I miss something here!
java.lang.IllegalArgumentException: Tuple contains an unsupported value (Python class sklearn.preprocessing.data.StandardScaler)
This exception doesn't make sense - Python class sklearn.preprocessing.data.StandardScaler is always registered with the SkLearn2PMML/JPMML-SkLearn runtime.
Maybe your SkLearn2PMML installation is corrupt or something.
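When an installation is suspected, a quick first step is to record which package versions are actually present so they can be compared across machines. The following is a rough sketch using only the standard library. The installed_versions helper is a hypothetical name, and the package list is an assumption about what is relevant here, not an official diagnostic.

```python
import shutil
from importlib import metadata


def installed_versions(packages=("scikit-learn", "sklearn2pmml", "joblib")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions()
java_path = shutil.which("java")  # None if no java executable is on PATH

for name, version in report.items():
    print(name, "->", version if version is not None else "not installed")
print("java executable ->", java_path if java_path is not None else "not found")
```

Posting this output alongside the stack trace makes it easier to tell a broken environment apart from a genuine converter issue.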
Won't this be adding additional inference step when I use the pipeline in Java?
They are dummy estimators, so they don't take much resources to fit.
Yo dude,
This exception doesn't make sense - Python class sklearn.preprocessing.data.StandardScaler is always registered with the SkLearn2PMML/JPMML-SkLearn runtime.
Could you try running that on your machine? Because I tried it both on my remote and local machine. Both of them output the same exception.
They are dummy estimators, so they don't take much resources to fit.
What I mean is that by adding the dummy estimator, I believe that when I load the saved model iris.pmml into Java, I won't be able to use only the StandardScaler() part. I imagine something like below:
pipeline = load('iris.pmml');
pipeline.transform(x) -> an inference
What I am interested in is the preprocessing step.
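For context on what the preprocessing step computes: standardization is a per-feature (x - mean) / scale, where scale is the population standard deviation. The following is a minimal pure-Python sketch of that arithmetic, not something proposed in this thread. The names fit_standardizer and standardize are hypothetical. With a fitted StandardScaler, the two lists would correspond to its mean_ and scale_ attributes.

```python
import json
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(column) for column in columns]
    # A zero-variance column gets scale 1.0, so division leaves it centered at 0.
    scales = [pstdev(column) or 1.0 for column in columns]
    return {"mean": means, "scale": scales}


def standardize(row, params):
    """Apply (x - mean) / scale to every feature of one row."""
    return [
        (x - m) / s
        for x, m, s in zip(row, params["mean"], params["scale"])
    ]


rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
params = fit_standardizer(rows)

# The parameters are plain numbers, so they survive a JSON round trip
# and could be read by any other runtime.
restored = json.loads(json.dumps(params))
print(standardize(rows[0], restored))
```

Because the transform is fully described by two lists of numbers, it can in principle be reproduced outside Python; whether that is preferable to a PMML pipeline with a dummy estimator depends on the deployment.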
@hardianlawi Please work on your attitude - I don't owe you anything.
Hi @vruusmann,
I apologize if I sounded rude to you. I didn't mean it that way but thank you for your help anyway! I really appreciate it. Great work for doing everything alone.
My Python code using sklearn2pmml
The Java Error: