jpmml / sklearn2pmml

Python library for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Got "The JPMML-SkLearn conversion application has failed" error #75

Closed wally-yu closed 6 years ago

wally-yu commented 6 years ago

Dear Villu,

I just tried out this package. I want to train a model on my data set in Python and make predictions on Android, but I got a failure message. Here is my code:

from sklearn.datasets import load_breast_cancer

# Load dataset
data = load_breast_cancer()

# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']

from sklearn.model_selection import train_test_split

# Split our data
train, test, train_labels, test_labels = train_test_split(features, labels, test_size=0.33, random_state=42)

from sklearn.naive_bayes import GaussianNB
from sklearn2pmml import PMMLPipeline 

nb_pipeline = PMMLPipeline([
  ('classifier', GaussianNB())
])

# Train our classifier
nb_pipeline.fit(train, train_labels)

from sklearn2pmml import sklearn2pmml
sklearn2pmml(nb_pipeline, 'nb.pmml', with_repr=True, debug=True)

Here is my environment:

('python: ', '2.7.13')
('sklearn: ', '0.18')
('sklearn.externals.joblib:', '0.10.2')
('pandas: ', u'0.17.1')
('sklearn_pandas: ', '1.6.0')
('sklearn2pmml: ', '0.29.0')

The error message I got is:

The JPMML-SkLearn conversion application has failed. The Java process should have printed more information about the failure into its standard output and/or error streams

I wanted to take a look at my "pipeline-ritorz.pkl.z" file, but it could not be unzipped.

Could you please help? Thanks in advance.

vruusmann commented 6 years ago

Your script executes absolutely fine on my computer:

INFO: Parsing PKL..
jaan 09, 2018 6:14:51 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 23 ms.
jaan 09, 2018 6:14:51 PM org.jpmml.sklearn.Main run
INFO: Converting..
jaan 09, 2018 6:14:51 PM sklearn2pmml.PMMLPipeline encodePMML
WARNING: Attribute 'sklearn2pmml.PMMLPipeline.target_fields' is not set. Assuming y as the name of the target field
jaan 09, 2018 6:14:51 PM sklearn2pmml.PMMLPipeline initFeatures
WARNING: Attribute 'sklearn2pmml.PMMLPipeline.active_fields' is not set. Assuming [x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24, x25, x26, x27, x28, x29, x30] as the names of active fields
jaan 09, 2018 6:14:51 PM org.jpmml.sklearn.Main run
INFO: Converted in 86 ms.
jaan 09, 2018 6:14:51 PM org.jpmml.sklearn.Main run
INFO: Marshalling PMML..
jaan 09, 2018 6:14:52 PM org.jpmml.sklearn.Main run
INFO: Marshalled PMML in 937 ms.
Preserved joblib dump file(s):  /tmp/pipeline-vpweh_o1.pkl.z

My environment:

python:  3.4.3
sklearn:  0.19.1
sklearn.externals.joblib: 0.11
pandas:  0.21.0
sklearn_pandas:  1.5.0
sklearn2pmml:  0.29.0

vruusmann commented 6 years ago

Updated my sklearn_pandas dependency from 1.5.0 to 1.6.0, and everything continues to be fine.

vruusmann commented 6 years ago

Do you have Java 1.7 or newer installed? Python also raises this kind of exception when the Java executable (e.g. java.exe) is not found on the system path.
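
One way to rule out the missing-executable case is to check from Python whether `java` resolves on the system path. The following is a minimal sketch using only the standard library; the helper names `find_java` and `java_version_banner` are hypothetical and not part of sklearn2pmml:

```python
import shutil
import subprocess


def find_java(executable="java"):
    """Return the full path of the Java executable, or None if it is not on the system path."""
    return shutil.which(executable)


def java_version_banner(java_path):
    """Run `java -version` and return its output; Java prints this banner to standard error."""
    result = subprocess.run(
        [java_path, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stderr or result.stdout


if __name__ == "__main__":
    java_path = find_java()
    if java_path is None:
        print("No 'java' executable found on the system path")
    else:
        print("Found Java at:", java_path)
        print(java_version_banner(java_path))
```

Running this in the same Python environment (or Notebook kernel) that calls sklearn2pmml matters, because that process may see a different system path than an interactive shell does.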

wally-yu commented 6 years ago

It's weird; I have the latest Java installed. Thank you, Villu, for your timely reply. I will try using the same package versions as yours today :)

vruusmann commented 6 years ago

Also works in my Python 2.7 environment (a bit outdated, but much closer to your environment):

('python: ', '2.7.11')
('sklearn: ', '0.18')
('sklearn.externals.joblib:', '0.10.2')
('pandas: ', u'0.18.1')
('sklearn_pandas: ', '1.5.0')
('sklearn2pmml: ', '0.29.0')

sjjpo2002 commented 6 years ago

I get the application failed message too. Here is my debug trace.

python: 3.6.4
sklearn: 0.19.1
sklearn.externals.joblib: 0.11
pandas: 0.22.0
sklearn_pandas: 1.6.0
sklearn2pmml: 0.29.0
java: N/A
Executing command:
java -cp C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\guava-20.0.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\istack-commons-runtime-3.0.5.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\jaxb-core-2.3.0.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\jaxb-runtime-2.3.0.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\jcommander-1.48.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\jpmml-converter-1.2.6.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\jpmml-lightgbm-1.1.3.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\jpmml-sklearn-1.4.5.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\jpmml-xgboost-1.2.4.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\pmml-agent-1.3.8.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\pmml-model-1.3.8.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\pmml-model-metro-1.3.8.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\pmml-schema-1.3.8.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\pyrolite-4.19.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\serpent-1.18.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\slf4j-api-1.7.25.jar;C:\Users\cpourms\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\resources\slf4j-jdk14-1.7.25.jar org.jpmml.sklearn.Main --pkl-pipeline-input C:\Users\cpourms\AppData\Local\Temp\pipeline-jw9kdemb.pkl.z --pmml-output text_clf_red.pmml
Standard output is empty
Standard error:
Feb 13, 2018 3:59:32 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Feb 13, 2018 4:02:05 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 153182 ms.
Feb 13, 2018 4:02:05 PM org.jpmml.sklearn.Main run
INFO: Converting..
Feb 13, 2018 4:02:05 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Business and Economics - Labor Issues
    at sklearn.ClassifierUtil$1.apply(ClassifierUtil.java:50)
    at sklearn.ClassifierUtil$1.apply(ClassifierUtil.java:43)
    at com.google.common.collect.Lists$TransformingRandomAccessList$1.transform(Lists.java:638)
    at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
    at org.jpmml.converter.PMMLUtil.addValues(PMMLUtil.java:117)
    at org.jpmml.converter.PMMLUtil.addValues(PMMLUtil.java:106)
    at org.jpmml.converter.PMMLEncoder.createDataField(PMMLEncoder.java:112)
    at sklearn2pmml.PMMLPipeline.encodePMML(PMMLPipeline.java:121)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)

Exception in thread "main" java.lang.IllegalArgumentException: Business and Economics - Labor Issues
    at sklearn.ClassifierUtil$1.apply(ClassifierUtil.java:50)
    at sklearn.ClassifierUtil$1.apply(ClassifierUtil.java:43)
    at com.google.common.collect.Lists$TransformingRandomAccessList$1.transform(Lists.java:638)
    at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
    at org.jpmml.converter.PMMLUtil.addValues(PMMLUtil.java:117)
    at org.jpmml.converter.PMMLUtil.addValues(PMMLUtil.java:106)
    at org.jpmml.converter.PMMLEncoder.createDataField(PMMLEncoder.java:112)
    at sklearn2pmml.PMMLPipeline.encodePMML(PMMLPipeline.java:121)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)

Preserved joblib dump file(s): C:\Users\cpourms\AppData\Local\Temp\pipeline-jw9kdemb.pkl.z
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-2c49329a617c> in <module>()
----> 1 sklearn2pmml(text_clf_red, 'text_clf_red'+'.pmml', with_repr = True, debug=True)

~\AppData\Roaming\Python\Python36\site-packages\sklearn2pmml\__init__.py in sklearn2pmml(pipeline, pmml, user_classpath, with_repr, debug)
    301                                 print("Standard error is empty")
    302                 if retcode:
--> 303                         raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")
    304         finally:
    305                 if debug:

RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams

I have java installed:

java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

vruusmann commented 6 years ago

@sjjpo2002 The name of your target column - Business and Economics - Labor Issues - fails the following sanity check: https://github.com/jpmml/jpmml-sklearn/blob/master/src/main/java/sklearn/ClassifierUtil.java#L49-L51

Sure, the requirement that the target column name must not contain whitespace characters is totally arbitrary (and could/should be removed).

In the meantime, you can make the conversion work if you rename the target column. The simplest thing would be to override the PMMLPipeline.target_fields attribute:

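import numpy
from sklearn2pmml import PMMLPipeline, sklearn2pmml

pipeline = PMMLPipeline(...)
pipeline.fit(X, y)
# Override the target field name so that it contains no whitespace
pipeline.target_fields = numpy.array(["Business_and_Economics_-_Labor_Issues"])

sklearn2pmml(pipeline, "pipeline.pmml")

vruusmann commented 6 years ago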

@sjjpo2002 Sorry, it's one of the target category values, not the target column name, that fails the "must not contain whitespace characters" check.

The above code snippet doesn't help in that case. You would need to rename the offending target category value to proceed.
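
One way to do this renaming is to replace whitespace in the label values before fitting. A minimal sketch in plain Python follows; `sanitize_label` is a hypothetical helper, the sample values in `y` are made up apart from the one taken from the stack trace, and replacing whitespace with underscores is just one possible convention:

```python
import re


def sanitize_label(label):
    """Replace each run of whitespace in a category value with a single underscore."""
    return re.sub(r"\s+", "_", str(label).strip())


# Hypothetical target values; the first one is taken from the stack trace above
y = ["Business and Economics - Labor Issues", "Sports", "Business and Economics - Labor Issues"]
y_clean = [sanitize_label(value) for value in y]
print(y_clean)
```

The pipeline would then be fitted on `y_clean` instead of `y`. Predictions carry the renamed values, so they need to be mapped back if the original labels are required downstream.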

puifais commented 5 years ago

Hi @vruusmann,

I have the same error trace, but my Java is version 1.8.0_181-b13, so it's pretty new. I also do not have whitespace in my target field name or values (only 0 or 1). Any suggestions on what I should look at to debug the issue? I can write a PMML file just fine with numerical fields, but not with a categorical field.

Thank you, Puifai

vruusmann commented 5 years ago

@puifais If your conversion is working with some model configurations and not with others, then it's definitely not a Java version problem. There's something wrong with your model object, and the explanation is contained in the exception message and/or exception stack trace. Most conversion errors (eg. IllegalArgumentException) are raised because of failed sanity checks - stopping you from generating unusual PMML documents.

vruusmann commented 5 years ago

@puifais Extract your Java exception stack trace (the line starting with Exception in thread "main", and the next five to seven lines after that), and open a new issue based on this evidence.
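
If the debug output is long, the relevant lines can be pulled out programmatically. This is a rough sketch, assuming the Java standard error text has already been captured into a Python string; `extract_java_exception` is a hypothetical helper, and the sample text is a shortened copy of the trace posted earlier in this thread:

```python
def extract_java_exception(stderr_text, max_frames=7):
    """Return the 'Exception in thread "main"' line plus up to max_frames following lines."""
    lines = stderr_text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith('Exception in thread "main"'):
            return "\n".join(lines[index:index + 1 + max_frames])
    return None  # No uncaught exception was reported


# Shortened sample based on the trace posted earlier in this thread
sample = """INFO: Converting..
SEVERE: Failed to convert
Exception in thread "main" java.lang.IllegalArgumentException: Business and Economics - Labor Issues
    at sklearn.ClassifierUtil$1.apply(ClassifierUtil.java:50)
    at sklearn.ClassifierUtil$1.apply(ClassifierUtil.java:43)
    at org.jpmml.sklearn.Main.main(Main.java:94)
"""
print(extract_java_exception(sample, max_frames=2))
```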

Huiyu-Luo commented 4 years ago

Hello @vruusmann, when I use the model to predict multiple values, the same problem arises. However, when only one value is predicted, the problem disappears. The error is: RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams

vruusmann commented 4 years ago

The Java executable should have printed more information about the failure into its standard output and/or standard error streams

@Huiyu-Luo You should take this exception message quite literally. The root cause of your perceived issue has been printed into Java's standard output or error stream.

If you're working in a Notebook environment, then this output gets redirected to the Notebook's backend log file.
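
In a Notebook, the streams stay visible if they are captured in Python rather than left to go to the backend log; earlier posts in this thread show that passing debug=True to sklearn2pmml prints them this way. As a generic illustration of the mechanism (not the sklearn2pmml implementation), the sketch below captures a child process's streams with the standard library. The helper `run_and_capture` is hypothetical, and the Python interpreter is used as a stand-in for the Java executable:

```python
import subprocess
import sys


def run_and_capture(command):
    """Run a command and return (return code, standard output, standard error)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout, result.stderr


# Stand-in child process that fails the way a converter might:
# it writes a message to standard error and exits with a non-zero code
child = [sys.executable, "-c", "import sys; sys.stderr.write('Failed to convert\\n'); sys.exit(1)"]
retcode, out, err = run_and_capture(child)
if retcode:
    print("Standard error:")
    print(err)
```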