jpmml / sklearn2pmml

Python library for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

`SEVERE: Failed to convert` in pmml output for OneVsRestClassifier #148

Closed ghost closed 4 years ago

ghost commented 5 years ago

Not sure if it's a non-fixable issue again.


====================================
             script
====================================
from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
import numpy

iris = datasets.load_iris()
X = iris.data
Y = iris.target
model = OneVsRestClassifier(LinearSVC(random_state=0))
pipeline = PMMLPipeline([
    ('OneVsRestClassifier', model)
])
pipeline.active_fields = numpy.array(iris.feature_names)
pipeline.target_fields = numpy.array('Species')
pipeline.fit(X, Y)
sklearn2pmml(pipeline, 'OneVsRestClassifier.pmml')

====================================
             error
====================================
Standard error:
INFO: Parsing PKL..
INFO: Parsed PKL in 23 ms.
INFO: Converting..
SEVERE: Failed to convert
java.lang.IllegalArgumentException
    at sklearn.multiclass.OneVsRestClassifier.encodeModel(OneVsRestClassifier.java:83)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:213)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)
Exception in thread "main" java.lang.IllegalArgumentException
    at sklearn.multiclass.OneVsRestClassifier.encodeModel(OneVsRestClassifier.java:83)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:213)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)
Traceback (most recent call last):
  File "OneVsRestClassifier.py", line 18, in <module>
    sklearn2pmml(pipeline, 'OneVsRestClassifier.pmml')
  File "..\Python37-32\lib\site-packages\sklearn2pmml\__init__.py", line 252, in sklearn2pmml
    raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")
RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams

====================================
             version
====================================
Java Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Python 3.7.2

Package                       Version
----------------------------- --------
alabaster                     0.7.12
Babel                         2.6.0
beautifulsoup4                4.7.1
certifi                       2019.3.9
chardet                       3.0.4
colorama                      0.4.1
cycler                        0.10.0
dicttoxml                     1.7.4
docutils                      0.14
idna                          2.8
imagesize                     1.1.0
Jinja2                        2.10
joblib                        0.13.2
kiwisolver                    1.0.1
lxml                          4.3.3
MarkupSafe                    1.1.1
matplotlib                    3.0.3
mpmath                        1.1.0
numpy                         1.16.2
numpydoc                      0.8.0
nyoka                         3.0.7
packaging                     19.0
pandas                        0.24.2
pandas-flavor                 0.1.2
pathlib                       1.0.1
patsy                         0.5.1
periodictable                 1.5.0
pip                           19.0.3
Pygments                      2.3.1
pyparsing                     2.3.1
pyrolite                      0.1.10
python-dateutil               2.8.0
python-ternary                1.0.5
pytz                          2018.9
requests                      2.21.0
scikit-learn                  0.20.3
scipy                         1.2.1
setuptools                    40.9.0
six                           1.12.0
sklearn                       0.0
sklearn-pandas                1.8.0
sklearn2pmml                  0.44.0
snowballstemmer               1.2.1
soupsieve                     1.9
Sphinx                        2.0.0
sphinxcontrib-applehelp       1.0.1
sphinxcontrib-devhelp         1.0.1
sphinxcontrib-htmlhelp        1.0.1
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.2
sphinxcontrib-serializinghtml 1.1.1
statsmodels                   0.9.0
sympy                         1.3
urllib3                       1.24.1
xlrd                          1.2.0
xmljson                       0.2.0

vruusmann commented 5 years ago

Looks like you've got Joblib dump working now - what was the issue about?

Anyway, this particular exception is thrown because you're training OneVsRestClassifier with a non-probabilistic elementary classifier (LinearSVC). If you switch to a probabilistic classifier (e.g. DecisionTreeClassifier, LogisticRegression), then everything will work fine.

The fix here is to provide a more informative exception message.
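
To make the distinction above concrete, here is a minimal sketch using scikit-learn only. It assumes that "probabilistic" means the base estimator exposes a `predict_proba` method; that is an assumption about the converter's criterion, and `is_probabilistic` is a hypothetical helper name, not part of sklearn2pmml.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC


def is_probabilistic(estimator):
    """Hypothetical helper: True if the estimator exposes predict_proba."""
    return hasattr(estimator, "predict_proba")


X, y = load_iris(return_X_y=True)

# LinearSVC offers decision_function but no predict_proba;
# this is the base classifier used in the failing script.
svc_ok = is_probabilistic(LinearSVC(random_state=0))

# LogisticRegression offers predict_proba, so the one-vs-rest
# wrapper can report per-class probabilities after fitting.
base = LogisticRegression(max_iter=1000)
lr_ok = is_probabilistic(base)
model = OneVsRestClassifier(base).fit(X, y)
proba = model.predict_proba(X[:1])  # one row, one column per class
```

With a model built this way wrapped in a PMMLPipeline, the `sklearn2pmml(pipeline, 'OneVsRestClassifier.pmml')` call from the script would be the next step; it is left out here because it needs a Java runtime.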

ghost commented 5 years ago

Hi @vruusmann, OneVsRestClassifier with LinearSVC is the simplest example provided in the scikit-learn documentation: https://scikit-learn.org/stable/modules/multiclass.html#one-vs-the-rest (see section 1.12.2.1, Multiclass learning).

Btw, DecisionTreeClassifier is not working on my end due to #146. If I can't complete the simplest exercise, training a DecisionTreeClassifier on the iris dataset, then there's no point in trying OneVsRestClassifier with DecisionTreeClassifier.

vruusmann commented 5 years ago

> OneVsRestClassifier with LinearSVC is the simplest example provided in scikit-learn.

Unfortunately, one cannot support everything. Aggregating over probabilistic classifiers is more relevant and interesting for real-life applications than aggregating over non-probabilistic ones.

> Btw, DecisionTreeClassifier is not working on my end due to #146.

Interesting note. It suggests that perhaps the pickling error is related to Cython classes - DecisionTreeClassifier contains Cython tree objects, whereas OneVsRestClassifier + LinearSVC doesn't contain any.

Anyway, given that SkLearn2PMML is not working for you, please consider switching to Nyoka.

ghost commented 5 years ago

@vruusmann Thank you for the help.

ghost commented 5 years ago

Hi @vruusmann ,

Thanks for the suggestion to try OneVsRestClassifier(DecisionTreeClassifier()) instead of OneVsRestClassifier(LinearSVC(random_state=0)). As noted in #146, my Anaconda environment can export PMML from DecisionTreeClassifier, so the OneVsRestClassifier PMML could be exported correctly as well.

Thank you.