jpmml / jpmml-converter

Java library for authoring PMML
GNU Affero General Public License v3.0

StackOverflowError #4

Closed ghost closed 8 years ago

ghost commented 8 years ago

I am trying to convert a random forest model from pkl to pmml, and I get a stack overflow error. I can convert the regression version of the same model without any problem. Attached are the pkl files for the regression and random forest models and the mapper.

Model 1.zip

Exception in thread "main" java.lang.StackOverflowError
    at java.lang.StrictMath.floorOrCeil(StrictMath.java:355)
    at java.lang.StrictMath.floor(StrictMath.java:340)
    at java.lang.Math.floor(Math.java:424)
    at sun.misc.FloatingDecimal.dtoa(FloatingDecimal.java:629)
    at sun.misc.FloatingDecimal.<init>(FloatingDecimal.java:468)
    at java.lang.Double.toString(Double.java:196)
    at org.jpmml.converter.PMMLUtil.formatValue(PMMLUtil.java:387)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:82)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:97)

vruusmann commented 8 years ago

Very nice - I can reproduce the StackOverflowError using your example files. Will investigate and fix it in the upcoming JPMML-SkLearn version that will be released either later today or tomorrow.

I suspect that Scikit-Learn has changed something about the encoding of random forest models. I've tested with Scikit-Learn versions 0.16.0 through 0.17.1. What's your Scikit-Learn version?

import sklearn
print(sklearn.__version__)

ghost commented 8 years ago

Thank you very much. The version is 0.17.

vruusmann commented 8 years ago

This looks like a legitimate StackOverflowError, because the first member tree model in your random forest model is over 2000 levels deep. That's highly unusual.

How was your sklearn.ensemble.RandomForestRegressor instance parametrized? You should set the max_depth parameter to some sensible value such as 100.
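The depth of a member tree can be checked without recursion, so the check itself cannot overflow the stack. The following is a minimal sketch, not part of JPMML-SkLearn. It assumes the scikit-learn layout in which a fitted tree exposes tree_.children_left and tree_.children_right arrays with -1 marking a missing child, and it assumes the arrays are well-formed. The function name tree_depth and the hand-written sample arrays are hypothetical stand-ins.

```python
TREE_LEAF = -1  # assumed sentinel for "no child" in the children arrays


def tree_depth(children_left, children_right):
    """Return the depth of the tree rooted at node 0, using an explicit stack."""
    max_depth = 0
    stack = [(0, 0)]  # pairs of (node id, depth of that node)
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        for child in (children_left[node], children_right[node]):
            if child != TREE_LEAF:
                stack.append((child, depth + 1))
    return max_depth


# Hand-written stand-in: node 0 splits into 1 and 2, node 2 splits into 3 and 4.
sample_left = [1, -1, 3, -1, -1]
sample_right = [2, -1, 4, -1, -1]
print(tree_depth(sample_left, sample_right))
```

For a real forest, the same function could be applied to each element of estimators_ by passing its tree_.children_left and tree_.children_right arrays.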

vruusmann commented 8 years ago

There's a related issue, where a StackOverflowError happens when converting a random forest model that has been trained using the Iris dataset. It should be impossible to train a 2000-level deep tree model using a dataset that contains only 150 training instances.

https://github.com/jpmml/sklearn2pmml/issues/4

ghost commented 8 years ago

Thank you very much for your prompt response. I have set max_depth to 100 and I am still getting the error. My Java version is 1.7.0_79.

ghost commented 8 years ago

I have also tested it with Oracle Java 1.8.0_40.

ghost commented 8 years ago

The error however has changed to:

Exception in thread "main" java.lang.StackOverflowError
    at sun.misc.FDBigInteger.leftShift(FDBigInteger.java:511)
    at sun.misc.FDBigInteger.valueOfMulPow52(FDBigInteger.java:324)
    at sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(FloatingDecimal.java:714)
    at sun.misc.FloatingDecimal$BinaryToASCIIBuffer.access$100(FloatingDecimal.java:259)
    at sun.misc.FloatingDecimal.getBinaryToASCIIConverter(FloatingDecimal.java:1785)
    at sun.misc.FloatingDecimal.getBinaryToASCIIConverter(FloatingDecimal.java:1738)
    at sun.misc.FloatingDecimal.toJavaFormatString(FloatingDecimal.java:70)
    at java.lang.Double.toString(Double.java:204)
    at org.jpmml.converter.ValueUtil.formatValue(ValueUtil.java:118)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:81)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)

which is the same as https://github.com/jpmml/sklearn2pmml/issues/4

Which Java version should I use?

vruusmann commented 8 years ago

You probably can't solve the issue simply by using a different Java version.

The problem is more fundamental, and appears to be an unpickling error (which manifests on some Java versions, and not on others) or something like that. As a result, we have a situation where the unpickled Scikit-Learn data contains invalid cross-references, which make TreeModelUtil#encodeNode jump back and forth between two nodes until the JVM dies with a StackOverflowError.
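The failure mode described above can be illustrated with a small cycle check over the child arrays. This is a sketch under assumptions, not the converter's actual code: it assumes the tree is stored as two arrays of child indices with -1 for "no child", and find_cycle is a hypothetical helper. A valid tree never revisits a node that is already on the current root-to-node path; corrupted cross-references do.

```python
TREE_LEAF = -1  # assumed sentinel for "no child"


def find_cycle(children_left, children_right, root=0):
    """Return the path that revisits one of its own nodes, or None if there is none."""
    path = [root]
    on_path = {root}
    # Each stack entry pairs a node with an iterator over its two children.
    stack = [(root, iter((children_left[root], children_right[root])))]
    while stack:
        node, children = stack[-1]
        child = next(children, None)
        if child is None:
            # Both children handled: leave this node.
            stack.pop()
            path.pop()
            on_path.discard(node)
            continue
        if child == TREE_LEAF:
            continue
        if child in on_path:
            return path + [child]
        path.append(child)
        on_path.add(child)
        stack.append((child, iter((children_left[child], children_right[child]))))
    return None


# Well-formed stand-in: node 0 has leaves 1 and 2.
print(find_cycle([1, -1, -1], [2, -1, -1]))
# Corrupted stand-in: node 0 points to node 1, and node 1 points back to node 0.
print(find_cycle([1, 0], [-1, -1]))
```

A recursive encoder that follows the second pair of arrays would bounce between nodes 0 and 1 indefinitely, which matches the repeated encodeNode frames in the stack trace.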

vruusmann commented 8 years ago

How were the example pickle files in the Model1.zip file generated? I am unable to unpickle them for closer inspection using either the sklearn.externals.joblib module or the pickle module:

>>> from sklearn.externals import joblib
>>> forest = joblib.load("pp_model_1_forest.pkl")

Traceback (most recent call last):
  File "load_joblib.py", line 3, in <module>
    forest = joblib.load("pp_model_1_forest.pkl")
  File "/usr/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 459, in load
    obj = unpickler.load()
  File "/usr/lib64/python3.4/pickle.py", line 1038, in load
    dispatch[key[0]](self)
  File "/usr/lib64/python3.4/pickle.py", line 1384, in load_reduce
    value = func(*args)
  File "sklearn/tree/_tree.pyx", line 579, in sklearn.tree._tree.Tree.__cinit__ (sklearn/tree/_tree.c:6774)
ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'int'

and

>>> import pickle
>>> forest = pickle.load(open("pp_model_1_forest.pkl", "rb"))

Traceback (most recent call last):
  File "load_pickle.py", line 3, in <module>
    forest = pickle.load(open("pp_model_1_forest.pkl", "rb"))
_pickle.UnpicklingError: invalid load key, 'Z'.

ghost commented 8 years ago

I receive the same error when loading the pickle, even for the Iris example provided (see test.zip). I have also included the compiled JAR file. So maybe the problem is in the joblib dump of the random forest, not in the converter?

def store_pkl(obj, name):
    joblib.dump(obj, "pkl/" + name, compress = 9)
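The compress = 9 argument is a plausible explanation for the "invalid load key, 'Z'" message from the plain pickle module: a compressed dump is not a bare pickle stream, so pickle.load rejects its first byte. The sketch below reproduces that error using only the standard library. The "ZF" marker followed by zlib data is an assumed, simplified stand-in for a compressed dump, not the documented joblib file layout, and model_stub and try_unpickle are hypothetical names.

```python
import pickle
import zlib

# Stand-in for a fitted model; a real dump would hold the estimator object.
model_stub = {"n_estimators": 10}

plain = pickle.dumps(model_stub, protocol=2)
# Assumed layout for illustration only: a "ZF" marker followed by zlib data.
wrapped = b"ZF" + zlib.compress(plain, 9)


def try_unpickle(data):
    """Return (object, None) on success or (None, error message) on failure."""
    try:
        return pickle.loads(data), None
    except pickle.UnpicklingError as exc:
        return None, str(exc)


print(try_unpickle(plain))
print(try_unpickle(wrapped))
```

Under this assumption, the pickle error says nothing about whether the file is damaged. The "Buffer dtype mismatch" error from joblib.load is the more informative of the two.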

vruusmann commented 8 years ago

The JPMML-SkLearn library should be able to consume the following dumps:

  1. sklearn.externals.joblib
  2. joblib
  3. pickle

Option 1 is recommended by the Scikit-Learn documentation (e.g. see http://scikit-learn.org/stable/modules/model_persistence.html). However, it may happen that this module is outdated and/or out of sync with other modules.

You could try dumping the RF object manually using options 2 and 3, and use the JPMML-SkLearn command-line application to do the conversion.

ghost commented 8 years ago

I have tested all methods for dumping the .pkl files. I still get a StackOverflowError, even with the Iris data. The log file is provided in the attached test.zip.

I use Python 2.7 32bit (Anaconda).

This is the code for model 1.zip (attached):

from sklearn.ensemble import RandomForestRegressor
from sklearn.externals import joblib
from sklearn.linear_model import LinearRegression

def store_pkl(obj, name):
    joblib.dump(obj, "pkl/" + name, compress = 9)

pp_model_regression = LinearRegression()
pp_model_regression.fit(pp_X, pp_y)

pp_model_forest = RandomForestRegressor(max_depth=100,min_samples_leaf = 5)
pp_model_forest.fit(pp_X, pp_y)

store_pkl(pp_mapper, "pp_mapper_1.pkl")
store_pkl(pp_model_regression, "pp_model_1_regression.pkl")
store_pkl(pp_model_forest, "pp_model_1_forest.pkl")

You should be able to load them with joblib. Can you please try again? I tried different Java versions as well, so I am really confused.

vruusmann commented 8 years ago

> I use Python 2.7 32bit (Anaconda)

This could be a 32-bit vs. 64-bit compatibility issue.

I'm running a 64-bit OS, and the JPMML-SkLearn project has been tested against 64-bit versions of Python2(.7) and Python3(.4).

My unpickling error message (ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'int') fits perfectly into this picture, as for me SIZE_t is long, not int.
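Whether a given Python interpreter is a 32-bit or 64-bit build can be checked from the standard library before dumping a model. A minimal sketch; interpreter_bits is a hypothetical helper name, and the check only describes the interpreter that runs it, not the machine that produced an existing pickle file.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running Python interpreter, in bits."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python", platform.python_version(), "is a", interpreter_bits(), "bit build")
    # sys.maxsize exceeds 2**32 only on 64-bit builds.
    print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

Running this in both the environment that dumps the model and the environment that loads it shows whether the two sides agree on word size.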

ghost commented 8 years ago

Fixed! Thank you very much for all your help. The problem was an incompatibility between 32-bit Python and 64-bit Java.

vruusmann commented 8 years ago

Closing this issue in favour of the following one: https://github.com/jpmml/jpmml-sklearn/issues/6