jpmml / sklearn2pmml

Python library for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

sklearn2pmml: 0.17.4 - InvalidOpcodeException for IsolationForest #31

Closed: dverstee closed this issue 7 years ago

dverstee commented 7 years ago

Hello,

First of all, thank you for creating and maintaining this great library. Secondly, I seem to be running into some issues when trying to create a PMML file from the following pipeline:

    IForest_pipeline = PMMLPipeline([("isolationforest", IsolationForest(n_estimators=100,random_state=0))])

The resulting error is:

    SEVERE: Failed to parse PKL net.razorvine.pickle.InvalidOpcodeException: invalid pickle opcode: 248

Versions:

sklearn: 0.18.1
sklearn.externals.joblib: 0.10.3
pandas: 0.19.2
sklearn_pandas: 1.3.0
sklearn2pmml: 0.17.4
java: 1.8.0
joblib: 0.11
python: 3.6.0

The odd thing is that the following code (using KMeans) generates the PMML correctly, so the problem might be model-specific:

    IForest_pipeline = PMMLPipeline([("classifier", KMeans(n_clusters=2, random_state=0))])

This runs on Windows 7 SP1 (64-bit).

Could you give me pointers on where I might be looking to solve this error?
Thanks in advance.

vruusmann commented 7 years ago

> SEVERE: Failed to parse PKL net.razorvine.pickle.InvalidOpcodeException: invalid pickle opcode: 248

JPMML-SkLearn depends on the Pyrolite library for low-level PKL file parsing functionality. Apparently, Pyrolite does not recognize Pickle protocol opcode 248 (hex 0xf8).

I can't find the definition of this opcode in the Python 3.6 codebase: https://github.com/python/cpython/blob/3.6/Lib/pickle.py#L102-L179
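One quick way to double-check that 248 is not a documented opcode is to look it up in `pickletools.opcodes`, the standard library's table of the opcodes known to the running interpreter. This is a minimal sketch and not part of the original thread; the helper name `find_opcode` is hypothetical.

```python
import pickletools


def find_opcode(byte_value):
    """Hypothetical helper: return the pickletools.OpcodeInfo for a raw
    opcode byte, or None if the running interpreter does not define it."""
    for info in pickletools.opcodes:
        # info.code is a one-character str, e.g. '\x80' for PROTO
        if ord(info.code) == byte_value:
            return info
    return None


if __name__ == "__main__":
    for value in (0x80, 0xF8):
        info = find_opcode(value)
        if info is None:
            print("%d (0x%02x): not a known pickle opcode" % (value, value))
        else:
            print("%d (0x%02x): %s, introduced in protocol %d"
                  % (value, value, info.name, info.proto))
```

A `None` result only shows that the byte is not a documented opcode for that interpreter. It says nothing about how the byte ended up in the PKL file.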

> Could you give me pointers on where I might be looking to solve this error?

The stack trace of this exception points at the method net.razorvine.pickle.Unpickler#dispatch(short). You could try adding a case statement for opcode 248 there, and collect more information about it (e.g. which Scikit-Learn class, which class attribute, etc.). Then, try to factor out a minimal reproducible example, and open a parallel issue with the Pyrolite project: https://github.com/irmen/Pyrolite/issues
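A rough Python-side counterpart to that debugging step is to walk the PKL bytes with `pickletools.genops`, which raises `ValueError` when it meets an unknown opcode, and remember the last class reference seen before the failure. This is a best-effort sketch under stated assumptions: the helper `locate_bad_opcode` is hypothetical, it assumes an uncompressed pickle stream, and it only tracks `GLOBAL` and `STACK_GLOBAL` references whose names were pushed as fresh strings (memoized names are missed). The demo fakes the problem by splicing byte 0xf8 into an otherwise valid pickle.

```python
import collections
import pickle
import pickletools

# Opcodes that push a str; STACK_GLOBAL (protocol 4+) consumes the last two
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def locate_bad_opcode(data):
    """Hypothetical helper: walk pickle bytes opcode by opcode.

    Returns None if every opcode is recognised; otherwise returns
    (error_message, last_global), where last_global is the most recent
    "module name" class reference seen before the failure, or None.
    """
    last_global = None
    recent_strings = []
    try:
        for opcode, arg, _pos in pickletools.genops(data):
            if opcode.name == "GLOBAL":
                last_global = arg  # genops already formats it as "module name"
            elif opcode.name in STRING_OPS:
                recent_strings = (recent_strings + [arg])[-2:]
            elif opcode.name == "STACK_GLOBAL" and len(recent_strings) == 2:
                last_global = " ".join(recent_strings)
    except ValueError as exc:
        # genops raises ValueError for unknown opcodes and truncated streams
        return str(exc), last_global
    return None


if __name__ == "__main__":
    good = pickle.dumps([collections.OrderedDict()], protocol=2)
    # Splice the undefined byte 0xf8 in front of the final STOP opcode
    bad = good[:-1] + b"\xf8" + good[-1:]
    print(locate_bad_opcode(good))
    print(locate_bad_opcode(bad))
```

For a real file, the call would be `locate_bad_opcode(open(path, "rb").read())` with `path` pointing at the PKL file; the reported class name is a hint for which object to isolate in a minimal example.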

The JPMML-SkLearn project includes an integration test for the IsolationForest model type: https://github.com/jpmml/jpmml-sklearn/blob/master/src/test/resources/pkl/IsolationForestHousingAnomaly.pkl

This PKL file was generated using Python 3.4(.3). So, downgrading from Python 3.6 to 3.4 may provide a temporary workaround.

dverstee commented 7 years ago

Thanks for the quick update.

I ran the code with Python 3.4 and can confirm that it works! You're a great man! :+1:

While following your debugging pointer, I noticed that all tree-based models are affected, but other models are not. For now I'll use Python 3.4 as a workaround, and I'll close the issue.

Thanks again.