jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Support for PyCaret transformers #175

Closed szymoonl closed 1 year ago

szymoonl commented 2 years ago

Following this comment, I tried to convert a PyCaret model as follows:

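# Assumption: the experiment was set up with pycaret.classification (the setup() call is not shown)
from pycaret.classification import create_model, finalize_model, get_config

import pickle

from sklearn2pmml.pipeline import PMMLPipeline

# Fetch the pre-processing pipeline that PyCaret built during setup()
prep_pipe = get_config('prep_pipe')
# Train a decision tree, then refit it on the full dataset
dt = create_model('dt')
final_dt = finalize_model(dt)

# Chain the pre-processing steps and the final model
pmml_pipeline = PMMLPipeline([
    ("prep_pipe", prep_pipe),
    ("final_model", final_dt)
])

# Pickle the pipeline for the JPMML-SkLearn command-line converter
with open("test_pmml.pkl", "wb") as pf:
    pickle.dump(pmml_pipeline, pf)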

Versions:

sklearn: 0.23.2
sklearn-pandas: 2.2.0
sklearn2pmml: 0.84.1
pycaret: 2.3.6
openjdk version "11.0.15" 2022-04-19

The following exception occurred during conversion using a .jar:

java -jar /jpmml-sklearn/pmml-sklearn-example/target/pmml-sklearn-example-executable-1.7-SNAPSHOT.jar --pkl-input /test_pmml.pkl --pmml-output model_pmml.pmml
Jun 21, 2022 8:06:26 AM org.jpmml.sklearn.example.Main run
INFO: Parsing PKL..
Jun 21, 2022 8:06:26 AM org.jpmml.sklearn.example.Main run
INFO: Parsed PKL in 57 ms.
Jun 21, 2022 8:06:26 AM org.jpmml.sklearn.example.Main run
INFO: Converting PKL to PMML..
Jun 21, 2022 8:06:26 AM sklearn2pmml.pipeline.PMMLPipeline initTargetFields
WARNING: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.target_fields' is not set. Assuming y as the name of the target field
Jun 21, 2022 8:06:26 AM org.jpmml.sklearn.example.Main run
SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException: The transformer object (Python class pycaret.internal.preprocess.DataTypes_Auto_infer) is not a supported Transformer
    at org.jpmml.python.CastFunction.apply(CastFunction.java:47)
    at sklearn.pipeline.Pipeline$1.apply(Pipeline.java:108)
    at sklearn.pipeline.Pipeline$1.apply(Pipeline.java:95)
    at com.google.common.collect.Lists$TransformingRandomAccessList.get(Lists.java:638)
    at sklearn2pmml.pipeline.PMMLPipeline.getHead(PMMLPipeline.java:629)
    at sklearn2pmml.pipeline.PMMLPipeline.getHead(PMMLPipeline.java:642)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:198)
    at org.jpmml.sklearn.example.Main.run(Main.java:226)
    at org.jpmml.sklearn.example.Main.main(Main.java:151)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.Transformer
    at java.base/java.lang.Class.cast(Class.java:3605)
    at org.jpmml.python.CastFunction.apply(CastFunction.java:45)
    ... 8 more

Exception in thread "main" java.lang.IllegalArgumentException: The transformer object (Python class pycaret.internal.preprocess.DataTypes_Auto_infer) is not a supported Transformer
    at org.jpmml.python.CastFunction.apply(CastFunction.java:47)
    at sklearn.pipeline.Pipeline$1.apply(Pipeline.java:108)
    at sklearn.pipeline.Pipeline$1.apply(Pipeline.java:95)
    at com.google.common.collect.Lists$TransformingRandomAccessList.get(Lists.java:638)
    at sklearn2pmml.pipeline.PMMLPipeline.getHead(PMMLPipeline.java:629)
    at sklearn2pmml.pipeline.PMMLPipeline.getHead(PMMLPipeline.java:642)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:198)
    at org.jpmml.sklearn.example.Main.run(Main.java:226)
    at org.jpmml.sklearn.example.Main.main(Main.java:151)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.Transformer
    at java.base/java.lang.Class.cast(Class.java:3605)
    at org.jpmml.python.CastFunction.apply(CastFunction.java:45)
    ... 8 more

How can I solve this? 🤔 Thank you in advance!

vruusmann commented 2 years ago

Following this comment, I tried to convert a PyCaret model as follows:

Great to see that this old workaround is still valid!

However, I wonder if PyCaret has "systematized" their workflows, so that they could be programmatically converted to standard Scikit-Learn pipeline objects.

Exception in thread "main" java.lang.IllegalArgumentException: The transformer object (Python class pycaret.internal.preprocess.DataTypes_Auto_infer) is not a supported Transformer

Just as the exception message points out, there is a custom PyCaret transformer class pycaret.internal.preprocess.DataTypes_Auto_infer in your pipeline, and JPMML-SkLearn does not know how to convert it.

Potential solutions:

  1. Write a SkLearn2PMML/JPMML-SkLearn handler for this custom transformer class.
  2. Remove this step from the pipeline. Maybe if you specify column names/types explicitly, then PyCaret will have all the necessary information available, and won't perform any inference work by itself.

For starters, try to convert the model without pre-processing. When you can get this part working, only then start adding complexity (such as pre-processing).
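
Before pickling, it can help to list which pipeline steps are custom classes, so you know up front what would need a handler or removal. The following is a minimal sketch, not part of SkLearn2PMML: the helper names iter_steps and find_custom_steps are hypothetical, and the KNOWN_PREFIXES tuple is an assumption made for illustration only. The authoritative list of supported classes is whatever JPMML-SkLearn actually implements.

```python
from typing import Any, Iterator, List, Sequence, Tuple

# Assumption: module prefixes treated as "known" for this illustration only.
# The real set of supported classes is defined by JPMML-SkLearn, not by this tuple.
KNOWN_PREFIXES = ("sklearn.", "sklearn2pmml.")


def iter_steps(steps: Sequence[Tuple[str, Any]], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, object) for every leaf step, descending into nested pipelines."""
    for name, obj in steps:
        path = prefix + name
        nested = getattr(obj, "steps", None)
        if isinstance(nested, list):
            # Pipeline-like object: recurse into its own (name, object) pairs
            yield from iter_steps(nested, path + "/")
        else:
            yield path, obj


def find_custom_steps(steps: Sequence[Tuple[str, Any]]) -> List[Tuple[str, str]]:
    """Return (path, qualified class name) for steps outside KNOWN_PREFIXES."""
    custom = []
    for path, obj in iter_steps(steps):
        cls = type(obj)
        qualified = cls.__module__ + "." + cls.__qualname__
        if not qualified.startswith(KNOWN_PREFIXES):
            custom.append((path, qualified))
    return custom
```

Calling find_custom_steps(pmml_pipeline.steps) on the pipeline from the original post would be expected to report the pycaret.internal.preprocess.DataTypes_Auto_infer step, along with any other PyCaret-specific transformers in prep_pipe.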

szymoonl commented 2 years ago

When trying to convert the model alone without preprocessing, the following error appears:

Aug 16, 2022 12:48:12 PM org.jpmml.sklearn.example.Main run
INFO: Parsing PKL..
Aug 16, 2022 12:48:12 PM org.jpmml.sklearn.example.Main run
SEVERE: Failed to parse PKL
net.razorvine.pickle.PickleException: failed to setstate()
    at net.razorvine.pickle.Unpickler.load_build(Unpickler.java:395)
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:220)
    at org.jpmml.python.CustomUnpickler.dispatch(CustomUnpickler.java:31)
    at org.jpmml.python.PickleUtil$1.dispatch(PickleUtil.java:64)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:109)
    at org.jpmml.python.PickleUtil.unpickle(PickleUtil.java:85)
    at org.jpmml.sklearn.example.Main.run(Main.java:163)
    at org.jpmml.sklearn.example.Main.main(Main.java:151)
Caused by: java.lang.NoSuchMethodException: net.razorvine.pickle.objects.ClassDict.setstate(java.lang.Integer)
    at java.base/java.lang.Class.getMethod(Class.java:2108)
    at net.razorvine.pickle.Unpickler.load_build(Unpickler.java:392)
    ... 7 more

Exception in thread "main" net.razorvine.pickle.PickleException: failed to setstate()
    at net.razorvine.pickle.Unpickler.load_build(Unpickler.java:395)
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:220)
    at org.jpmml.python.CustomUnpickler.dispatch(CustomUnpickler.java:31)
    at org.jpmml.python.PickleUtil$1.dispatch(PickleUtil.java:64)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:109)
    at org.jpmml.python.PickleUtil.unpickle(PickleUtil.java:85)
    at org.jpmml.sklearn.example.Main.run(Main.java:163)
    at org.jpmml.sklearn.example.Main.main(Main.java:151)
Caused by: java.lang.NoSuchMethodException: net.razorvine.pickle.objects.ClassDict.setstate(java.lang.Integer)
    at java.base/java.lang.Class.getMethod(Class.java:2108)
    at net.razorvine.pickle.Unpickler.load_build(Unpickler.java:392)
    ... 7 more

The model is a random forest, but it was trained on a GPU, so it is a cuML object:

RandomForestClassifier() <class 'cuml.ensemble.randomforestclassifier.RandomForestClassifier'>

Converting the model alone, trained without a GPU as a scikit-learn object, works without problems. 🤔

When converting the model together with the preprocessing pipeline, I got the exception below:

Aug 16, 2022 12:48:32 PM org.jpmml.sklearn.example.Main run
INFO: Parsing PKL..
Aug 16, 2022 12:48:32 PM org.jpmml.sklearn.example.Main run
SEVERE: Failed to parse PKL
net.razorvine.pickle.PickleException: failed to setstate()
    at net.razorvine.pickle.Unpickler.load_build(Unpickler.java:395)
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:220)
    at org.jpmml.python.CustomUnpickler.dispatch(CustomUnpickler.java:31)
    at org.jpmml.python.PickleUtil$1.dispatch(PickleUtil.java:64)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:109)
    at org.jpmml.python.PickleUtil.unpickle(PickleUtil.java:85)
    at org.jpmml.sklearn.example.Main.run(Main.java:163)
    at org.jpmml.sklearn.example.Main.main(Main.java:151)
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at net.razorvine.pickle.Unpickler.load_build(Unpickler.java:393)
    ... 7 more
Caused by: net.razorvine.pickle.PickleException: Expected 8 attribute(s), got 9 attribute(s)
    at org.jpmml.python.CustomPythonObject.createAttributeMap(CustomPythonObject.java:81)
    at numpy.DType.setstate(DType.java:50)
    ... 12 more

Exception in thread "main" net.razorvine.pickle.PickleException: failed to setstate()
    at net.razorvine.pickle.Unpickler.load_build(Unpickler.java:395)
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:220)
    at org.jpmml.python.CustomUnpickler.dispatch(CustomUnpickler.java:31)
    at org.jpmml.python.PickleUtil$1.dispatch(PickleUtil.java:64)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:109)
    at org.jpmml.python.PickleUtil.unpickle(PickleUtil.java:85)
    at org.jpmml.sklearn.example.Main.run(Main.java:163)
    at org.jpmml.sklearn.example.Main.main(Main.java:151)
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at net.razorvine.pickle.Unpickler.load_build(Unpickler.java:393)
    ... 7 more
Caused by: net.razorvine.pickle.PickleException: Expected 8 attribute(s), got 9 attribute(s)
    at org.jpmml.python.CustomPythonObject.createAttributeMap(CustomPythonObject.java:81)
    at numpy.DType.setstate(DType.java:50)
    ... 12 more

vruusmann commented 2 years ago

@szymoonl Please open a new issue for each unsupported ML framework.

Otherwise I'll classify all your messages as "spam", and send to trash.