iamDecode / sklearn-pmml-model

A library to parse and convert PMML models into Scikit-learn estimators.
BSD 2-Clause "Simplified" License

what does the "Exception: PMML model ensemble should use majority vote." mean #19

Closed mrlittleyo closed 4 years ago

mrlittleyo commented 4 years ago

I tried PMMLForestClassifier, but when I run the code "clf = PMMLForestClassifier("./test_Iris.pmml")" I get the error "Exception: PMML model ensemble should use majority vote." Has anyone come across the same problem?

iamDecode commented 4 years ago

Hmm, interesting. Scikit-learn's RandomForestClassifier only supports Random Forests by majority vote (i.e., every decision tree casts a vote, and the forest's score for a class is the number of trees voting for it divided by the total number of trees), so sklearn-pmml-model currently also supports only this type of ensemble. If your model does not use majority vote, then it is essentially not a Random Forest, but an ensemble of decision trees.
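As a toy illustration of the voting scheme described above, here is a minimal sketch. It is not how scikit-learn or sklearn-pmml-model implement it; the function name `majority_vote` and the sample labels are made up for this example.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most voted label and the share of trees that voted for it."""
    counts = Counter(tree_predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(tree_predictions)


# Hypothetical votes from three trees for a single sample.
label, share = majority_vote(["setosa", "versicolor", "setosa"])
print(label, share)
```

Here two of the three trees vote for "setosa", so that label wins with a share of 2/3.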

Would you mind sharing how you created the PMML model? It would also be helpful if you could share the PMML file itself, or at least the line that looks like:

<Segmentation multipleModelMethod="[something]">

iamDecode commented 4 years ago

My bad, a clearer answer to your question would be: The error means that the PMML model you have tried to load is not a Random Forest. PMMLForestClassifier only supports Random Forests. For other models, please refer to the README at https://github.com/iamDecode/sklearn-pmml-model.

If you share the .pmml file I can help you figure out how to load the model. Let me know if I can help you :)
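To check which ensemble method a PMML file declares, the multipleModelMethod attribute of its Segmentation element can be read with the standard library. This is only a rough sketch: the helper name `segmentation_methods` and the trimmed sample document are assumptions for illustration, and a real file will contain much more.

```python
import xml.etree.ElementTree as ET


def segmentation_methods(root):
    """Return the multipleModelMethod of every Segmentation element under root."""
    methods = []
    for element in root.iter():
        # PMML tags are namespaced, e.g. "{http://www.dmg.org/PMML-4_4}Segmentation",
        # so compare only the local part of the tag name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "Segmentation":
            methods.append(element.get("multipleModelMethod"))
    return methods


# Hypothetical, heavily trimmed PMML document used only for illustration.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <MiningModel functionName="classification">
    <Segmentation multipleModelMethod="average"/>
  </MiningModel>
</PMML>"""

print(segmentation_methods(ET.fromstring(SAMPLE)))
```

For a file on disk, passing `ET.parse("./test_Iris.pmml").getroot()` instead of the parsed sample string should work the same way.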

mrlittleyo commented 4 years ago

My code:

Use a pipeline to create the PMML model and save it (df_train and iris are defined elsewhere):

from sklearn.ensemble import RandomForestClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
iris_pipeline = PMMLPipeline([("classifier", RandomForestClassifier())])
iris_pipeline.fit(df_train[iris.feature_names], df_train['label'])
sklearn2pmml(iris_pipeline, "./test_Iris.pmml")

Load the PMML model:

from sklearn_pmml_model.ensemble import PMMLForestClassifier
clf = PMMLForestClassifier("./test_Iris.pmml")

Note that scikit-learn's RandomForestClassifier was created with its default parameters.

iamDecode commented 4 years ago

Thanks for letting me know! I am working on adding support for Random Forest PMMLs generated with sklearn2pmml.

mrlittleyo commented 4 years ago

Which types of PMML model does sklearn_pmml_model support? Do you have any demos showing how to create a Random Forest PMML that sklearn_pmml_model supports?

iamDecode commented 4 years ago

@mrlittleyo I just released 0.0.14, which supports Decision Tree and Random Forest PMML models exported with sklearn2pmml. Can you try out the new version and let me know if it works for you?

More technical: the problem was that sklearn2pmml uses jpmml, which "flattens" trees by default. This means that a binary split tree is converted into a multi split tree, which makes the final PMML file smaller in size. Multi split trees are however not directly supported by sklearn. I have updated sklearn-pmml-model to convert the multi split tree back into an equivalent binary split that can be imported into scikit-learn.
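To make the idea concrete, here is a toy sketch of turning a multi split node back into nested binary splits. It is not the code used by sklearn-pmml-model; the dict-based node layout, the names `binarize` and `predict`, and the thresholds are all assumptions for illustration. It assumes the children of a multi split are tried in order and the last one acts as the fallback.

```python
def binarize(node):
    """Rewrite a multi-way split as a chain of nested two-way splits.

    A node is either a leaf such as {"label": "setosa"} or a split such as
    {"children": [(predicate, child), ...]}, where children are tried in
    order and the last one is treated as the fallback branch.
    """
    if "children" not in node:
        return node
    (predicate, first), *rest = node["children"]
    if not rest:
        # Only the fallback branch is left, so no further test is needed.
        return binarize(first)
    return {
        "predicate": predicate,
        "if_true": binarize(first),
        "if_false": binarize({"children": rest}),
    }


def predict(node, row):
    """Walk a binarized tree until a leaf is reached."""
    while "label" not in node:
        node = node["if_true"] if node["predicate"](row) else node["if_false"]
    return node["label"]


# Hypothetical three-way split on a single feature.
flat = {"children": [
    (lambda r: r["petal_width"] <= 0.8, {"label": "setosa"}),
    (lambda r: r["petal_width"] <= 1.75, {"label": "versicolor"}),
    (lambda r: True, {"label": "virginica"}),
]}
tree = binarize(flat)
print(predict(tree, {"petal_width": 1.2}))
```

Each binary node tests one predicate and pushes all remaining branches into its "false" side, so the binarized tree gives the same answers as the flat one while using only two-way splits.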

mrlittleyo commented 4 years ago

I have tried both Decision Tree and Random Forest PMML models with version 0.0.14, and in both cases I got the same error, "ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long'", in sklearn_pmml_model.tree._tree.Tree.__cinit__()

iamDecode commented 4 years ago

Hmm, I am not able to reproduce that error on any of my machines or workspaces. This might be OS-specific; are you using Windows by any chance? I don't have a Windows machine, which makes testing a bit more difficult.

Anyhow, I think this may be caused by using an incompatible version of scikit-learn. Can you try updating to the latest scikit-learn and see if it works then? I use 0.22.2.post1. If that does not work, please share the output of pip freeze.
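Alongside pip freeze, a few platform details may help narrow down an error like this. The sketch below uses only the standard library; the helper name `environment_report` is made up for this example. It reports the size of the C long type next to a pointer-sized integer, because SIZE_t in scikit-learn's tree code is pointer-sized while long is only 4 bytes on 64-bit Windows. That difference is one plausible source of the mismatch, not something confirmed in this thread.

```python
import ctypes
import platform
from importlib import metadata


def environment_report():
    """Collect details that are commonly relevant to a buffer dtype mismatch."""
    try:
        sklearn_version = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        sklearn_version = None  # scikit-learn is not installed in this environment
    return {
        "os": platform.system(),
        "python": platform.python_version(),
        "scikit-learn": sklearn_version,
        # Size in bytes of the C "long" type on this platform.
        "c_long_bytes": ctypes.sizeof(ctypes.c_long),
        # Size in bytes of a pointer-sized signed integer (Py_ssize_t).
        "ssize_t_bytes": ctypes.sizeof(ctypes.c_ssize_t),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If the two byte sizes differ, arrays created with the platform's default integer type may not match what the compiled extension expects.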

moyanojv commented 4 years ago

Hello

I have the same problem reported by @mrlittleyo. My OS is Windows 10.

iamDecode commented 4 years ago

I arranged a Windows machine and eventually managed to reproduce the issue. I've released 0.0.14.1 to fix this issue. Could you try again?

Thanks for your patience. This library is still in the alpha phase, but feedback like this goes a long way toward making it more stable.

moyanojv commented 4 years ago

Hi @iamDecode, version 0.0.14.1 works perfectly on Windows 10.

Thank you very much.

iamDecode commented 4 years ago

I'm going to close this issue. Seems the problem has been fixed :)