onnx / sklearn-onnx

Convert scikit-learn models and pipelines to ONNX
Apache License 2.0

Blocking Issue to convert VotingClassifier in ONNX format. "Not implemented yet" message. #1071

Open joyceraraujo opened 5 months ago

joyceraraujo commented 5 months ago

I'm trying to convert a model saved as a .sav file to ONNX format. The model is a VotingClassifier (XGBoost and Naive Bayes). I get the following error:

Traceback (most recent call last):
  File "/mnt/c/Users/project/convert_onnx.py", line 29, in <module>
    onnx_model = convert_sklearn(model,"gbdt_model",
  File "/mnt/c/Users/project/env/lib/python3.10/site-packages/skl2onnx/convert.py", line 208, in convert_sklearn
    onnx_model = convert_topology(
  File "/mnt/c/Users/project/env/lib/python3.10/site-packages/skl2onnx/common/_topology.py", line 1532, in convert_topology
    topology.convert_operators(container=container, verbose=verbose)
  File "/mnt/c/Users/project/env/lib/python3.10/site-packages/skl2onnx/common/_topology.py", line 1350, in convert_operators
    self.call_converter(operator, container, verbose=verbose)
  File "/mnt/c/Users/project/env/lib/python3.10/site-packages/skl2onnx/common/_topology.py", line 1133, in call_converter
    conv(self.scopes[0], operator, container)
  File "/mnt/c/Users/project/env/lib/python3.10/site-packages/skl2onnx/common/_registration.py", line 27, in __call__
    return self._fct(*args)
  File "/mnt/c/Users/project/env/lib/python3.10/site-packages/skl2onnx/operator_converters/voting_classifier.py", line 143, in convert_voting_classifier
    raise NotImplementedError(
NotImplementedError: flatten_transform==True is not implemented yet. You may raise an issue at https://github.com/onnx/sklearn-onnx/issues.

Is there a workaround for this problem? It's a blocking issue because flatten_transform=True is the default. Thank you. The code is as follows:

import joblib
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost
from onnxmltools.utils import save_model
from skl2onnx import (
    convert_sklearn,
    get_latest_tested_opset_version,
    update_registered_converter,
)
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from xgboost import XGBClassifier

# Load the saved VotingClassifier (XGBoost + Naive Bayes).
model = joblib.load("model.sav")
n_features = 12
# n_features = len(model.feature_importances_)
target_opset = get_latest_tested_opset_version()

# Register the onnxmltools converter so skl2onnx can handle XGBClassifier.
update_registered_converter(
    XGBClassifier,
    "XGBoostXGBClassifier",
    calculate_linear_classifier_output_shapes,
    convert_xgboost,
    options={"nocl": [True, False], "zipmap": [True, False, "columns"]},
)

# Convert the ensemble; this is the call that raises NotImplementedError.
onnx_model = convert_sklearn(
    model,
    "gbdt_model",
    initial_types=[("input", FloatTensorType([None, n_features]))],
    target_opset={"": target_opset, "ai.onnx.ml": 2},
)

save_model(onnx_model, "model_converted.onnx")

attilaimre99 commented 5 months ago

Just turn off flatten_transform in the VotingClassifier by passing flatten_transform=False.

My example:

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Split the data (X_correct and y_correct are my own feature matrix and labels)
X_train, X_test, y_train, y_test = train_test_split(X_correct, y_correct, test_size=0.1, random_state=42)

# Build the voting ensemble with flatten_transform disabled
# model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1, class_weight='balanced')
clf1 = LogisticRegression(multi_class='multinomial', random_state=1)
clf2 = RandomForestClassifier(n_estimators=100, random_state=1, class_weight='balanced')
clf3 = GaussianNB()
model = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)], voting='hard', flatten_transform=False)

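
When the ensemble is already fitted and only available as a saved file, the same parameter can be changed on the existing object instead of in the constructor. The sketch below is only an illustration under stated assumptions: the synthetic data and the two estimators stand in for the real model that would come from joblib.load, and no ONNX conversion is attempted here. It shows that set_params flips the flag without a refit and that predictions are unchanged, since flatten_transform only affects the output shape of transform.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Assumption: synthetic data and estimators stand in for joblib.load("model.sav").
X, y = make_classification(n_samples=200, n_features=12, random_state=0)
model = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)), ("gnb", GaussianNB())],
    voting="soft",
)  # flatten_transform is left at its default (True)
model.fit(X, y)

pred_before = model.predict(X)

# Change the flag on the fitted object; no retraining is needed.
model.set_params(flatten_transform=False)

pred_after = model.predict(X)
print(model.flatten_transform, (pred_before == pred_after).all())
```

After this call the model can be passed to convert_sklearn in the same way as in the original script.
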
joyceraraujo commented 5 months ago


I need to load a VotingClassifier model, which combines XGBoost and NaiveBayes, saved in the ".sav" format and convert it to ONNX format. Since I don't have access to the dataset to retrain the model, I'm directly opening it using joblib and attempting the conversion. The XGBoost version used for training the model was 1.4.2. However, I encountered several problems during the conversion:

  1. I received the following traceback:

Traceback (most recent call last):
  File "/mnt/c/Users/project/convert_onnx.py", line 29, in <module>
    onnx_model = convert_sklearn(model, "gbdt_model",
  File "/mnt/c/Users/project/env/lib/python3.10/site-packages/skl2onnx/convert.py", line 208, in convert_sklearn
    onnx_model = convert_topology(
  ...
  File "/mnt/c/Users/project/env/lib/python3.10/site-packages/skl2onnx/operator_converters/voting_classifier.py", line 143, in convert_voting_classifier
    raise NotImplementedError("flatten_transform==True is not implemented yet. You may raise an issue at https://github.com/onnx/sklearn-onnx/issues.")
NotImplementedError: flatten_transform==True is not implemented yet. You may raise an issue at https://github.com/onnx/sklearn-onnx/issues.

This issue was resolved by executing the command: model = model.set_params(flatten_transform=False).

  2. Another issue I faced was:

    File "/mnt/c/Users/dev/lib/python3.10/site-packages/onnxmltools/convert/xgboost/common.py", line 40, in get_xgb_params
    gbp = config["learner"]["gradient_booster"]["gbtree_model_param"]
    KeyError: 'gbtree_model_param'

    This problem was resolved by directly modifying the library and changing the parameter name to match the one used by my version of XGBoost.

  3. After resolving errors 1 and 2, I encountered a new error:

    raise RuntimeError("Unable to interpret 'FGA', feature names should follow the pattern 'f%d'.")

    I attempted to change the feature names accordingly, but the error persisted. Upon inspecting the JSON representation of the model, I noticed that the 'split' fields still used the old feature names:

{'nodeid': 0, 'depth': 0, 'split': 'ABC', 'split_condition': 6.25, 'yes': 1, 'no': 2, 'missing': 1, 'gain': 78.1462402, 'cover': 259.75, 'children': [{'nodeid': 1, 'depth': 1, 'split': 'BLK', 'split_condition': 0.550000012, 'yes': 3, 'no': 4, 'missing': 3, 'gain': 21.2281971, 'cover': 171.75, 'children': [{'nodeid': 3, 'depth': 2, 'split': 'ABC', 'split_condition': 4.55000019, 'yes': 7, 'no': 8, 'missing': 7, 'gain': 14.1838226, 'cover': 147.5, 'children': [{'nodeid': 7, 'depth': 3, 'split': 'BLK', 'split_condition': 0.25, 'yes': 11, 'no': 12, 'missing': 11, 'gain': 3.78608131, 'cover': 106.5, 'children': [{'nodeid': 11, 'depth': 4, 'split': 'BLK', 'split_condition': 0.150000006, 'yes': 15, 'no': 16, 'missing': 15, 'gain': 5.14517879, 'cover': 78.5, 'children': [{'nodeid': 15, 'depth': 5, 'split': 'MIN', 'split_condition': 8.64999962, 'yes': 17, 'no': 18, 'missing': 17, 'gain': 4.04689026, 'cover': 60, 'children': [{'nodeid': 17, 'leaf': -0.018082479, 'cover': 23.25}, {'nodeid': 18, 'leaf': -0.00116446253, 'cover': 36.75}]}, {'nodeid': 16, 'leaf': -0.0269777905, 'cover': 18.5}]}, {'nodeid': 12, 'leaf': 0.000966416614, 'cover': 28}]}, {'nodeid': 8, 'depth': 3, 'split': 'MIN', 'split_condition': 16.2000008, 'yes': 13, 'no': 14, 'missing': 13, 'gain': 1.36819792, 'cover': 41, 'children': [{'nodeid': 13, 'leaf': 0.00639292412, 'cover': 19}, {'nodeid': 14, 'leaf': 0.0183946956, 'cover': 22}]}]}, {'nodeid': 4, 'leaf': 0.029015895, 'cover': 24.25}]}, {'nodeid': 2, 'depth': 1, 'split': 'MIN', 'split_condition': 23.9500008, 'yes': 5, 'no': 6, 'missing': 5, 'gain': 8.23132324, 'cover': 88, 'children': [{'nodeid': 5, 'leaf': 0.0257856827, 'cover': 34}, {'nodeid': 6, 'depth': 2, 'split': 'BLK', 'split_condition': 0.350000024, 'yes': 9, 'no': 10, 'missing': 9, 'gain': 2.95942688, 'cover': 54, 'children': [{'nodeid': 9, 'leaf': 0.0367497504, 'cover': 23}, {'nodeid': 10, 'leaf': 0.0526438914, 'cover': 31}]}]}]}

How can I solve this?
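
To make the expected 'f%d' pattern concrete, here is a minimal sketch that rewrites the 'split' fields of a dumped tree like the one above. It is an illustration only: rename_splits is a hypothetical helper, the column order in feature_order is an assumption (the real training order is not given in this thread), and the sketch works on a plain dict, so it does not by itself change what the converter reads from the booster.

```python
def rename_splits(node, name_to_index):
    """Return a copy of a dumped tree node with every 'split' rewritten as 'f<index>'."""
    renamed = dict(node)
    if "split" in renamed:
        renamed["split"] = "f%d" % name_to_index[renamed["split"]]
    if "children" in renamed:
        renamed["children"] = [
            rename_splits(child, name_to_index) for child in renamed["children"]
        ]
    return renamed


# Assumption: hypothetical column order; the real one must match the training data.
feature_order = ["ABC", "BLK", "MIN"]
name_to_index = {name: i for i, name in enumerate(feature_order)}

# A shortened tree in the same shape as the dump above.
tree = {
    "nodeid": 0, "split": "ABC", "split_condition": 6.25,
    "children": [
        {"nodeid": 1, "split": "BLK", "split_condition": 0.55,
         "children": [{"nodeid": 3, "leaf": 0.01}, {"nodeid": 4, "leaf": 0.029}]},
        {"nodeid": 2, "leaf": 0.0257},
    ],
}

renamed_tree = rename_splits(tree, name_to_index)
print(renamed_tree["split"], renamed_tree["children"][0]["split"])
```

The helper returns a new dict and leaves the input untouched, which makes it easy to compare the tree before and after renaming.
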

xadupre commented 5 months ago

I think it is an issue for onnxmltools. Which version are you using?

joyceraraujo commented 5 months ago

The version is onnxmltools 1.12.0.

flacomalone commented 3 months ago

I can confirm that this issue still occurs

I can confirm that the KeyError: 'gbtree_model_param' error still occurs with the following versions: onnx==1.15.0, onnxconverter-common==1.14.0, onnxmltools==1.11.2, onnxruntime==1.17.0, xgboost==2.0.3.
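
The KeyError comes from indexing a fixed path in the booster's configuration dictionary. One way to locate where the key lives in a given XGBoost version, rather than editing the library blindly, is to search the nested config for it. The sketch below is a hypothetical helper and not part of onnxmltools or XGBoost; the three sample dictionaries are made-up layouts for illustration and are not claimed to match any real XGBoost version.

```python
def find_key(config, key):
    """Depth-first search through nested dicts and lists.

    Returns the first value stored under `key`, or None if it is absent.
    """
    if isinstance(config, dict):
        if key in config:
            return config[key]
        children = config.values()
    elif isinstance(config, list):
        children = config
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Assumption: illustrative layouts only, not real XGBoost configurations.
direct = {"learner": {"gradient_booster": {"gbtree_model_param": {"num_trees": "100"}}}}
nested = {"learner": {"gradient_booster": {"gbtree": {"gbtree_model_param": {"num_trees": "50"}}}}}
missing = {"learner": {"gradient_booster": {"name": "gblinear"}}}

for cfg in (direct, nested, missing):
    print(find_key(cfg, "gbtree_model_param"))
```

With a real model, the dictionary to search would be the parsed output of the booster's saved configuration; printing the result shows whether the key moved or disappeared in the installed version.
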