onnx / onnxmltools

ONNXMLTools enables conversion of models to ONNX
https://onnx.ai
Apache License 2.0

Error converting deserialized xgboost Booster #499

Open ide-an opened 3 years ago

ide-an commented 3 years ago

Hello.

I am trying to convert an existing xgboost model file (created by xgboost.Booster.save_model) to ONNX. While doing that, I get the following error:

AttributeError: 'Booster' object has no attribute 'best_ntree_limit'

My environment:

Reproduction code:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from xgboost import DMatrix, Booster, train as train_xgb
from onnxconverter_common.data_types import FloatTensorType
from onnxmltools.convert import convert_xgboost

# Train
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y)
dtrain = DMatrix(X_train, label=y_train)
param = {'objective': 'multi:softmax', 'num_class': 3}
bst_original = train_xgb(param, dtrain, 10)
initial_type = [('float_input', FloatTensorType([None, 4]))]

# Converting original Booster is OK.
onx = convert_xgboost(bst_original, initial_types=initial_type)

# Save and load model
bst_original.save_model('model.json')
bst_loaded = Booster()
bst_loaded.load_model('model.json')

# !!! Converting loaded Booster fails !!!
onx_loaded = convert_xgboost(bst_loaded, initial_types=initial_type)

Stack trace:

[01:32:25] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.4.0/src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'multi:softmax' was changed from 'merror' to 'mlogloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
Traceback (most recent call last):
  File "conv.py", line 25, in <module>
    onx_loaded = convert_xgboost(bst_loaded, initial_types=initial_type)
  File "C:\Users\macke\repos\onnx_sample\env\lib\site-packages\onnxmltools\convert\main.py", line 176, in convert_xgboost
    return convert(*args, **kwargs)
  File "C:\Users\macke\repos\onnx_sample\env\lib\site-packages\onnxmltools\convert\xgboost\convert.py", line 39, in convert
    model = WrappedBooster(model)
  File "C:\Users\macke\repos\onnx_sample\env\lib\site-packages\onnxmltools\convert\xgboost\_parse.py", line 85, in __init__
    self.kwargs = _get_attributes(booster)
  File "C:\Users\macke\repos\onnx_sample\env\lib\site-packages\onnxmltools\convert\xgboost\_parse.py", line 31, in _get_attributes
    ntrees = booster.best_ntree_limit
AttributeError: 'Booster' object has no attribute 'best_ntree_limit'

According to https://github.com/dmlc/xgboost/issues/805, a loaded Booster doesn't have best_ntree_limit, which likely causes this error.

xadupre commented 3 years ago

I identified the bug: the conversion works before the model is dumped to JSON. After the model is restored, the converter cannot find some information it previously relied on (the objective). The same information needs to be obtained another way.

salmatfq commented 2 years ago

Facing the exact same issue. xgboost==1.4.2 onnxmltools==1.10.0

When using the xgboost Learning API (instead of the scikit-learn one), the xgboost save_model() method does not save the best_ntree_limit attribute, which the ONNX conversion requires. A workaround to unblock oneself is to set best_ntree_limit explicitly, e.g. best_ntree_limit = model.num_boosted_rounds(), provided early stopping was not used during training. The problem does not occur with the scikit-learn wrapper, which does save the best_ntree_limit attribute, nor when saving the model directly via pickle, which makes a more complete snapshot (though pickle dumping is not recommended by xgboost due to backward-compatibility concerns).

Any updates on a resolution for this, please?