onnx / sklearn-onnx

Convert scikit-learn models and pipelines to ONNX
Apache License 2.0

Wrong predictions with `AdaBoostClassifier` with 2 classes #1117

Closed. FrancescMartiEscofetQC closed this issue 2 months ago.

FrancescMartiEscofetQC commented 4 months ago

When converting a trained AdaBoostClassifier with 2 classes, the output probabilities don't match the ones computed by the model:

Code

from sklearn.ensemble import AdaBoostClassifier
from skl2onnx.convert import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import numpy as np
import onnxruntime as rt

m = AdaBoostClassifier()
rng = np.random.default_rng(7)
X = rng.standard_normal((1000, 10))
y = rng.integers(0, 2, 1000)
m.fit(X, y)
onnx_model = convert_sklearn(
    m,
    initial_types=[("X", FloatTensorType([None, 10]))],
    options={"zipmap": False},
)
sess = rt.InferenceSession(
    onnx_model.SerializeToString(), providers=["CPUExecutionProvider"]
)

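# The ONNX model expects float32 input; with zipmap disabled,
# "probabilities" is a plain (n_samples, n_classes) tensor.
onnx_proba = sess.run(["probabilities"], {"X": X.astype(np.float32)})[0]
np.testing.assert_allclose(m.predict_proba(X), onnx_proba, atol=1e-5)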

This fails:

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=1e-05

Mismatched elements: 28 / 2000 (1.4%)
Max absolute difference: 0.08980399
Max relative difference: 0.25832238
 x: array([[0.500673, 0.499327],
       [0.497522, 0.502478],
       [0.499847, 0.500153],...
 y: array([[0.500673, 0.499326],
       [0.497522, 0.502478],
       [0.499847, 0.500153],...
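To see which samples are affected rather than just the count, a small helper can list the rows that fail the same element-wise test `assert_allclose(a, b)` applies, namely |a - b| <= atol + rtol * |b|. This is a sketch: `mismatched_rows` is a hypothetical helper name, and the toy arrays below are made up for illustration, not taken from the run above.

```python
import numpy as np

def mismatched_rows(a, b, atol=1e-5, rtol=1e-7):
    """Return indices of rows where any element fails the tolerance test.

    Mirrors numpy.testing.assert_allclose(a, b) defaults except atol:
    |a - b| <= atol + rtol * |b|, with NaNs treated as equal.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    ok = np.isclose(a, b, rtol=rtol, atol=atol, equal_nan=True)
    # A row is mismatched if any of its class probabilities is off.
    return np.flatnonzero(~ok.all(axis=1))

# Toy data: row 1 differs by 0.09, rows 0 and 2 are within tolerance.
a = np.array([[0.500673, 0.499327], [0.40, 0.60], [0.30, 0.70]])
b = np.array([[0.500673, 0.499326], [0.49, 0.51], [0.30, 0.70]])
print(mismatched_rows(a, b))  # prints [1]
```

In the reproducer, calling it as `mismatched_rows(m.predict_proba(X), onnx_proba)` would give the indices of the offending rows of `X`, which can then be inspected individually.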

With more than two classes, the converter works fine.

Versions: skl2onnx 1.17.0, scikit-learn 1.5.1, Python 3.12.4

xadupre commented 4 months ago

By default, AdaBoost uses a decision tree (a depth-1 stump) as its base estimator, and a tree is a non-continuous function: a tiny change in an input close to a split threshold can send it down a different branch. This introduces discrepancies when switching from double to float. onnxruntime does not yet support the latest ONNX standard for trees. I'll try to minimize this. See https://onnx.ai/sklearn-onnx/auto_tutorial/plot_ebegin_float_double.html for more details.
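A minimal sketch of how a single split can flip when precision drops. It rests on assumptions not stated in the thread: a sample goes left when `x <= threshold` (scikit-learn's convention), and a learned threshold lies halfway between two adjacent float32 feature values, kept in float64 on the scikit-learn side but stored as float32 in the converted model. `stump_goes_right` is a hypothetical helper, not part of scikit-learn or skl2onnx.

```python
import numpy as np

def stump_goes_right(x, threshold):
    # Assumed convention: left branch when x <= threshold, right otherwise.
    return x > threshold

# Two adjacent float32 feature values seen during training.
lo = np.float32(0.1)
hi = np.nextafter(lo, np.float32(1.0))

# Assumed: threshold is their midpoint, computed and kept in float64.
threshold64 = (np.float64(lo) + np.float64(hi)) / 2.0
# The same threshold stored as float32 (round to nearest, ties to even);
# here the tie resolves upward, onto `hi` itself.
threshold32 = np.float32(threshold64)

x = hi  # a float32 sample equal to the upper training value

right_in_double = bool(stump_goes_right(np.float64(x), threshold64))
right_in_float = bool(stump_goes_right(x, threshold32))
print(right_in_double, right_in_float)  # True False
```

Because a stump's output jumps by a full step at its threshold, one flipped comparison moves that estimator's entire weighted vote, which is consistent with most rows agreeing closely while a few differ by a large amount.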

xadupre commented 2 months ago

I'll close the issue. Feel free to reopen it.