microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.
MIT License

PyTorch SGDClassifier `predict` result does not match Sklearn model #702

Closed: ytwei3 closed this issue 1 year ago

ytwei3 commented 1 year ago

Bug report

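When I convert an SGDClassifier model initialised with the `modified_huber` loss function to PyTorch, the converted model's `predict` output differs from that of the original scikit-learn model: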

import numpy as np
from sklearn.linear_model import SGDClassifier
import hummingbird.ml

# prepare data
np.random.seed(0)
train_x = np.random.rand(200, 50)
train_y = np.random.randint(10, size=200)
test_x = np.random.rand(100, 50)

# convert
model = SGDClassifier(loss='modified_huber')
model.fit(train_x, train_y)
hb_model = hummingbird.ml.convert(model, 'torch')
# expected result (scikit-learn model)
model.predict(test_x)
array([2, 6, 2, 8, 2, 2, 9, 8, 7, 8, 8, 9, 2, 7, 2, 4, 2, 4, 9, 8, 7, 8,
       2, 9, 2, 9, 6, 8, 8, 0, 2, 9, 9, 2, 8, 9, 4, 8, 0, 7, 9, 5, 7, 9,
       7, 0, 2, 8, 2, 3, 6, 2, 8, 9, 2, 8, 9, 2, 2, 7, 8, 8, 9, 2, 8, 6,
       4, 9, 0, 8, 9, 7, 9, 2, 6, 2, 8, 8, 4, 9, 2, 8, 9, 6, 4, 2, 9, 8,
       7, 2, 7, 9, 8, 3, 9, 1, 8, 2, 2, 9])

# actual result (converted Hummingbird PyTorch model)
hb_model.predict(test_x)
array([2, 1, 2, 6, 0, 2, 6, 0, 2, 8, 8, 9, 2, 0, 2, 0, 2, 2, 2, 8, 2, 7,
       2, 9, 2, 9, 0, 8, 0, 0, 2, 7, 9, 2, 8, 2, 2, 8, 0, 1, 1, 5, 7, 9,
       7, 0, 2, 8, 2, 0, 0, 2, 8, 2, 2, 2, 6, 1, 2, 0, 8, 8, 2, 2, 8, 2,
       2, 9, 0, 2, 7, 7, 9, 0, 0, 2, 4, 8, 4, 9, 2, 8, 9, 6, 4, 2, 0, 0,
       2, 2, 7, 6, 8, 3, 2, 1, 7, 2, 2, 2])

The two results do not match.
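To make the discrepancy easier to quantify, the two prediction arrays can be compared element by element. The following is a minimal sketch, not part of the original report: `mismatch_report` is a hypothetical helper name, and the sample values are only the first printed row of each array above. With the full reproduction script, `model.predict(test_x)` and `hb_model.predict(test_x)` could be passed in instead.

```python
import numpy as np


def mismatch_report(expected, actual):
    """Summarise where two label arrays differ (hypothetical helper)."""
    expected = np.asarray(expected)
    actual = np.asarray(actual)
    if expected.shape != actual.shape:
        raise ValueError("prediction arrays must have the same shape")
    differs = expected != actual  # boolean mask of disagreeing positions
    return {
        "n_samples": int(expected.size),
        "n_mismatched": int(differs.sum()),
        "mismatch_rate": float(differs.mean()),
        "indices": np.flatnonzero(differs).tolist(),
    }


# First printed row of each array from the report above (22 samples each).
sklearn_pred = [2, 6, 2, 8, 2, 2, 9, 8, 7, 8, 8, 9, 2, 7, 2, 4, 2, 4, 9, 8, 7, 8]
hb_pred = [2, 1, 2, 6, 0, 2, 6, 0, 2, 8, 8, 9, 2, 0, 2, 0, 2, 2, 2, 8, 2, 7]

report = mismatch_report(sklearn_pred, hb_pred)
print(report["n_mismatched"], "of", report["n_samples"], "predictions differ")
print("differing positions:", report["indices"])
```

Reporting the differing indices alongside the count makes it straightforward to inspect the underlying scores for exactly those samples.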

Environment