guitargeek / XGBoost-FastForest

Minimal library code to deploy XGBoost models in C++.

Prediction and Scores are not the same in Python and C++ (cannot reproduce results) #22

Closed: gitDawn closed this issue 1 year ago

gitDawn commented 1 year ago

Hi,

Following the example code, I ran the code samples below.

Python:

import xgboost as xgb
from sklearn.datasets import make_classification
import numpy as np

X, y = make_classification(n_samples=10000, n_features=5, random_state=42, n_classes=2, weights=[0.5])

model = xgb.XGBClassifier().fit(X, y)
predictions = model.predict(X)
prob_predictions = model.predict_proba(X)

n = 0
print(X[n,:])
print(predictions[n])
print(prob_predictions[n])

np.save('model_predictions.npy', predictions)
booster = model.get_booster()    # public accessor for the underlying Booster
booster.dump_model("model.txt")  # text dump that fastforest::load_txt reads
booster.save_model("model.bin")

With output of:

[-2.24456934 -1.36232827  1.55433334 -2.0869092  -1.27760482]  
0
[9.994567e-01 5.432876e-04]
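
For a like-for-like comparison with the raw score that FastForest returns, the untransformed margin can also be printed on the Python side. A minimal sketch, reusing X, n, and booster from the snippet above (output_margin=True skips the logistic link):

# Raw margin for the same row, before the sigmoid is applied.
# This is the number to compare with the raw C++ score.
raw_margin = booster.predict(xgb.DMatrix(X[n:n+1]), output_margin=True)
print(raw_margin)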

But when I try to run this code in C++:

#include "fastforest.h"

#include <cmath>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> features{"f0", "f1", "f2", "f3", "f4"};

    const auto fastForest = fastforest::load_txt("model.txt", features);

    // std::vector<float> input{0.0, 0.2, 0.4, 0.6, 0.8};
    std::vector<float> input{-2.24456934, -1.36232827, 1.55433334, -2.0869092, -1.27760482};

    float orig = fastForest(input.data());        // raw score: sum of tree leaf values
    float score = 1. / (1. + std::exp(-orig));    // logistic transform for binary:logistic
    std::vector<float> probas = fastForest.softmax(input.data());

    std::cout << orig << std::endl;
    std::cout << score << std::endl;
    std::cout << probas[0] << " , " << probas[1] << std::endl;
}

I'm getting different results (see below). What could be wrong here? P.S. 'fastForest.softmax' was changed so that it wouldn't raise an error.

-7.01733
0.000895414
0.420152 , 0.579848
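
For what it's worth, the two raw scores differ by almost exactly 0.5: the inverse sigmoid of the Python class-1 probability 5.432876e-04 is about -7.517, while the C++ raw score is -7.017. A purely illustrative arithmetic check (the 0.5 shift is an assumption, suggestive of a base-score offset that the plain text dump does not carry, not a statement about either library's internals):

import numpy as np
raw_cpp = -7.01733                     # raw score printed by the C++ program above
shifted = raw_cpp - 0.5                # hypothetical base-score correction
print(1.0 / (1.0 + np.exp(-shifted)))  # roughly 5.43e-04, close to predict_proba's class-1 value
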
gitDawn commented 1 year ago

Duplicate of https://github.com/guitargeek/XGBoost-FastForest/issues/20, sorry.