guitargeek / XGBoost-FastForest

Minimal library code to deploy XGBoost models in C++.
MIT License

Cannot handle a multi-class model containing a tree with a single leaf node? #15

Closed jungeorge closed 2 years ago

jungeorge commented 2 years ago

Hi @guitargeek,

Thanks for sharing such a great tool. Overall, it works quite well, but I found a small "bug" when converting a multi-class model trained in Python to C++: the package does not seem to handle a multi-class model that contains a tree with a single leaf node. To reproduce the issue quickly, train a "large" model on very little training data:

from sklearn.datasets import make_classification
from xgboost import XGBClassifier

training_X, training_Y = make_classification(n_samples=100, n_features=100, n_informative=3, random_state=42, n_classes=3, weights=[0.33, 0.33])

model = XGBClassifier(n_estimators=100, max_depth=7, objective='multi:softmax', eval_metric='mlogloss', use_label_encoder=False).fit(training_X, training_Y)

After converting this model with FastForest, the C++ and Python probability outputs differed. Of course, this is an extreme example (there are only 100 training samples). However, I noticed that whenever the trained model contains a tree with only one leaf node, even if it was trained on a large amount of data, the C++ and Python outputs are not exactly the same.
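To check whether a given model is affected, one option is to scan its text dump for trees that consist of a single leaf. The snippet below is a minimal sketch, not part of FastForest: the helper name `find_single_leaf_trees` is hypothetical, the example dumps are hand-written with illustrative values, and it assumes the usual XGBoost text dump layout in which a lone-leaf tree is a single `0:leaf=...` line.

```python
from typing import List


def find_single_leaf_trees(tree_dumps: List[str]) -> List[int]:
    """Return the indices of trees whose text dump is a lone leaf node.

    Hypothetical helper for illustration; it only inspects dump strings.
    """
    single_leaf = []
    for index, dump in enumerate(tree_dumps):
        # Each non-empty line of a text dump describes one node.
        nodes = [line.strip() for line in dump.splitlines() if line.strip()]
        if len(nodes) == 1 and nodes[0].startswith("0:leaf="):
            single_leaf.append(index)
    return single_leaf


# Hand-written dumps in XGBoost's text format (values are made up).
example_dumps = [
    "0:[f2<0.5] yes=1,no=2,missing=1\n\t1:leaf=-0.1\n\t2:leaf=0.2\n",
    "0:leaf=0.05\n",
    "0:[f7<1.25] yes=1,no=2,missing=1\n\t1:leaf=0.3\n\t2:leaf=-0.3\n",
]
print(find_single_leaf_trees(example_dumps))
```

With a trained model, the same function can be applied to `model.get_booster().get_dump()`, which returns one string per tree. A non-empty result means the model contains the kind of tree described in this report.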

I am more than happy to provide more details if anything is unclear. Looking forward to your solution.

Thanks.

guitargeek commented 2 years ago

Thank you very much for your kind feedback and for finding this corner case where FastForest didn't work! I have fixed this now and added a new unit test based on your code snippet here.