siboehm / lleaves

Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.
https://lleaves.readthedocs.io/en/latest/
MIT License
370 stars, 31 forks

Specify C interface: Prediction array zeroed out or overwrite? #46

Closed hdosuperuser closed 1 year ago

hdosuperuser commented 1 year ago

I've noticed unexpected behavior when using compiled models from C++. For a fixed set of feature values, the first call to forest_root gives the same result as LightGBM or lleaves in Python. But when I call forest_root again with the same values, the result is 2x the target.

To rule out other causes: LightGBM in Python and lleaves in Python match and give the expected (valid) results. The difference appears only in C++, where calling forest_root twice with the same parameters gives different results.

So I guess you can reproduce this with any compiled model you have prepared.


#include "c_bench.h"
#include <iostream>
#include <vector>
int main()
{
    std::vector<double> features {/*feature values*/};

    double prediction {0};
    forest_root(features.data(), &prediction, 0, 1); // valid prediction result
    forest_root(features.data(), &prediction, 0, 1); // 2x the previous result; expected the same value as after the first call

    std::cout << "Prediction: " << prediction << std::endl;
}
hdosuperuser commented 1 year ago

Here is another hint: the results are correct when each call writes to its own, freshly zeroed result variable.


    double prediction1 {0};
    double prediction2 {0};

    forest_root(features.data(), &prediction1, 0, 1); // correct
    forest_root(features.data(), &prediction2, 0, 1); // correct
siboehm commented 1 year ago

It doesn't give different results. I assume the prediction array is zeroed, and I += the results into it. In Python I create a new result array on each call, which is why it works there.
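For anyone hitting this from C++, here is a minimal sketch of the accumulate behavior and a caller-side workaround. It does not link against a real compiled model: `accumulating_root` is a hypothetical stand-in that only mimics the `+=` semantics described above, with its signature guessed from the call in the original post. `kNumFeatures`, `predict_fresh`, and the toy formula are likewise made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int kNumFeatures = 2; // assumption: toy model with two features per row

// Hypothetical stand-in for the compiled forest_root. It ADDS its result into
// out[row] instead of overwriting it, which is the behavior described above.
void accumulating_root(const double *features, double *out, int start, int end) {
    assert(start <= end);
    for (int row = start; row < end; ++row) {
        const double *x = features + row * kNumFeatures;
        out[row] += 0.5 * x[0] + 0.25 * x[1]; // made-up "model", not a real forest
    }
}

// Caller-side workaround: zero the output buffer before every call, so that
// reusing the same buffer cannot carry over the previous prediction.
void predict_fresh(const std::vector<double> &features, std::vector<double> &out) {
    assert(features.size() == out.size() * kNumFeatures);
    std::fill(out.begin(), out.end(), 0.0);
    accumulating_root(features.data(), out.data(), 0, static_cast<int>(out.size()));
}
```

With this stand-in, two back-to-back calls to `accumulating_root` on the same buffer double the value, while two calls through `predict_fresh` return the same prediction each time.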

hdosuperuser commented 1 year ago

Thanks for the quick reply. I wasn't aware of this and couldn't track down what was going on. In my case I change just one feature value between two calls (i.e. a direction-type feature).

siboehm commented 1 year ago

I see why you ran into this problem though: the C interface is not specified anywhere! I may have a look at this in the future, either specifying the API or just overwriting the array. Thank you for raising this :) I'll adjust the title so others find it more easily.

siboehm commented 1 year ago

OK, I created a fix that is also faster because it gets rid of some load instructions.
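As a rough illustration of why overwriting can save loads: the output slot no longer has to be read before it is written. The sketch below is not the generated code from the fix. `overwriting_root`, `kNumFeatures`, and the two toy "trees" are hypothetical, and only the call signature follows the usage shown earlier in this thread.

```cpp
#include <cassert>

constexpr int kNumFeatures = 2; // assumption: toy model with two features per row

// Hypothetical overwrite variant: sum the tree outputs in a local variable and
// store once, so the previous content of out[row] is never read.
void overwriting_root(const double *features, double *out, int start, int end) {
    assert(start <= end);
    for (int row = start; row < end; ++row) {
        const double *x = features + row * kNumFeatures;
        double sum = 0.0;                    // local accumulator
        sum += (x[0] > 1.0) ? 0.5 : -0.5;    // toy tree 1 (made up)
        sum += (x[1] > 0.0) ? 0.25 : -0.25;  // toy tree 2 (made up)
        out[row] = sum;                      // single store, overwrites stale data
    }
}
```

With overwrite semantics, the caller can reuse the same result variable across calls without zeroing it first, even if it holds an old prediction or an arbitrary value.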

siboehm commented 1 year ago

Closed by #47