microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License
16.66k stars 3.83k forks source link

Thread Safe Problem #2201

Closed cuhkcarl closed 5 years ago

cuhkcarl commented 5 years ago

`bool LGBMPredictor::Predict( const float matrix, int nrow, int ncol, std::vector preds) { // Comm::LogErr("[LGBMPredictor][debug] nrow:%lu, ncol:%lu", nrow, ncol);

Predictor predictor(boosting_model_.get(), 
                config_.num_iteration_predict,
                config_.predict_raw_score,
                config_.predict_leaf_index,
                config_.predict_contrib,
                config_.pred_early_stop,
                config_.pred_early_stop_freq,
                config_.pred_early_stop_margin);

auto pred_fun = predictor.GetPredictFunction();

for(int i = 0; i < nrow; ++i)
{
    std::vector<std::pair<int, double>> one_row;
    auto start = matrix + ncol * i;
    for(int j = 0; j < ncol; ++j)
    {
        one_row.push_back(j, static_cast<double>(*(start + j)));
    }
    pred_fun(one_row, &((*preds)[i]));
}
return true;

}`

I am going to write predict code like above, without any thread lock. This funtion is supposed to called in multithreading, is that thread safe?

guolinke commented 5 years ago

where do you parallel? parallel call this function?

cuhkcarl commented 5 years ago

where do you parallel? parallel call this function?

yes, I have 25 threads to call this function.

cuhkcarl commented 5 years ago

Further, i read the c_api code. To my understanding, the lock in Booster::Predict is just used to maintain the parameter like ### double* outresult, because I could not find any member variable would be modify in boosting and predictor during multithreading. Therefore I think it is thread safe naturally, even without a thread lock.

If I am wrong, please tell me which line of codes would be a risk of thread unsafe.

guolinke commented 5 years ago

if you can ensure the argument preds is not written to the same index in different threads, it is thread-safe.

cuhkcarl commented 5 years ago

if you can ensure the argument preds is not written to the same index in different threads, it is thread-safe.

that's nice, thank you very much

guolinke commented 5 years ago

there is lock in predict now, for the boosting is modified here https://github.com/microsoft/LightGBM/blob/f01b2aca13e787e2945eba7c2bd85e31c65923f8/src/boosting/gbdt.h#L334-L345

However, I think the num_iteration_forpred could be moved to predictor, then we can remove this lock.