Open Jason0401 opened 6 months ago
Thanks for using LightGBM.
The tree_learner setting only affects training, not prediction.
You can pass num_threads through parameters for prediction. That tells LightGBM to parallelize prediction over the rows of the input, which improves throughput when predicting on many rows at once.
If you are predicting on only one row at a time, using multithreading won't improve the prediction speed and you'll only ever see one CPU core active.
Thanks for your answer. In my case only one row is predicted at a time. I use LGBM_BoosterPredictForMatSingleRowFastInit and LGBM_BoosterPredictForMatSingleRowFast for prediction, and I assume this is the fastest method offered.
There is a parameter in LGBM_BoosterPredictForMatSingleRowFastInit: const int data_type
I found that no matter whether you choose C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64, the internal processing is based on double. I'm curious why float-based processing isn't supported; I think it would be faster.
> I found that no matter whether you choose C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64, the internal processing is based on double
Can you share some links or other evidence that makes you think this?
During the prediction process, moving from the root node to a leaf node requires scattered accesses into a double-typed feature array, which causes cache misses. Since each element of a float array takes up less space, the CPU cache can be used more efficiently when the data is stored and accessed contiguously in memory. I don't have evidence from actual testing at the moment; I'll give it a try when I have time.
Ok, we'd really appreciate specific evidence for the claim you're making (like links to the relevant parts of LightGBM's code). Otherwise, you're asking someone to do investigation that you've already done.
If the parameter tree_learner in my model.txt is serial, can each tree in this model be predicted using multiple threads? When I tested it, I found only one thread with 100% CPU usage; all the other threads had zero CPU usage.