Open Jason0401 opened 6 months ago
Thanks for using LightGBM.
The tree_learner setting only affects training, not prediction.
You can pass num_threads through parameters for prediction. That tells LightGBM to parallelize prediction over the rows of the input, which improves throughput when predicting on many rows at once.
If you are predicting on only one row at a time, using multithreading won't improve the prediction speed and you'll only ever see one CPU core active.
Thanks for your answer. In my case only one row is predicted at a time. I use LGBM_BoosterPredictForMatSingleRowFastInit and LGBM_BoosterPredictForMatSingleRowFast for prediction, and I assume this is the fastest method offered.
There is a parameter in LGBM_BoosterPredictForMatSingleRowFastInit: const int data_type
I found that no matter whether you choose C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64, the internal processing is based on double. I'm curious why float-based processing isn't supported; I think it would be faster.
> I found that no matter whether you choose C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64, the internal processing is based on double
Can you share some links or other evidence that makes you think this?
During the prediction process, moving from the root node to a leaf node requires scattered accesses into a double-typed feature array, which causes cache misses. Since each element of a float array takes up less space, the CPU cache can be used more efficiently when the data is stored and accessed contiguously in memory. I don't have evidence from actual testing at the moment; I'll give it a try when I have time.
Ok, we'd really appreciate specific evidence for the claim you're making (like links to the relevant parts of LightGBM's code). Otherwise, you're asking someone to do investigation that you've already done.
If the parameter tree_learner in my model.txt is serial, can each tree in this model be predicted using multiple threads? When I tested it, I found only one thread with 100% CPU usage; all the other threads had zero CPU usage.