Open unbelievable3513 opened 1 week ago
Thanks for using LightGBM.
LightGBM does not currently have GPU-accelerated inference. You can see https://github.com/microsoft/LightGBM/issues/5854#issuecomment-2138659914 for some other options to try using GPUs to generate predictions with a LightGBM model.
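For reference, one of the GPU-prediction options mentioned later in this thread is FIL from RAPIDS cuML, which can load a saved LightGBM model and run prediction on the GPU. A minimal sketch follows; it assumes a working cuML installation, and the exact load() arguments are based on older cuML releases, so treat them as assumptions and check your version's docs:

```python
import lightgbm as lgb
import numpy as np
from cuml import ForestInference  # requires a RAPIDS cuML installation

# Train a small LightGBM model on the CPU and save it to a text file.
X = np.random.rand(10_000, 32).astype(np.float32)
y = (X[:, 0] > 0.5).astype(np.int32)
booster = lgb.train(
    {"objective": "binary", "verbose": -1},
    lgb.Dataset(X, label=y),
    num_boost_round=100,
)
booster.save_model("model.txt")

# Load the saved model into FIL and run prediction on the GPU.
# NOTE: these arguments follow the older cuML FIL API and may differ
# in newer releases.
fil_model = ForestInference.load("model.txt", model_type="lightgbm", output_class=True)
gpu_preds = fil_model.predict(X)
```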
Is there any available documentation that offers a comprehensive explanation of CUDA acceleration for LightGBM?
What does "comprehensive explanation" mean to you? Is there another library that has something like what you're looking for, and if so can you link to that?
I am particularly interested in understanding the performance trade-offs between CPU and GPU backends in both training and inference stages.
If that is true, you should try reducing your benchmarking code to just lightgbm and numpy / scipy ... removing all those ONNX libraries in the middle. Otherwise, it'll be difficult to understand the difference between the performance characteristics of "LightGBM" and of "LightGBM used in a specific way via some ONNX libraries".
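For example, a stripped-down comparison using only lightgbm and numpy could look something like this (a minimal sketch with arbitrary synthetic data sizes, not the original benchmark):

```python
import time
import numpy as np
import lightgbm as lgb

# Synthetic binary-classification data; sizes are arbitrary for illustration.
rng = np.random.default_rng(42)
X = rng.random((100_000, 50)).astype(np.float32)
y = (X[:, 0] + rng.normal(scale=0.1, size=len(X)) > 0.5).astype(np.int32)

booster = lgb.train(
    {"objective": "binary", "verbose": -1},
    lgb.Dataset(X, label=y),
    num_boost_round=100,
)

# Time raw Booster.predict() with no ONNX layers in between.
start = time.perf_counter()
preds = booster.predict(X)
print(f"lightgbm predict: {time.perf_counter() - start:.3f}s")
```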
the GPU performance during training is notably lower than that of the CPU,
Two more points on claims like this:
@jameslamb, thank you very much for your detailed response. I have gained valuable insights from your explanation, in particular the pointers to treelite and fil. I will delve deeper into these options to understand their capabilities and potential benefits.

The two results, lgb_cuda (600ms) and lgb_cpu (50ms), are as depicted in the image. Both were tested on the same model, defined by the line model = lgb.train(params, train_data, num_boost_round=100), with the only difference being the device specification: device = "cuda" for the former and device = "cpu" for the latter. The reason you provided for the observed performance difference, that the scale might be too small, makes sense, and I will seek opportunities to conduct further experiments. Since training performance is not my primary focus, there is no need for further assistance in this regard. However, the information in the GPU-Performance.rst document you shared is highly informative and will be of significant value. Thank you once again for your time and expertise.
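For anyone reproducing this kind of CPU-vs-CUDA training comparison, a minimal sketch (not the exact benchmark above) could look like the following; it assumes a CUDA-enabled build of LightGBM and uses arbitrary synthetic data sizes, since small datasets tend to favor the CPU backend:

```python
import time
import numpy as np
import lightgbm as lgb

# Arbitrary synthetic data; larger scales are where the GPU is expected to help.
rng = np.random.default_rng(0)
X = rng.random((1_000_000, 100)).astype(np.float32)
y = (X[:, 0] > 0.5).astype(np.int32)

for device in ("cpu", "cuda"):  # "cuda" requires LightGBM built with CUDA support
    params = {"objective": "binary", "device": device, "verbose": -1}
    # Fresh Dataset per run so each backend constructs its own bins.
    train_data = lgb.Dataset(X, label=y)
    start = time.perf_counter()
    model = lgb.train(params, train_data, num_boost_round=100)
    print(f"device={device}: {time.perf_counter() - start:.1f}s")
```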
the absence of an official inference acceleration version
There is just far more work to be done in this repo than people around to do it. @shiyu1994 has done most of the CUDA development in this project, maybe he can explain why training was a higher priority. I have some ideas about this but I'm not confident in them, and I don't want to misinform you.
Description
While reading the LightGBM source code and testing it with a binary classifier, I observed that GPU performance during training is notably lower than that of the CPU, amounting to approximately one-tenth of the CPU performance. Moreover, during inference there is no option to use any backend other than the CPU. Here, the GPU backend refers to device=cuda, which employs CUDATree, whereas the CPU backend refers to device=cpu, which utilizes Tree or its derivatives. The following questions arise: Has this phenomenon been observed by others? If so, why is the CUDA backend (CUDATree) not employed during inference? Is it because the operator characteristics are more advantageous for the CPU? In which inference cases would the CUDA backend definitely surpass the CPU backend?
Is there any available documentation that offers a comprehensive explanation of CUDA acceleration for LightGBM?
[Attached benchmark screenshots: infer, train, onnx_runtime_c++ (infer)]
Reproducible example
Environment info
Command(s) I used to install LightGBM
Additional Comments
I am particularly interested in understanding the performance trade-offs between CPU and GPU backends in both training and inference stages. Any insights or documentation on this topic would be greatly appreciated.