microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Currently cuda version only supports training on a single GPU #6045

Open HenrySun98 opened 1 year ago

HenrySun98 commented 1 year ago

Description

I built the LightGBM CUDA implementation and set the GPU parameters "device": 'cuda' and "num_gpu": 4, but I got a fatal error: Currently cuda version only supports training on a single GPU

In the official doc, it says multiple GPUs are supported in the CUDA version.

My LightGBM version is 4.0.0.99.
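
A minimal sketch of the configuration described above, assuming the LightGBM Python package compiled with CUDA support. The helper names `build_params` and `try_train`, the `binary` objective, and the synthetic data are assumptions for illustration; the report does not include the original script or dataset.

```python
def build_params(num_gpu):
    """Return the parameters named in the report; other keys are assumptions."""
    return {
        "objective": "binary",  # assumption: the objective is not stated in the report
        "device": "cuda",       # from the report
        "num_gpu": num_gpu,     # from the report (4 triggers the error there)
        "verbose": -1,
    }


def try_train(params):
    """Train on a tiny synthetic dataset; return the error text, or None on success."""
    try:
        import numpy as np
        import lightgbm as lgb
    except ImportError as exc:
        return f"import failed: {exc}"

    # Synthetic data (hypothetical), only meant to exercise the training call.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    y = (X[:, 0] > 0).astype(int)
    try:
        lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=5)
    except lgb.basic.LightGBMError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Compare the single-GPU setting with the multi-GPU setting from the report.
    for n in (1, 4):
        print(n, try_train(build_params(n)))
```

On a build matching the report, the call with `num_gpu` set to 4 would be expected to return the fatal message quoted above. That has not been verified here.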

HenrySun98 commented 1 year ago

Also, the GPU parameter gpu_id accepts a single int value, so how can I choose which GPUs are used for the task? Suppose I have 8 GPUs in the server.
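
One general CUDA mechanism for this, sketched below, is the `CUDA_VISIBLE_DEVICES` environment variable. It restricts which physical GPUs a process can see and renumbers them from 0. Whether LightGBM's device-ID parameter follows that renumbering is an assumption here and is not confirmed in this thread. The helper `restrict_visible_gpus` is hypothetical.

```python
import os


def restrict_visible_gpus(physical_ids):
    """Expose only the given physical GPUs to CUDA libraries loaded afterwards.

    Returns the logical -> physical index mapping the CUDA runtime will use.
    Must run before any CUDA library initializes in this process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in physical_ids)
    return {logical: physical for logical, physical in enumerate(physical_ids)}


# Example: on an 8-GPU server, expose only physical GPUs 2 and 5.
mapping = restrict_visible_gpus([2, 5])
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 2,5
print(mapping)                             # {0: 2, 1: 5}
```

With this setting, device index 0 inside the process refers to physical GPU 2. The variable has no effect if it is set after the CUDA runtime has already initialized.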

jameslamb commented 1 year ago

> In the official doc, it says multiple GPUs are supported in the CUDA version.

Please share a link to what you're referring to with this statement.


@shiyu1994 could you answer the other parts of this report?

HenrySun98 commented 1 year ago

@jameslamb

https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_gpu

I found this link for num_gpu parameter.

HenrySun98 commented 1 year ago

@jameslamb @shiyu1994 looking forward to your replies, many thanks.

bstockton commented 9 months ago

@jameslamb Is there any update on this? This looks like a direct conflict between the documentation and the implemented code, and it is causing a lot of confusion.