rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[QST]Train random forest with large dataset #6054

Closed SiriusHou closed 1 month ago

SiriusHou commented 2 months ago

Hi team, I want to train a cuML random forest model on a single GPU (24 GB of memory) with a 100 GB dataset. How can I do this? I cannot load the entire dataset onto the GPU. Is it possible to define an iterator and train the model in batches?

vinaydes commented 2 months ago

Have you tried using multi-GPU RF? You can find an example of it here.

Addendum on 7-Oct-2024: If multi-GPU is not an option for you, then unfortunately there is currently no way to train on a dataset that does not fit in GPU memory. Therefore closing the issue for now.
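
For a rough sense of what the multi-GPU route would require, here is a back-of-the-envelope sketch using the numbers from the question (100 GB of data, 24 GB per GPU). It assumes the data is partitioned evenly across GPUs and that only a fraction of each GPU's memory is available for the data; the helper name `min_gpus_needed` and the `usable_fraction` parameter are hypothetical and not part of cuML. Training itself needs extra working memory, so treat the result as a lower bound.

```python
import math

def min_gpus_needed(dataset_gb, gpu_memory_gb, usable_fraction=1.0):
    """Lower bound on the GPU count needed to hold the dataset in GPU memory.

    usable_fraction is an assumed share of each GPU's memory that can
    actually be spent on the data (hypothetical knob, not a cuML setting).
    """
    usable_gb = gpu_memory_gb * usable_fraction
    return math.ceil(dataset_gb / usable_gb)

# Numbers from the question: 100 GB dataset, 24 GB per GPU.
print(min_gpus_needed(100, 24))                       # every byte usable
print(min_gpus_needed(100, 24, usable_fraction=0.8))  # assume 80% usable
```

Even in the optimistic case this comes out to several 24 GB GPUs, which is why a single card cannot hold the dataset.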

haotianzh commented 2 months ago

I am also confused about why cuML doesn't accept a sparse matrix as input. That would save a lot of GPU memory when training the RF.
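
To put a number on the memory argument, here is a minimal sketch comparing the footprint of a dense float32 matrix with the same data in CSR (compressed sparse row) layout. It is plain arithmetic, not a cuML feature: the function names and the example shape (1,000,000 rows, 1,000 columns, 1% non-zeros) are assumptions chosen for illustration, and 4-byte values and indices are assumed.

```python
def dense_bytes(n_rows, n_cols, value_bytes=4):
    """Bytes needed to store every entry of a dense matrix."""
    return n_rows * n_cols * value_bytes

def csr_bytes(n_rows, nnz, value_bytes=4, index_bytes=4):
    """Bytes needed for CSR: one value and one column index per non-zero,
    plus a row-pointer array of length n_rows + 1."""
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes

# Hypothetical example: 1,000,000 x 1,000 matrix with 1% non-zero entries.
n_rows, n_cols = 1_000_000, 1_000
nnz = n_rows * n_cols // 100

dense = dense_bytes(n_rows, n_cols)
sparse = csr_bytes(n_rows, nnz)
print(f"dense: {dense / 1e9:.2f} GB, CSR: {sparse / 1e9:.3f} GB")
```

The saving shrinks as the matrix gets denser: with 4-byte values and indices, CSR costs roughly 8 bytes per non-zero, so it only wins when well under half of the entries are non-zero.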