hcho3 opened 1 year ago
The bug also exists in the latest nightly Docker image (`rapidsai/rapidsai-core-nightly:23.02-cuda11.5-base-ubuntu20.04-py3.9`).

The bug was probably introduced in 22.12; using the 22.10 Docker image (`nvcr.io/nvidia/rapidsai/rapidsai-core:22.10-cuda11.5-base-ubuntu20.04-py3.9`) avoids the problem.
**Describe the bug**

On a `LocalCUDACluster` with multiple GPUs, all Dask partitions end up allocated to GPU 0, causing XGBoost to error out. Weirdly, removing `import cuml` fixes the problem.

**Steps/Code to reproduce bug**

Run this Python script:
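A minimal sketch along these lines (the data shapes, chunk sizes, and XGBoost parameters here are illustrative assumptions, not necessarily the original values):

```python
# Illustrative reproducer sketch; shapes, chunks, and parameters are assumptions.
import cuml  # commenting this line out makes the run succeed

import cupy as cp
import dask.array as da
import xgboost as xgb
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    # One Dask-CUDA worker per visible GPU.
    with LocalCUDACluster() as cluster, Client(cluster) as client:
        # GPU-backed random data, chunked so partitions should spread
        # across all workers (and therefore across all GPUs).
        X = da.random.random((1_000_000, 20), chunks=(125_000, 20))
        X = X.map_blocks(cp.asarray)
        y = da.random.random((1_000_000,), chunks=(125_000,))
        y = y.map_blocks(cp.asarray)

        dtrain = xgb.dask.DaskDMatrix(client, X, y)
        xgb.dask.train(
            client,
            {"tree_method": "gpu_hist", "objective": "reg:squarederror"},
            dtrain,
            num_boost_round=10,
        )
```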
With `import cuml` commented out, the Python program runs successfully.

If `import cuml` is un-commented, we get an error. This is because all the Dask partitions were allocated to GPU 0; see the output from `nvidia-smi`:
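The same check can be done programmatically with `pynvml` (a sketch; the original report used `nvidia-smi` directly). With the bug present, only GPU 0 shows significant memory use while the job runs, even though the cluster spans multiple GPUs:

```python
# Print per-GPU memory use, mirroring what nvidia-smi reports.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {i}: {mem.used / 2**20:.0f} MiB used")
pynvml.nvmlShutdown()
```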
**Expected behavior**

Importing cuML should not affect the behavior of Dask arrays.
**Environment details (please complete the following information):**