Open wence- opened 1 year ago
This is intentional: we're testing the ability to set CUDA_VISIBLE_DEVICES appropriately for each worker. As noticed, the downside is those errors, which are harmless when fewer GPUs are available; the test logic still succeeds regardless. We could eventually attempt to capture the warnings and suppress them, but beyond that we don't want to remove the test or dumb it down just to avoid the stdout/stderr output.
My proposal would be to try and capture the warnings if the number of devices is fewer than required (rather than dumbing down the test).
Yes, that would be ideal in this situation.
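A rough sketch of what that capture could look like, using only the standard library. The helper name `capture_if_too_few_devices`, the `available` argument, and the warning text are hypothetical and not taken from dask-cuda; the default of eight is the device count the linked test assumes. Note that `warnings.catch_warnings` only sees Python warnings raised in the current process, so errors printed by a worker subprocess would need its stderr captured instead.

```python
import contextlib
import warnings

REQUIRED_DEVICES = 8  # assumption: the device count the test is written for


@contextlib.contextmanager
def capture_if_too_few_devices(available, required=REQUIRED_DEVICES):
    """Record warnings only when fewer devices than required are present.

    Hypothetical helper: `available` is supplied by the caller, since how the
    device count is queried is outside the scope of this sketch.
    """
    if available >= required:
        # Enough devices: leave warnings untouched so real problems surface.
        yield None
        return
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of printing it.
        warnings.simplefilter("always")
        yield caught


# Usage: on a 2-GPU machine the warning is recorded rather than emitted.
with capture_if_too_few_devices(available=2) as caught:
    warnings.warn("device 7 not available")  # placeholder warning text
```

The test logic itself stays unchanged; only the noise is swallowed, and only on machines that cannot satisfy the device requirement.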
On closer examination, this error in the test logs comes from this test:
https://github.com/rapidsai/dask-cuda/blob/8134e6bebf50e4b4b428e07eebc98c3b2e851774/dask_cuda/tests/test_dask_cuda_worker.py#L25-L61
That test is written assuming eight GPUs are available on the system running it. So I think these problems in the logs are benign, but I will open a separate PR to fix this latter problem for the 23.04 branch.
Originally posted by @wence- in https://github.com/rapidsai/dask-cuda/issues/1123#issuecomment-1440516898