rapidsai / dask-cuda

Utilities for Dask and CUDA interactions
https://docs.rapids.ai/api/dask-cuda/stable/
Apache License 2.0

test_cuda_visible_devices_and_memory_limit_and_nthreads spews benign (?) warnings on systems with fewer than eight GPUs #1127

Open wence- opened 1 year ago

wence- commented 1 year ago
> > it doesn't look like it failed any tests though. Is this a problem?

This looks bad to me. I wonder if it is happening not just on this branch, I will investigate

On closer examination, this error in the test logs comes from this test

https://github.com/rapidsai/dask-cuda/blob/8134e6bebf50e4b4b428e07eebc98c3b2e851774/dask_cuda/tests/test_dask_cuda_worker.py#L25-L61

Which is written assuming eight GPUs are available on the system running the test. So I think these problems in the logs are benign, but I will open a separate PR to fix this latter problem for the 23.04 branch.

Originally posted by @wence- in https://github.com/rapidsai/dask-cuda/issues/1123#issuecomment-1440516898

pentschev commented 1 year ago

This is intentional: we're testing the ability to set `CUDA_VISIBLE_DEVICES` appropriately for each worker. As noticed, the downside is those errors, which are harmless when fewer GPUs are available; the logic under test still passes nevertheless. We could eventually attempt to capture the warnings and suppress them, but other than that we don't want to remove the test or dumb it down just to prevent the stdout/stderr output.

wence- commented 1 year ago

My proposal would be to try to capture the warnings when the number of devices is fewer than required (rather than dumbing down the test).
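A minimal sketch of that idea, independent of dask-cuda's actual test harness: only record (and thereby suppress) warnings when fewer devices are available than the test assumes, so full eight-GPU runs still surface unexpected warnings. The names `available`, `required`, and the warning text here are hypothetical, not from the repository.

```python
import warnings


def run_with_optional_capture(fn, available, required):
    """Call fn(); if fewer devices are available than the test
    requires, capture any warnings it emits instead of letting
    them reach stderr. Returns (result, captured_warnings)."""
    if available < required:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            return fn(), caught
    # Full device count: let warnings propagate normally.
    return fn(), []


def worker_setup():
    # Stand-in for the worker start-up that warns about a
    # missing device index (hypothetical message).
    warnings.warn("Device 5 not available")
    return "ok"


result, caught = run_with_optional_capture(worker_setup, available=2, required=8)
```

With `available=2`, the warning is recorded in `caught` rather than printed, while the function's result is still checked; with `available=8` the same call would let the warning through, keeping the signal on fully-equipped CI machines.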

pentschev commented 1 year ago

Yes, that would be ideal in this situation.