Open PhilippWillms opened 2 months ago
@simonsays1980, @sven1977: This issue only occurs when using config.build().train(). tune does not run into it.
@PhilippWillms Thanks for raising this issue. It's hard to reproduce as we do not have this hardware setup. What does torch.cuda.device_count() return for you?
@simonsays1980: What I learnt today based on your comment: the correct torch version is installed, torch.cuda.is_available() returns True, and torch.cuda.device_count() returns 1, but only as long as you do not set the environment variable CUDA_VISIBLE_DEVICES in the anaconda environment. Leave it blank and torch detects the GPU correctly.
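For anyone wanting to run the same check, here is a minimal sketch that prints the CUDA_VISIBLE_DEVICES value the Python process actually sees, next to what torch reports. The helper names cuda_visible_devices and has_literal_quotes are hypothetical and not part of ray or torch. Two things may be worth checking with it, assuming a single-GPU Windows machine: CUDA device indices start at 0, so a lone GPU is device 0 rather than 1, and cmd.exe keeps single quotes as part of the value, so set CUDA_VISIBLE_DEVICES='1' stores '1' including the quotes.

```python
import os


def cuda_visible_devices(env=None):
    """Return the raw CUDA_VISIBLE_DEVICES value, or None if it is not set."""
    env = os.environ if env is None else env
    return env.get("CUDA_VISIBLE_DEVICES")


def has_literal_quotes(value):
    """True if the value contains quote characters, e.g. after set VAR='1' in cmd.exe."""
    return value is not None and any(ch in value for ch in "'\"")


def report():
    value = cuda_visible_devices()
    print(f"CUDA_VISIBLE_DEVICES = {value!r}")
    if has_literal_quotes(value):
        print("Note: the value contains literal quote characters.")
    try:
        import torch  # optional: only needed for the GPU checks below
    except ImportError:
        print("torch is not installed in this environment")
        return
    print("torch.cuda.is_available():", torch.cuda.is_available())
    print("torch.cuda.device_count():", torch.cuda.device_count())


if __name__ == "__main__":
    report()
```

Running this once with the variable set and once in a fresh shell without it should show whether the setting is what hides the GPU from torch.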
EDIT: Issue identified in ray 2.34 release, fixed a linting topic in repro script.
What happened + What you expected to happen
I configured a Windows anaconda environment with
set CUDA_VISIBLE_DEVICES='1'
as I have one physical GPU. Running the script below then leads to the following error stack trace.
Versions / Dependencies
ray==2.34 python==3.11.9
Reproduction script
Issue Severity
Medium: It is a significant difficulty but I can work around it.