Open pantDevesh opened 1 year ago
I encountered the same problem
This is also a problem for me. I googled it, and some people say CUDA_VISIBLE_DEVICES must be set before importing torch. However, even though I set CUDA_VISIBLE_DEVICES in my "super_train.py" before importing any other libraries, it still has no effect. I'm looking into it and will update here if I find a fix.
I added os.environ["CUDA_VISIBLE_DEVICES"]="1" before line 45 and it works, though I don't know whether this is the right way to do it.
Same question~
Hi, which .py file did you change?
I modified 'os.environ["CUDA_VISIBLE_DEVICES"] = f"{MPI.COMM_WORLD.Get_rank() % GPUS_PER_NODE}"' in 'dist_util.py' at line 27 to 'os.environ["CUDA_VISIBLE_DEVICES"]="1"' in order to specify the GPU for training.
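For anyone wondering why a value set earlier gets ignored, here is a minimal sketch of what that line 27 assignment does. It writes the variable unconditionally from the MPI rank, so whatever was set before is replaced. The helper name `original_assignment` and the plain dict standing in for `os.environ` are made up for illustration, and `GPUS_PER_NODE = 8` is an assumed value that should be checked against your copy of dist_util.py.

```python
GPUS_PER_NODE = 8  # assumed value; check dist_util.py in your checkout


def original_assignment(env, rank):
    """Hypothetical stand-in for the unconditional assignment at line 27.

    `env` is a plain dict used in place of os.environ, and `rank` replaces
    MPI.COMM_WORLD.Get_rank() so the sketch runs without MPI.
    """
    env = dict(env)  # copy so the caller's dict is left untouched
    env["CUDA_VISIBLE_DEVICES"] = f"{rank % GPUS_PER_NODE}"
    return env


# The user asked for GPU 1, but rank 0 overwrites the setting with "0".
user_env = {"CUDA_VISIBLE_DEVICES": "1"}
print(original_assignment(user_env, rank=0))
```

With a single process (rank 0) the result is always GPU 0, which matches the behavior reported in this thread.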
Hello everyone, I ran into the same problem and have solved it. Replace the entire contents of dist_util.py with the dist_util.py from openai/improved-diffusion.
Change this line https://github.com/openai/guided-diffusion/blob/22e0df8183507e13a7813f8d38d51b072ca1e67c/guided_diffusion/dist_util.py#L27 to:
if "CUDA_VISIBLE_DEVICES" in os.environ:
    vis_devices = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
    os.environ["CUDA_VISIBLE_DEVICES"] = vis_devices[MPI.COMM_WORLD.Get_rank() % len(vis_devices)]
else:
    os.environ["CUDA_VISIBLE_DEVICES"] = f"{MPI.COMM_WORLD.Get_rank() % GPUS_PER_NODE}"
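To see what this change does for each rank without launching MPI, the selection logic can be pulled into a small pure function. This is only a sketch: `select_device` is a hypothetical helper, the dict argument stands in for `os.environ`, the `rank` argument stands in for `MPI.COMM_WORLD.Get_rank()`, and the default of 8 GPUs per node is an assumption. It also strips whitespace around each entry, which the snippet above does not do.

```python
def select_device(env, rank, gpus_per_node=8):
    """Return the CUDA_VISIBLE_DEVICES value a given rank would end up with.

    Hypothetical helper mirroring the patched logic: if the user already
    restricted the visible GPUs, ranks cycle through that list; otherwise
    fall back to rank % gpus_per_node (8 is an assumed default).
    """
    if "CUDA_VISIBLE_DEVICES" in env:
        # Split the user's list, e.g. "2,3" -> ["2", "3"]
        vis_devices = [d.strip() for d in env["CUDA_VISIBLE_DEVICES"].split(",")]
        return vis_devices[rank % len(vis_devices)]
    return str(rank % gpus_per_node)


# Two processes with the user restricting training to GPUs 2 and 3
for rank in range(2):
    print(rank, select_device({"CUDA_VISIBLE_DEVICES": "2,3"}, rank))
```

With CUDA_VISIBLE_DEVICES set to "2,3", rank 0 gets GPU 2 and rank 1 gets GPU 3. If there are more processes than listed GPUs, ranks wrap around and share devices.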
Despite setting CUDA_VISIBLE_DEVICES, it continues to use GPUs from the index 0. How can I ensure that the code uses specific GPUs as intended?