CUDA_VISIBLE_DEVICES not working?

openai / guided-diffusion

MIT License

CUDA_VISIBLE_DEVICES not working? #124

Open pantDevesh opened 1 year ago

pantDevesh commented 1 year ago

Despite setting CUDA_VISIBLE_DEVICES, the code keeps using GPUs starting from index 0. How can I make sure it uses the specific GPUs I select?

Schnabel-8 commented 11 months ago

I encountered the same problem.

HuangruiChu commented 11 months ago

This is also a problem for me. I googled it, and some people say CUDA_VISIBLE_DEVICES has to be set before importing torch. However, even though my "super_train.py" sets CUDA_VISIBLE_DEVICES before importing any other library, it still has no effect. I am looking into it and will update here if I fix it.
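
For readers unsure what "set it before importing torch" looks like in practice, here is a minimal sketch. The helper name `select_gpus` and the index "1" are made up for illustration and are not part of this repository. As later comments in this thread show, this alone is not enough here, because the repository's own code assigns the variable again afterwards.

```python
import os
import sys
import warnings


def select_gpus(ids: str) -> None:
    """Expose only the given comma-separated GPU indices to this process.

    Hypothetical helper for illustration; call it at the very top of the
    entry script, before any library that may initialize CUDA.
    """
    if "torch" in sys.modules:
        # An earlier import is not necessarily fatal, but the ordering
        # this sketch relies on is no longer guaranteed.
        warnings.warn("torch is already imported; the setting may be ignored")
    os.environ["CUDA_VISIBLE_DEVICES"] = ids


select_gpus("1")  # example value: expose only physical GPU 1
# import torch    # import the framework only after this point
```

When a single GPU is exposed this way, the process addresses it as device 0.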

whitebeacon commented 11 months ago

I added os.environ["CUDA_VISIBLE_DEVICES"]="1" before line 45 and it works, but I don't know whether this is the right way to do it.

Kerio99 commented 11 months ago

Same question here.

Kerio99 commented 11 months ago

> I added os.environ["CUDA_VISIBLE_DEVICES"]="1" before line 45 and it works, but I don't know whether this is the right way to do it.

Hi, which .py file did you change?

Kerio99 commented 11 months ago

In 'dist_util.py', I changed line 27 from 'os.environ["CUDA_VISIBLE_DEVICES"] = f"{MPI.COMM_WORLD.Get_rank() % GPUS_PER_NODE}"' to 'os.environ["CUDA_VISIBLE_DEVICES"]="1"' in order to specify the GPU used for training.
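
The line quoted above also suggests why a value set by the user gets ignored: the assignment overwrites whatever was in the environment. A small sketch of that effect, without MPI, where the rank is passed in by hand and `GPUS_PER_NODE = 8` is an assumption about the repository's constant:

```python
import os

GPUS_PER_NODE = 8  # assumption: the value defined in dist_util.py


def original_assignment(rank: int) -> None:
    """Mimic the quoted line: derive the variable from the rank alone."""
    os.environ["CUDA_VISIBLE_DEVICES"] = f"{rank % GPUS_PER_NODE}"


os.environ["CUDA_VISIBLE_DEVICES"] = "3"   # what the user asked for (example)
original_assignment(rank=0)                # hypothetical single-process run
print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints 0: the "3" was overwritten
```

Hard-coding "1" on that line, as described above, works for a single process but would send every rank to the same GPU in a multi-process run.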

hhsupremehh627 commented 6 months ago

Hello everyone, I ran into the same problem and have solved it. Replace the entire contents of dist_util.py with the dist_util.py from openai/improved-diffusion.

ZachL1 commented 2 days ago

Change this line https://github.com/openai/guided-diffusion/blob/22e0df8183507e13a7813f8d38d51b072ca1e67c/guided_diffusion/dist_util.py#L27 to:

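    # Respect a user-supplied CUDA_VISIBLE_DEVICES: each MPI rank takes one of the listed GPUs.
    if "CUDA_VISIBLE_DEVICES" in os.environ:
        vis_devices = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
        os.environ["CUDA_VISIBLE_DEVICES"] = vis_devices[MPI.COMM_WORLD.Get_rank() % len(vis_devices)]
    else:
        # Otherwise keep the original behavior: rank modulo GPUs per node.
        os.environ["CUDA_VISIBLE_DEVICES"] = f"{MPI.COMM_WORLD.Get_rank() % GPUS_PER_NODE}"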
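
To check how the snippet above distributes GPUs across ranks without launching MPI, the same selection logic can be written as a pure function. This is only a sketch: `pick_device` is a hypothetical name, the rank and environment are passed in explicitly, and the default of 8 for `gpus_per_node` is an assumption about `GPUS_PER_NODE`.

```python
from typing import Mapping


def pick_device(rank: int, env: Mapping[str, str], gpus_per_node: int = 8) -> str:
    """Return the CUDA_VISIBLE_DEVICES value a given rank would end up with."""
    if "CUDA_VISIBLE_DEVICES" in env:
        # Cycle through the GPUs the user listed.
        devices = env["CUDA_VISIBLE_DEVICES"].split(",")
        return devices[rank % len(devices)]
    # No user setting: fall back to rank modulo GPUs per node.
    return str(rank % gpus_per_node)


user_env = {"CUDA_VISIBLE_DEVICES": "2,3"}  # example user setting
for rank in range(3):
    print(rank, pick_device(rank, user_env))
```

With the example setting "2,3", ranks 0, 1 and 2 map to GPUs 2, 3 and 2, so launching more ranks than listed GPUs makes some ranks share a device.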