openai / guided-diffusion

MIT License

Question about training on multi GPUs #65

Closed YUHANG-Ma closed 2 years ago

YUHANG-Ma commented 2 years ago

Hi, I am training my model on multiple GPUs with mpiexec -n 8 python scripts/image_train.py. However, the checkpoint could not be loaded on all GPUs, and I hit an out-of-memory (OOM) error when fine-tuning the 512x512 model. I did not encounter this problem when training on a single GPU. What should I do?

YUHANG-Ma commented 2 years ago

I split my dataset this way:

    for i in range(0, len(org_tar_urls)-1):
        if i % MPI.COMM_WORLD.size != MPI.COMM_WORLD.rank:
            continue
        url = org_tar_urls[i]
        train_tar_url.append(url)
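For readers following along, the round-robin split described above can be sketched as a small standalone function. This is only an illustration under stated assumptions: `shard_for_rank` and the example file names are hypothetical and not part of guided-diffusion, and the rank and world size are passed in as plain integers so the snippet runs without MPI. Note that the posted loop iterates over `range(0, len(org_tar_urls)-1)`, which stops before the last URL; the post does not say whether that is intended, and the sketch below covers every index.

```python
from typing import List, Sequence


def shard_for_rank(urls: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Return the round-robin share of `urls` owned by `rank` (hypothetical helper)."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Extended slicing keeps every index i with i % world_size == rank,
    # including the last element of the list.
    return list(urls[rank::world_size])


if __name__ == "__main__":
    # Hypothetical shard names, used only to show the assignment.
    example_urls = [f"shard-{i:03d}.tar" for i in range(10)]
    for r in range(4):
        print(r, shard_for_rank(example_urls, r, 4))
```

Under mpi4py, the same function could presumably be called with `MPI.COMM_WORLD.rank` and `MPI.COMM_WORLD.size`, the attributes already used in the post. When the number of URLs is not divisible by the number of processes, some ranks receive one more URL than others.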