junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

RuntimeError: Caught RuntimeError in replica 3 on device 3. #1499

Closed zccoder closed 1 year ago

zccoder commented 1 year ago

While training on 8 GPUs on a single machine, I ran into the following error:

File "/usr/local/python/lib/python3.7/site-packages/torch/_utils.py", line 425, in reraise

raise self.exc_type(msg)

RuntimeError: Caught RuntimeError in replica 3 on device 3.

After this error, a CUDA out-of-memory error also occurred:

RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 31.75 GiB total capacity; 883.36 MiB already allocated; 8.75 MiB free; 906.00 MiB reserved in total by PyTorch)
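
For reference, the figures in the out-of-memory message can be checked with a short calculation. The sketch below is only an illustration: the helper names `parse_oom_message` and `unaccounted_mib` are hypothetical, not part of PyTorch or this repository. It parses the message text and computes how much of GPU 0 is neither free nor reserved by this PyTorch process. Reading that remainder as memory held by other processes (or CUDA context overhead) is an assumption, not something the message states.

```python
import re

# Error text copied verbatim from the report above.
OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB "
    "(GPU 0; 31.75 GiB total capacity; 883.36 MiB already allocated; "
    "8.75 MiB free; 906.00 MiB reserved in total by PyTorch)"
)

# Conversion factors to MiB for the units that appear in such messages.
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

SIZE = r"([\d.]+) (KiB|MiB|GiB)"


def parse_oom_message(message):
    """Hypothetical helper: return the sizes in the message, in MiB."""
    patterns = {
        "requested": r"Tried to allocate " + SIZE,
        "total": SIZE + r" total capacity",
        "allocated": SIZE + r" already allocated",
        "free": SIZE + r" free",
        "reserved": SIZE + r" reserved",
    }
    sizes = {}
    for label, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find {label!r} in message")
        value, unit = match.groups()
        sizes[label] = float(value) * UNIT_TO_MIB[unit]
    return sizes


def unaccounted_mib(sizes):
    """Memory that is neither free nor reserved by this PyTorch process."""
    return sizes["total"] - sizes["reserved"] - sizes["free"]


sizes = parse_oom_message(OOM_MESSAGE)
remainder = unaccounted_mib(sizes)
print(f"Not free and not reserved by this process: {remainder:.2f} MiB "
      f"({remainder / 1024:.2f} GiB)")
```

If that remainder is large compared with the 906.00 MiB this process reserved, it may be worth checking with `nvidia-smi` whether another job is already occupying GPU 0.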

I have read the Q&A but did not find the same problem there. Could you give me some suggestions? Thanks a lot.