Problem with pix2pix on two different devices: one shows 'nan' from the beginning

junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

Problem with pix2pix on two different devices: one shows 'nan' from the beginning #1487

Open YujieXiang opened 2 years ago

YujieXiang commented 2 years ago

When I run the pix2pix training on two different devices, one finishes training smoothly, but the other shows 'nan' in the loss values from the very beginning, as the figure shows. The environments and the dataset (facades) are essentially the same on both.

taesungp commented 2 years ago

Do you mean that the same training run works well on one device and produces NaN on the other? Or did you try multi-GPU training? In the former case, yes, it's likely a GPU issue...
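
To narrow down where the NaN first appears on the failing device, it may help to check the loss values at each iteration and stop at the first non-finite one. Below is a minimal sketch, assuming the losses are available as a name-to-float mapping (which is what the repository's `model.get_current_losses()` appears to return). `find_nonfinite_losses` is a hypothetical helper, not part of the repository, and the numbers are made up for illustration rather than taken from the log in this issue.

```python
import math
from collections import OrderedDict


def find_nonfinite_losses(losses):
    """Return the names of losses whose value is NaN or +/-inf."""
    return [name for name, value in losses.items()
            if not math.isfinite(float(value))]


# Hypothetical loss values, for illustration only.
healthy = OrderedDict(G_GAN=0.8, G_L1=30.0, D_real=0.5, D_fake=0.6)
broken = OrderedDict(G_GAN=float("nan"), G_L1=float("nan"),
                     D_real=0.5, D_fake=float("inf"))

print(find_nonfinite_losses(healthy))
print(find_nonfinite_losses(broken))
```

Calling such a check right after each optimization step on both devices would show whether the failing one goes non-finite on the very first iteration, and in which loss term, which is useful to report back when asking about a possible GPU problem.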