junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

The program gets stuck at epoch 25 when training a CycleGAN model #1496

Open IchbinGYH opened 2 years ago

IchbinGYH commented 2 years ago

Hello! The models you made are so interesting, and I learned a lot. I split two videos into frames to build my datasets: trainA has 4000+ images and trainB has 3000+ images. I then ran training with:

python train.py --dataroot ./datasets/sim2real/ --name sim2real_cyclegan --model cycle_gan --pool_size 50 --no_dropout

The trainA images are 1024×1024 and the trainB images are 480×480. The program starts normally, but it always gets stuck at epoch 25. Are my parameter settings wrong, or is there a problem with the dataset? I would appreciate it if you could answer my question :)

Yuhao Guo

junyanz commented 2 years ago

This is hard to resolve. A few suggestions:

  1. You can use --continue_train to resume training from your saved epochs and see whether that fixes the issue.
  2. Add print statements (to the data loader and to the forward/backward functions) to see where the program gets stuck. If it is the data loader, print the image file path and check whether the image is corrupted.
  3. It might be related to memory, so monitor memory usage during training.
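The second suggestion can be sketched as a small wrapper that logs before and after every fetch, so the last line printed shows which item never came back. This is only a minimal sketch: `traced` is a hypothetical helper, not part of this repository, and it assumes the training loop iterates over some iterable such as a data loader.

```python
import time


def _flush_print(message):
    # Flush immediately so the last message is visible even if the process hangs.
    print(message, flush=True)


def traced(iterable, label, log=_flush_print):
    """Yield (index, item) pairs, logging before and after each fetch.

    Hypothetical helper for debugging: if the program hangs inside the
    iterable, the last logged line is a "fetching item N" with no matching
    "got item N", which identifies the fetch that never returned.
    """
    iterator = iter(iterable)
    index = 0
    while True:
        log(f"[{label}] fetching item {index} ...")
        start = time.perf_counter()
        try:
            item = next(iterator)
        except StopIteration:
            log(f"[{label}] exhausted after {index} items")
            return
        elapsed = time.perf_counter() - start
        log(f"[{label}] got item {index} in {elapsed:.3f}s")
        yield index, item
        index += 1


if __name__ == "__main__":
    # Stand-in for a real data loader; replace with your own iterable.
    fake_batches = ["batch-a", "batch-b", "batch-c"]
    for i, batch in traced(fake_batches, "data loader"):
        _flush_print(f"  training step on {batch}")
```

In a real run you would wrap the loop's iterable (for example `for i, data in traced(dataset, "data loader"):`, where `dataset` stands for whatever your training loop iterates over) and add similar flushed prints around the forward and backward calls. If the prints alone do not pinpoint the hang, Python's standard `faulthandler.dump_traceback_later` can dump the stack of every thread after a timeout.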