Closed · Mengzibin closed this 2 years ago
Hi! Thanks for your attention.
You don't have to worry about "total progress", just check the "stage" (iteration)! The "total progress" above is from our baseline pi-GAN. Just like you, when I first saw this in pi-GAN, I thought training would never end :(
In my experience, a total of 140,000 iterations (60,000 at 32x32 and 80,000 at 64x64) was enough to train our model at 64x64.
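For reference, this schedule corresponds to a pi-GAN-style curriculum along these lines: the integer keys are the iterations at which each stage begins, and the entry at the last key ends training. The field names follow the pi-GAN curriculum format, and the batch sizes here are illustrative, not the exact values in curriculum.py:

```python
# Illustrative pi-GAN-style curriculum (values are assumptions, not the
# exact contents of curriculum.py):
CelebA = {
    0:      {'img_size': 32, 'batch_size': 28},  # 32x32 stage: iterations 0-60k
    60000:  {'img_size': 64, 'batch_size': 14},  # 64x64 stage: iterations 60k-140k
    140000: {},                                  # training stops at 140k iterations
}
```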
-Jeong-gi
Hi! Thanks for your answer.
I have another question now that I have turned my attention to iterations. When the first stage finished and training was about to enter the next stage, it raised this error:
RuntimeError: CUDA out of memory. Tried to allocate 336.00 MiB (GPU 0; 23.70 GiB total capacity; 18.86 GiB already allocated; 36.56 MiB free; 19.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
The first stage ran fine, though. I have tried modifying the batch size, but nothing improved. Each GPU has 24,000 MiB of memory, and the whole experiment used 5 GPUs.
Also, does "a total of 140,000 iterations" mean I need to run 60,000 iterations (one stage) when I set the image size to 32x32, and then 80,000 iterations when I set the image size to 64x64?
Thanks again for your answer.
If you meet OOM, you need to reduce your batch size (per GPU) in curriculum.py. In addition, for multi-GPU training please refer to pi-GAN, because we modified some parts of the pi-GAN implementation for single-GPU training.
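As a concrete sketch (the 128 MiB cap and the batch sizes below are assumptions, not tested values), you can combine the allocator setting suggested by the error message with a smaller per-GPU batch size for the stage that runs out of memory:

```python
import os

# Allocator tweak suggested by the OOM message itself; it must be set before
# the first CUDA allocation (the 128 MiB cap is an assumed starting point).
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'

# In curriculum.py, shrink the per-GPU batch size of the stage that OOMs,
# here the 64x64 stage starting at iteration 60,000 (illustrative values):
CelebA = {
    0:      {'img_size': 32, 'batch_size': 28},
    60000:  {'img_size': 64, 'batch_size': 7},   # halved to fit in 24 GiB per GPU
    140000: {},
}
```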
> Also, does "a total of 140,000 iterations" mean I need to run 60,000 iterations (one stage) when I set the image size to 32x32, and then 80,000 iterations when I set the image size to 64x64?

Yes.
In practice, reproducing the results from the source code is much slower than this article suggests.
My steps were as follows: download the img_align_celeba.zip dataset, unzip it into the specified directory, and run "python train_surf.py --output_dir third --curriculum CelebA".
The progress bar reports that completing this program needs about 813 hours, and I repeated the procedure only to get the same estimate.
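For what it's worth, the 813-hour figure comes from the progress bar's pi-GAN total, not from the actual 140,000-iteration schedule explained above. A back-of-the-envelope estimate with assumed (not measured) per-iteration times shows how different the real number can be:

```python
# Assumed per-iteration times; measure your own on your hardware.
secs = 60_000 * 1.5 + 80_000 * 3.0  # 32x32 stage + slower 64x64 stage
print(f"~{secs / 3600:.0f} hours")  # ~92 hours, far below the bar's 813
```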