Hello, I would first like to thank you for sharing your work.
I am having a problem loading weights from checkpoints (i.e., when continuing halted training).
I am training StyleMapGAN on a custom dataset (~200K training images at 1024x1024 resolution), currently using 3 Titan RTX GPUs. I set latent_spatial_size=16 considering the input image resolution and GPU memory. With this configuration, a batch of 2 is allocated per GPU, using ~21 GiB of memory.
There is no problem when training from scratch. I have not tried the pretrained weights for FFHQ or CelebA because my data is quite different from human faces. Moreover, since I have succeeded in generating images with generate.py, I believe the weights were saved properly.
However, a memory allocation (out-of-memory) error occurs every time I load my custom weights to continue training. I assumed extra memory might be required when loading weights, so I tried a smaller batch size (batch 2 per GPU -> batch 1 per GPU), but the same memory shortage occurs.
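For what it's worth, my current guess is that the checkpoint tensors are restored directly onto the GPU in every process before being copied into the models, which would add to the memory already needed for training. Below is a minimal sketch of what I mean; the path, dictionary keys, and the actual loading code in train.py are assumptions on my part, not the repo's exact code:

```python
import torch

# Hypothetical sketch of the resume path (the real train.py may differ).
# Without remapping, each process restores all checkpoint tensors on the
# CUDA device they were saved from, on top of the memory needed for training:
ckpt = torch.load("expr/checkpoints/120000.pt")

# Loading onto CPU first and letting load_state_dict copy parameters to each
# device would avoid that extra GPU allocation:
ckpt = torch.load("expr/checkpoints/120000.pt", map_location="cpu")
# generator.load_state_dict(ckpt["generator"])  # key name assumed
```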
To summarize, I cannot load weights to continue training, whereas training from scratch and loading weights to generate images both work well. Therefore, I would like to ask the following questions:
Have any of the authors experienced similar problems?
Are there any possible solutions to my problem?
I would be grateful if you could take a look at my question. Thank you!