microsoft / StyleSwin

[CVPR 2022] StyleSwin: Transformer-based GAN for High-resolution Image Generation
https://arxiv.org/abs/2112.10762
MIT License

Training time #20

Closed TheGullahanMaster closed 2 years ago

TheGullahanMaster commented 2 years ago

How long should I expect training to take at 256x256 resolution? I only have 1 GPU, if that helps.

ForeverFancy commented 2 years ago

Hi, the training time depends on which dataset you use and how many iterations you want to train for. For example, it takes at least 7 days to train FFHQ256 on 8 V100 GPUs. To reduce the training time, you could load our pretrained model and fine-tune it on your dataset.

TheGullahanMaster commented 2 years ago

I only have smaller datasets (usually <1000 images) to train on; will it be able to learn them without overfitting? I haven't tried bigger datasets, as my HDD is already full as it is, though I did try MNIST and it seems to learn quite quickly. Thanks for the suggestions, BTW :)

TheGullahanMaster commented 2 years ago

Also, I want to ask: your paper lists learning rates of 0.00005 and 0.0002 for the generator and discriminator respectively, yet your code sets them equal to each other, namely 0.0002. It does seem to work fine; I'm just asking whether there was a change or anything.

ForeverFancy commented 2 years ago

> Also, I want to ask: your paper lists learning rates of 0.00005 and 0.0002 for the generator and discriminator respectively, yet your code sets them equal to each other, namely 0.0002. It does seem to work fine; I'm just asking whether there was a change or anything.

When --ttur is passed on the command line, we set the learning rate of G to (learning rate of D) / 4, i.e. 0.0002 / 4 = 0.00005. You could refer to our code for more detail.
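For illustration, here is a minimal, self-contained sketch of that logic (placeholder modules and optimizer settings, not our exact training code):

```python
# Illustrative sketch of the TTUR setup described above (placeholder modules and
# optimizer settings, not the repo's exact code): with --ttur, the generator's
# learning rate becomes the discriminator's learning rate divided by 4.
import argparse

from torch import nn, optim

parser = argparse.ArgumentParser()
parser.add_argument("--D_lr", type=float, default=0.0002)
parser.add_argument("--ttur", action="store_true")
args = parser.parse_args(["--ttur"])  # pass the real argv in an actual script

# Stand-ins for the StyleSwin generator and discriminator.
generator = nn.Linear(512, 3)
discriminator = nn.Linear(3, 1)

g_lr = args.D_lr / 4 if args.ttur else args.D_lr  # 0.0002 / 4 = 0.00005
g_optim = optim.Adam(generator.parameters(), lr=g_lr)
d_optim = optim.Adam(discriminator.parameters(), lr=args.D_lr)

print(g_lr, args.D_lr)  # 5e-05 0.0002
```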

ForeverFancy commented 2 years ago

> I only have smaller datasets (usually <1000 images) to train on; will it be able to learn them without overfitting? I haven't tried bigger datasets, as my HDD is already full as it is, though I did try MNIST and it seems to learn quite quickly. Thanks for the suggestions, BTW :)

You could try adding augmentation such as DiffAug or ADA when training on small datasets, which may help you obtain better performance.
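For reference, here is a toy, simplified DiffAugment-style example (not the official DiffAug or ADA implementation). The key point is that the same differentiable augmentation is applied to both real and generated images right before the discriminator:

```python
# Toy, simplified DiffAugment-style augmentation (not the official DiffAug/ADA code).
# The same differentiable augmentation is applied to BOTH real and generated images
# right before the discriminator, so G still receives gradients through the augmented fakes.
import torch
import torch.nn.functional as F


def diff_augment(x, brightness=0.2, max_shift_frac=0.125):
    # Random brightness jitter, differentiable w.r.t. x.
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5) * brightness
    # Random translation with zero padding (one shared shift per batch, for brevity).
    _, _, h, w = x.shape
    pad = int(h * max_shift_frac)
    ty = int(torch.randint(-pad, pad + 1, (1,)))
    tx = int(torch.randint(-pad, pad + 1, (1,)))
    x = torch.roll(F.pad(x, (pad, pad, pad, pad)), shifts=(ty, tx), dims=(2, 3))
    return x[:, :, pad:pad + h, pad:pad + w]


if __name__ == "__main__":
    fake = torch.randn(4, 3, 256, 256, requires_grad=True)
    print(diff_augment(fake).shape)  # torch.Size([4, 3, 256, 256])
    # In a training loop: D sees diff_augment(real) and diff_augment(fake),
    # so the discriminator never trains on un-augmented images alone.
```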

TheGullahanMaster commented 2 years ago

Thank you so much for the help; I'll try that as well. Say, the paper says the advantages of StyleSwin start to show at 256x256 or higher. What results should I expect at smaller resolutions, like 128x128 or 64x64? Are they still very good? So far it seems pretty good, but the smallest I've tried is 128x128 (MNIST I simply upscaled).

TheGullahanMaster commented 2 years ago

Also, what is the purpose of bCR? Is it similar to DiffAug, or is it something else? (It's there under the --bcr argument.)

ForeverFancy commented 2 years ago

> Thank you so much for the help; I'll try that as well. Say, the paper says the advantages of StyleSwin start to show at 256x256 or higher. What results should I expect at smaller resolutions, like 128x128 or 64x64? Are they still very good? So far it seems pretty good, but the smallest I've tried is 128x128 (MNIST I simply upscaled).

We tried our model at 64x64 resolution in early exploration, and the results were also competitive. Note that the best hyperparameters differ at each resolution, so you may need to tune them to obtain the best performance.

TheGullahanMaster commented 2 years ago

What hyperparameters did you use for 256x256, if I may ask?

ForeverFancy commented 2 years ago

> Also, what is the purpose of bCR? Is it similar to DiffAug, or is it something else? (It's there under the --bcr argument.)

Yes, bCR (balanced consistency regularization) is another kind of regularization that helps obtain better results; you could refer to the original paper for details.
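For reference, a minimal sketch of the bCR idea from that paper (toy discriminator and augmentation, not our exact implementation): the discriminator is penalized whenever its output changes under an augmentation, for both real and fake images.

```python
# Minimal sketch of balanced consistency regularization (bCR), following the idea in
# "Improved Consistency Regularization for GANs" -- not StyleSwin's exact code.
# The discriminator is penalized whenever its output changes under an augmentation T,
# for BOTH real and fake images; the term is added to the usual discriminator loss.
import torch
import torch.nn.functional as F
from torch import nn


def bcr_penalty(discriminator, real, fake, augment, lambda_real=10.0, lambda_fake=10.0):
    """Consistency term: D(x) should match D(T(x)) for real and generated images."""
    loss_real = F.mse_loss(discriminator(augment(real)), discriminator(real))
    loss_fake = F.mse_loss(discriminator(augment(fake.detach())), discriminator(fake.detach()))
    return lambda_real * loss_real + lambda_fake * loss_fake


if __name__ == "__main__":
    D = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))  # toy discriminator
    flip = lambda x: torch.flip(x, dims=[3])                    # toy augmentation T
    real, fake = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
    d_loss_extra = bcr_penalty(D, real, fake, flip)
    print(d_loss_extra)  # added to the adversarial D loss before d_optim.step()
```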

TheGullahanMaster commented 2 years ago

> Also, what is the purpose of bCR? Is it similar to DiffAug, or is it something else? (It's there under the --bcr argument.)

> Yes, bCR (balanced consistency regularization) is another kind of regularization that helps obtain better results; you could refer to the original paper for details.

Thanks for clearing that up! Love the project so far

TheGullahanMaster commented 2 years ago

Now all I want to know is which hyperparameters you chose for LSUN Church 256x256.

ForeverFancy commented 2 years ago

> Now all I want to know is which hyperparameters you chose for LSUN Church 256x256.

python -m torch.distributed.launch --nproc_per_node=8 train_styleswin.py --batch 4 --path /path_to_lsun_church_256 --checkpoint_path /tmp --sample_path /tmp --size 256 --G_channel_multiplier 2 --use_flip --r1 5 --lmdb --D_lr 0.0002 --D_sn --ttur --eval_gt_path /path_to_lsun_church_real_images_50k --lr_decay --lr_decay_start_steps 1300000 --iter 1500000

This is the LSUN Church 256 training command from the README; it contains all the hyperparameters we used, which can also be found in our paper's appendix.

TheGullahanMaster commented 2 years ago

Thanks so much for the help!