Hi, the training time depends on the dataset and how many iterations you want to train for. For example, it takes at least 7 days to train FFHQ 256x256 using 8 V100 GPUs. To reduce the training time, you could load our pretrained model and fine-tune it on your dataset.
I only have smaller datasets (usually <1000 images) to train on; will it be able to learn them without overfitting? I haven't tried bigger datasets, as my HDD is already full as it is, though I did try MNIST and it seems to learn quite quickly. Thanks for the suggestions, BTW :)
Also, I want to ask: your paper seems to use learning rates of 0.00005 and 0.0002 for the generator and discriminator respectively, yet your code has them equal to each other, namely 0.0002. It does seem to work fine; I'm just asking whether there was a change or anything?
> Also, I want to ask: your paper seems to use learning rates of 0.00005 and 0.0002 for the generator and discriminator respectively, yet your code has them equal to each other, namely 0.0002. It does seem to work fine; I'm just asking whether there was a change or anything?
When using --ttur in the command line, we set the learning rate of G to (learning rate of D) / 4. You could refer to our code for more details.
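A minimal sketch of what --ttur implies for the two learning rates (illustrative only, not our exact training code):

```python
# Minimal sketch (not StyleSwin's exact code) of the --ttur behaviour
# described above: the generator learning rate is tied to D_lr / 4.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--D_lr", type=float, default=0.0002)
parser.add_argument("--ttur", action="store_true")
args = parser.parse_args()

d_lr = args.D_lr
g_lr = args.D_lr / 4 if args.ttur else args.D_lr  # 0.0002 / 4 = 0.00005, matching the paper
print(f"G lr: {g_lr}, D lr: {d_lr}")
```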
> I only have smaller datasets (usually <1000 images) to train on; will it be able to learn them without overfitting? I haven't tried bigger datasets, as my HDD is already full as it is, though I did try MNIST and it seems to learn quite quickly. Thanks for the suggestions, BTW :)
You could try adding augmentation such as DiffAug or ADA when training on small datasets, which may help you obtain better performance.
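A minimal sketch of how DiffAug is typically wired in (assuming DiffAugment_pytorch.py from the official DiffAug repo, mit-han-lab/data-efficient-gans, is importable; this is an illustration, not our training code):

```python
# Hedged sketch of the DiffAug idea: apply the SAME differentiable
# augmentations to real and generated images before the discriminator.
# Assumes DiffAugment_pytorch.py from the official DiffAug repo is on
# the Python path; the tensors below are stand-ins for real batches.
import torch
from DiffAugment_pytorch import DiffAugment

policy = "color,translation,cutout"   # a commonly used DiffAug policy
real = torch.randn(4, 3, 256, 256)    # stand-in for a batch of real images
fake = torch.randn(4, 3, 256, 256)    # stand-in for generator output

real_aug = DiffAugment(real, policy=policy)
fake_aug = DiffAugment(fake, policy=policy)
# real_pred = discriminator(real_aug)
# fake_pred = discriminator(fake_aug)
```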
Thank you so much for the help, I will try that as well. Say, the paper says the advantages of StyleSwin start to show at 256x256 or higher. What results should be expected at smaller resolutions, like 128x128 or 64x64? Are they still very good? So far it seems pretty good, but the smallest I tried was 128x128 (MNIST I simply upscaled).
Also, what is the purpose of bCR? Isn't it similar to DiffAug, or is it something else? (It's there under the --bcr argument.)
> Thank you so much for the help, I will try that as well. Say, the paper says the advantages of StyleSwin start to show at 256x256 or higher. What results should be expected at smaller resolutions, like 128x128 or 64x64? Are they still very good? So far it seems pretty good, but the smallest I tried was 128x128 (MNIST I simply upscaled).
We tried our model at 64x64 resolution in early exploration, and the results were also competitive. Note that the best hyperparameters differ for each resolution, so you may need to tune them to obtain the best performance.
What hyperparameters did you use for 256x256, if I may ask?
> Also, what is the purpose of bCR? Isn't it similar to DiffAug, or is it something else? (It's there under the --bcr argument.)
Yes, it is another kind of regularization that helps obtain better results; you could refer to the original paper for details.
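A rough sketch of the bCR idea from that paper ("Improved Consistency Regularization for GANs", Zhao et al.); the function and weight names here are illustrative, not our implementation:

```python
# Rough sketch of balanced consistency regularization (bCR): penalise the
# discriminator for changing its output when an augmentation is applied,
# on BOTH real and generated images. Names and weights are illustrative.
import torch.nn.functional as F

def bcr_loss(discriminator, real, fake, augment, lambda_real=10.0, lambda_fake=10.0):
    d_real = discriminator(real)
    d_fake = discriminator(fake)
    d_real_aug = discriminator(augment(real))   # same D, augmented real batch
    d_fake_aug = discriminator(augment(fake))   # same D, augmented fake batch
    return (lambda_real * F.mse_loss(d_real_aug, d_real)
            + lambda_fake * F.mse_loss(d_fake_aug, d_fake))
```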
Thanks for clearing that up! Love the project so far
Now all I want to know is: what hyperparameters did you choose for LSUN Church 256x256?
> Now all I want to know is: what hyperparameters did you choose for LSUN Church 256x256?
The command for training LSUN Church 256 in the README contains all the hyperparameters we used, which can also be found in our paper's appendix:

python -m torch.distributed.launch --nproc_per_node=8 train_styleswin.py --batch 4 --path /path_to_lsun_church_256 --checkpoint_path /tmp --sample_path /tmp --size 256 --G_channel_multiplier 2 --use_flip --r1 5 --lmdb --D_lr 0.0002 --D_sn --ttur --eval_gt_path /path_to_lsun_church_real_images_50k --lr_decay --lr_decay_start_steps 1300000 --iter 1500000
Thanks so much for the help!
How long should I expect it to train at 256x256 resolution? I only have 1 GPU, if that helps.