microsoft / StyleSwin

[CVPR 2022] StyleSwin: Transformer-based GAN for High-resolution Image Generation
https://arxiv.org/abs/2112.10762
MIT License

Betas #21

Closed TheGullahanMaster closed 2 years ago

TheGullahanMaster commented 2 years ago

Hello, I forgot to ask in the previous issue, but what is the intuition behind beta1=0.0 and beta2=0.99? I've seen it in a couple of other projects (such as CIPS), and I always wondered how they came up with these values (usually GANs use beta1=0.5 and beta2=0.999). Is there some property of these values that helps training, or are these just the betas that seemed to work best?

TheGullahanMaster commented 2 years ago

Also, did you experiment with increasing the "depths" in models/generator.py? The default is 2 all the way, but when designing StyleSwin, did you try depths of 4, 12, etc.? What were the results?

ForeverFancy commented 2 years ago

Hello, I forgot to ask in the previous issue, but what is the intuition behind beta1=0.0 and beta2=0.99? I've seen it in a couple of other projects (such as CIPS), and I always wondered how they came up with these values (usually GANs use beta1=0.5 and beta2=0.999). Is there some property of these values that helps training, or are these just the betas that seemed to work best?

We just follow this setting from StyleGAN2, which also gave the best performance among all the settings we tried.
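For concreteness, a minimal sketch of the optimizer setup being discussed, using the StyleGAN2-style betas; the learning rate here is only a typical placeholder, not a value confirmed in this thread:

```python
import torch

# generator / discriminator are assumed to be the StyleSwin nn.Modules.
# beta1 = 0.0, beta2 = 0.99 -- the StyleGAN2 setting referenced above.
g_optim = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.0, 0.99))
d_optim = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.0, 0.99))
```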

ForeverFancy commented 2 years ago

Also, did you experiment with increasing the "depths" in models/generator.py? The default is 2 all the way, but when designing StyleSwin, did you try depths of 4, 12, etc.? What were the results?

To compare fairly with StyleGAN2, which uses 2 conv layers at each resolution, we simply set the depth to 2. Increasing the depth is expected to improve performance, but we didn't run further experiments.

TheGullahanMaster commented 2 years ago

Also, did you experiment with increasing the "depths" in models/generator.py? The default is 2 all the way, but when designing StyleSwin, did you try depths of 4, 12, etc.? What were the results?

To compare fairly with StyleGAN2, which uses 2 conv layers at each resolution, we simply set the depth to 2. Increasing the depth is expected to improve performance, but we didn't run further experiments.

I'm currently trying out depth 12 with the config shown below (my GPU cannot handle the default values), on a tiny dataset of about 200 building images (128x128), and it does seem somewhat faster, though training hasn't finished yet. (Attached: eval sample images eval_0_000002 through eval_0_000018.)
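The modified settings from models/generator.py, laid out as in the flattened snippet above; the per-resolution comments restore the original #4 ... #1024 markers, and the value of channel_multiplier is assumed rather than confirmed here:

```python
channel_multiplier = 2  # assumed default; use whatever the run actually passes

depths = [12, 12, 12, 12, 12, 12, 12, 12, 12]  # 12 blocks per stage instead of the default 2

in_channels = [
    512,                        # 4x4
    256,                        # 8x8
    128,                        # 16x16
    128,                        # 32x32
    128 * channel_multiplier,   # 64x64
    64 * channel_multiplier,    # 128x128
    32 * channel_multiplier,    # 256x256
    16 * channel_multiplier,    # 512x512
    8 * channel_multiplier,     # 1024x1024
]
```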

TheGullahanMaster commented 2 years ago

The discriminator channels were modified as well, to match the generator's channels

TheGullahanMaster commented 2 years ago

One more question: what does --enable_full_resolution do? The default is 8, and it seems to set the window size to 8 after being run through int(math.log(enable_full_resolution, 2)). Should I set --enable_full_resolution smaller when training at resolutions below 1024, or should I leave it be?

ForeverFancy commented 2 years ago

One more question: what does --enable_full_resolution do? The default is 8, and it seems to set the window size to 8 after being run through int(math.log(enable_full_resolution, 2)). Should I set --enable_full_resolution smaller when training at resolutions below 1024, or should I leave it be?

This argument sets up to which resolution full-resolution attention (window size = resolution) is used. Just keeping the default is better.
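A rough illustration of that behavior, with a hypothetical helper rather than the repo's actual code: stages at resolutions up to the threshold attend over the whole feature map, and larger stages fall back to a fixed window of 8.

```python
import math

def window_size_for(resolution, enable_full_resolution=8):
    # Use full-resolution attention (window == feature map size) up to the
    # threshold resolution; beyond that, use the default window of 8.
    full_resolution_index = int(math.log(enable_full_resolution, 2))
    if int(math.log(resolution, 2)) <= full_resolution_index:
        return resolution
    return 8
```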

TheGullahanMaster commented 2 years ago

OK, thanks for clearing that up. Should I also use the default number of channels per resolution when training at smaller resolutions?

ForeverFancy commented 2 years ago

That depends on the performance.

TheGullahanMaster commented 2 years ago

By performance, do you mean how well the model is performing, or whether it can fit into the GPU(s)? Also, what is the minimum number of channels StyleSwin (the generator part) can handle before performance degrades too much?

ForeverFancy commented 2 years ago

How well the model is performing. We did not run experiments with fewer channels; you could try it.

TheGullahanMaster commented 2 years ago

OK, I'm certainly trying it. BTW, does the accumulation depend on dataset size? The expression is 0.5 ** (32 / (10 * 1000)), which I gather produces an EMA decay value (factor?).

ForeverFancy commented 2 years ago

OK, I'm certainly trying it. BTW, does the accumulation depend on dataset size? The expression is 0.5 ** (32 / (10 * 1000)), which I gather produces an EMA decay value (factor?).

No.

TheGullahanMaster commented 2 years ago

OK, I'm certainly trying it. BTW, does the accumulation depend on dataset size? The expression is 0.5 ** (32 / (10 * 1000)), which I gather produces an EMA decay value (factor?).

No.

How were the (above) values chosen? Was it through trial and error, or did it come from StyleGAN?

ForeverFancy commented 2 years ago

We adopt the setting from the StyleGAN implementation.
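For context, a sketch of the EMA update that this decay factor typically drives in StyleGAN2-style PyTorch training code (the function name and details are illustrative, not copied from this repo):

```python
def accumulate(model_ema, model, decay=0.5 ** (32 / (10 * 1000))):
    # Exponential moving average of the generator weights:
    #   ema_param <- decay * ema_param + (1 - decay) * param
    ema_params = dict(model_ema.named_parameters())
    params = dict(model.named_parameters())
    for name in ema_params:
        ema_params[name].data.mul_(decay).add_(params[name].data, alpha=1 - decay)
```

Written this way, the decay corresponds to a half-life of roughly 10k images at batch size 32, which is why it does not depend on dataset size.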

TheGullahanMaster commented 2 years ago

Thank you so much for the help; I will try it as well. Say, the paper says the advantages of StyleSwin start to show at 256x256 or higher. What results are expected at smaller resolutions, like 128x128 or 64x64? Are they still very good? So far it seems pretty good, but the smallest I tried was 128x128 (MNIST I simply upscaled).

We have tried our model at 64x64 resolution in early exploration; the results are also competitive. Note that the best hyperparameters differ per resolution, so you may need to tune the hyper-params to obtain the best performance.

You mentioned in the previous "issue" (more like a discussion) that you tried 64x64 at the beginning and it showed competitive performance. What hyperparameters did you use (channels, etc.)? Were they the same as they are now? Also, which dataset did you try it with?

TheGullahanMaster commented 2 years ago

Also also also, what hyperparameters should I focus on? Batch size, learning rate, bCR weights, R1?

ForeverFancy commented 2 years ago

Thank you so much for the help; I will try it as well. Say, the paper says the advantages of StyleSwin start to show at 256x256 or higher. What results are expected at smaller resolutions, like 128x128 or 64x64? Are they still very good? So far it seems pretty good, but the smallest I tried was 128x128 (MNIST I simply upscaled).

We have tried our model at 64x64 resolution in early exploration; the results are also competitive. Note that the best hyperparameters differ per resolution, so you may need to tune the hyper-params to obtain the best performance.

You mentioned in the previous "issue" (more like a discussion) that you tried 64x64 at the beginning and it showed competitive performance. What hyperparameters did you use (channels, etc.)? Were they the same as they are now? Also, which dataset did you try it with?

Same hyper-params as FFHQ-256. We tried on FFHQ-64.

ForeverFancy commented 2 years ago

Also also also, what hyperparameters should I focus on? Batch size, learning rate, bCR weights, R1?

LR and r1.
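For reference, R1 is the standard gradient penalty on real images; a minimal sketch of how it is usually computed in StyleGAN2-style PyTorch code (not taken verbatim from this repo):

```python
import torch

def d_r1_loss(real_pred, real_img):
    # Penalize the squared gradient norm of the discriminator output with
    # respect to real images (real_img must have requires_grad=True).
    grad_real, = torch.autograd.grad(
        outputs=real_pred.sum(), inputs=real_img, create_graph=True
    )
    return grad_real.pow(2).reshape(grad_real.shape[0], -1).sum(1).mean()
```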

TheGullahanMaster commented 2 years ago

Thank you so much for the help; I will try it as well. Say, the paper says the advantages of StyleSwin start to show at 256x256 or higher. What results are expected at smaller resolutions, like 128x128 or 64x64? Are they still very good? So far it seems pretty good, but the smallest I tried was 128x128 (MNIST I simply upscaled).

We have tried our model at 64x64 resolution in early exploration; the results are also competitive. Note that the best hyperparameters differ per resolution, so you may need to tune the hyper-params to obtain the best performance.

You mentioned in the previous "issue" (more like a discussion) that you tried 64x64 at the beginning and it showed competitive performance. What hyperparameters did you use (channels, etc.)? Were they the same as they are now? Also, which dataset did you try it with?

Same hyper-params as FFHQ-256. We tried on FFHQ-64.

Did you keep the channels as they are in StyleGAN? (512 for 4x4, 512 for 8x8, 512 for 16x16, 512 for 32x32, 256 for 64x64)

ForeverFancy commented 2 years ago

Yeah.

TheGullahanMaster commented 2 years ago

Thanks for the reply :+1: