Closed TheGullahanMaster closed 2 years ago
Hello, I forgot to ask in the previous issue, but what is the intuition behind beta1=0.0 and beta2=0.99? I've seen it in a couple more projects (such as CIPS), and I always wondered how they came up with these values (usually, GANs use beta1=0.5 and beta2=0.999). Is there some property of these values that helps training? Or are they just the betas that seemed to work best?
We just follow this setting from StyleGAN2, which also provided the best performance among all settings.
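For context, a minimal sketch of what that optimizer setting looks like in PyTorch (the networks and the learning rate here are placeholders, not the actual StyleSwin defaults):

```python
import torch
import torch.nn as nn

# Placeholder networks; the real StyleSwin generator/discriminator go here.
generator = nn.Linear(512, 3 * 64 * 64)
discriminator = nn.Linear(3 * 64 * 64, 1)

# beta1=0.0 disables first-moment momentum, so updates react immediately to the
# current gradient; beta2=0.99 shortens the second-moment averaging window
# relative to the Adam default of (0.9, 0.999).
g_optim = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.0, 0.99))
d_optim = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.0, 0.99))
```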
Also, did you experiment with increasing the "depths" in models/generator.py? The default is 2 all the way, but when designing StyleSwin, did you try depths of 4, 12, etc.? What were the results?
To compare fairly with StyleGAN2, which uses 2 conv layers at each resolution, we just set the depth to 2. Increasing the depth is expected to improve performance, but we didn't perform more experiments.
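For reference, the default referred to above amounts to something like this (illustrative; see models/generator.py for the exact values):

```python
# Two transformer blocks per resolution stage, mirroring StyleGAN2's
# two conv layers per resolution.
depths = [2, 2, 2, 2, 2, 2, 2, 2, 2]
```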
I'm currently trying out depth 12 with this config:
```python
depths = [12, 12, 12, 12, 12, 12, 12, 12, 12]
in_channels = [
    512,                       # 4
    256,                       # 8
    128,                       # 16
    128,                       # 32
    128 * channel_multiplier,  # 64
    64 * channel_multiplier,   # 128
    32 * channel_multiplier,   # 256
    16 * channel_multiplier,   # 512
    8 * channel_multiplier     # 1024
]
```
(My GPU cannot handle depth 12 at the default channel values.) I'm training on a tiny dataset of about 200 building images (128x128), and it does seem somewhat faster, though it hasn't finished yet.
The discriminator channels were modified as well, to match the generator's channels
One more question: what does --enable_full_resolution do? The default is 8, and it seems to set the window size to 8 after it's run through int(math.log(enable_full_resolution, 2)). Should I set --enable_full_resolution smaller when training at resolutions below 1024, or should I leave it be?
This argument sets the resolution up to which full-resolution attention (window size = resolution) is used. Just leaving it at the default is better.
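In other words, something along the lines of the sketch below (an illustration of the described behaviour, not the actual StyleSwin code; the function name is made up):

```python
import math

def window_size_for(resolution, enable_full_resolution=8, local_window=8):
    """Up to the threshold resolution, attention spans the whole feature map
    (window size = resolution); above it, a fixed local window is used."""
    threshold_stage = int(math.log(enable_full_resolution, 2))
    stage = int(math.log(resolution, 2))
    if stage <= threshold_stage:
        return resolution
    return min(local_window, resolution)

print([(r, window_size_for(r)) for r in (4, 8, 16, 32, 64, 128)])
# [(4, 4), (8, 8), (16, 8), (32, 8), (64, 8), (128, 8)]
```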
Ok, thanks for clearing that up. Should I also use the default number of channels per resolution when training at smaller resolutions?
That depends on the performance.
By performance, do you mean how well the model is performing, or whether it can fit into the GPU(s)? Also, what is the minimum number of channels StyleSwin (the generator part) can handle before performance degrades too much?
How well the model is performing. We did not perform experiments using fewer channels; you could try it.
Ok, I'm certainly trying it. BTW, does the accumulation depend on dataset size? 0.5 ** (32 / (10 * 1000)) is the equation, which I gather is computing an EMA decay value (factor?).
No.
How were the (above) values chosen? Was it through trial and error, or did they come from StyleGAN?
We adopt the setting from the StyleGAN implementation.
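For context, the accumulation in question is the usual StyleGAN2-style weight EMA; a minimal sketch (function and argument names are illustrative, not the exact StyleSwin code):

```python
def accumulate(ema_model, model, decay=0.5 ** (32 / (10 * 1000))):
    """Exponential moving average of the generator weights.
    decay ~= 0.9978, i.e. a half-life of 10k images at batch size 32,
    independent of the dataset size."""
    ema_params = dict(ema_model.named_parameters())
    params = dict(model.named_parameters())
    for name, param in params.items():
        ema_params[name].data.mul_(decay).add_(param.data, alpha=1 - decay)
```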
Thank you so much for the help, I will try it as well. Say, the paper says you start to see the advantages of StyleSwin at 256x256 or higher. What results are expected at smaller resolutions, like 128x128 or 64x64? Are they still very good? So far it seems pretty good, but the smallest I tried was 128x128 (MNIST I simply upscaled).
We tried our model on 64x64 resolution in early exploration; the results were also competitive. Note that the best hyperparameters differ for each resolution, so you may need to tune them to obtain the best performance.
You mentioned in the previous "issue" (more like a discussion) that you tried 64x64 at the beginning and it showed competitive performance. What hyperparameters did you use (channels, etc.)? Were they the same as they are now? Also, which dataset did you try it with?
Also also also, what hyperparameters should I focus on? Batch size, learning rates, bCR weights, R1?
Same hyperparameters as FFHQ-256. We tried it on FFHQ-64.
LR and r1.
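The R1 mentioned here is the gradient penalty from the StyleGAN family; a rough sketch of how the penalty is usually computed (the gamma value in the usage note is illustrative):

```python
import torch

def r1_penalty(real_pred, real_img):
    """Penalize the squared gradient norm of the discriminator output with
    respect to real images; real_img needs requires_grad=True beforehand."""
    (grad,) = torch.autograd.grad(
        outputs=real_pred.sum(), inputs=real_img, create_graph=True
    )
    return grad.pow(2).reshape(grad.shape[0], -1).sum(1).mean()

# Typical usage inside the discriminator step:
#   d_loss = d_logistic_loss + (gamma / 2) * r1_penalty(real_pred, real_img)
```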
Did you keep the channels as they are in StyleGAN? (512 for 4x4, 512 for 8x8, 512 for 16x16, 512 for 32x32, 256 for 64x64)
Yeah.
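So, written out, the schedule confirmed above would be roughly:

```python
# StyleGAN-style channel widths for the lower resolutions.
channels = {4: 512, 8: 512, 16: 512, 32: 512, 64: 256}
```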
Thanks for the reply :+1: