microsoft / StyleSwin

[CVPR 2022] StyleSwin: Transformer-based GAN for High-resolution Image Generation
https://arxiv.org/abs/2112.10762
MIT License

Why do you replace noise with SPE? #10

Closed diamond0910 closed 2 years ago

diamond0910 commented 2 years ago

Compared with StyleGAN2, I notice that you replace noise with SPE at the same place. What are the differences between SPE and noise? Can SPE achieve the effect of noise? It seems like SPE is a fixed vector?

Thanks.

ForeverFancy commented 2 years ago

Thanks for your interest in our work. Note that we do not replace noise injection with SPE. To purely measure the performance of the generator backbone, we remove noise injection in all experiments. The zero padding of convolutions gives the model absolute pixel positions when generating, which is missing in transformers. So we add SPE to provide absolute global positions for the transformer generator.
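For readers following along, here is a minimal sketch of the two ideas in the reply above. It is not StyleSwin's code: it uses a 1-D signal instead of a 2-D feature map, and it assumes the standard sinusoidal formula from the original Transformer. The helper names `conv1d_zero_pad` and `sinusoidal_encoding_1d` are made up for illustration.

```python
import math

def conv1d_zero_pad(signal, kernel):
    """'Same'-size 1-D cross-correlation with zero padding (odd kernel length)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]

def sinusoidal_encoding_1d(length, dim):
    """Fixed sinusoidal encoding: one `dim`-sized vector per position (dim even)."""
    table = []
    for pos in range(length):
        row = []
        for i in range(0, dim, 2):
            freq = 1.0 / (10000 ** (i / dim))
            row.append(math.sin(pos * freq))
            row.append(math.cos(pos * freq))
        table.append(row)
    return table

# A constant input carries no positional information on its own.
flat = [1.0] * 6

# With zero padding, the border outputs differ from the interior ones,
# so a convolutional generator can tell where the edges are.
conv_out = conv1d_zero_pad(flat, [1.0, 1.0, 1.0])
print(conv_out)  # [2.0, 3.0, 3.0, 3.0, 3.0, 2.0]

# A transformer has no such cue, so a fixed per-position vector is added
# to otherwise identical token features (assumed additive combination).
spe = sinusoidal_encoding_1d(length=6, dim=4)
tokens = [[0.0] * 4 for _ in range(6)]
encoded = [[t + p for t, p in zip(tok, pe)] for tok, pe in zip(tokens, spe)]
```

Note that the encoding is deterministic: it depends only on the position, not on a random draw. That is why it serves a different purpose from noise injection, which is resampled for every image.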

diamond0910 commented 2 years ago

Thanks for your reply. It seems that adding noise improves the performance in StyleGAN1. How much performance gain have you seen in StyleSwin by adding noise?

ForeverFancy commented 2 years ago

In experiments, we have observed that simply adding noise does not result in significant performance improvement. We hypothesize that the noise input may take effect with specific architectures or components (like anti-aliasing upsampling, which we do not use in StyleSwin). Further improvement of the transformer generator is under exploration.
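For context, "adding noise" in the StyleGAN family generally means adding per-pixel Gaussian noise to a feature map, scaled by a learned strength. The sketch below illustrates that idea on a plain nested list. It is an assumption-laden illustration, not the code used in the experiments mentioned above, and `inject_noise` and its parameters are hypothetical names.

```python
import random

def inject_noise(feature_map, strength, rng):
    """Add scaled Gaussian noise to each entry of a 2-D feature map.

    `strength` stands in for a learned scalar; a value of 0.0 disables the noise.
    """
    return [
        [value + strength * rng.gauss(0.0, 1.0) for value in row]
        for row in feature_map
    ]

rng = random.Random(0)
features = [[1.0, 2.0], [3.0, 4.0]]

# With zero strength the features pass through unchanged.
unchanged = inject_noise(features, 0.0, rng)

# With non-zero strength every call draws fresh noise, unlike a fixed
# position encoding, which is the same for every generated image.
noisy_a = inject_noise(features, 0.1, rng)
noisy_b = inject_noise(features, 0.1, rng)
```

Setting the strength to zero recovers the noise-free generator, which corresponds to the configuration the authors say they used in all experiments.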