shimopino opened this issue 4 years ago
@KeisukeShimokawa This is an interesting suggestion: indeed, I'm also not sure what the performance difference between the two is. I did a deeper check and it seems like the TensorFlow version for BigGAN also uses a combination of Pool + Conv: https://github.com/google/compare_gan/blob/master/compare_gan/architectures/resnet_ops.py#L131 Perhaps this resblock structure is unique to BigGAN rather than the version from Miyato et al, but I might be wrong. Nonetheless, I think this is a very good point (and detail) to note and will certainly keep your suggestion in mind!
@kwotsin Thank you for your reply. I hadn't checked that repository; thanks for sharing it.
Re-reading the original BigGAN paper (arXiv), I found a diagram showing that a combination of Pooling and Conv is used in its ResBlocks.
I also explored NVIDIA's SPADE repository and found that it uses a combination of Pooling and Conv for its ResBlocks as well (e.g. https://github.com/NVlabs/SPADE/blob/master/models/networks/discriminator.py#L46).
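To make the structure concrete, here is a minimal PyTorch sketch of the Pooling + Conv style discriminator ResBlock discussed above. It is only an illustration with my own naming (`DBlockSketch` is not taken from any of the repositories linked here), not their exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DBlockSketch(nn.Module):
    """Illustrative BigGAN-style discriminator ResBlock that downsamples
    with average pooling + stride-1 convolutions (hypothetical, simplified)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.conv_sc = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # 1x1 shortcut conv

    def forward(self, x):
        # Main path: conv -> conv -> avg_pool (the pooling does the downsampling).
        h = self.conv1(F.relu(x))
        h = self.conv2(F.relu(h))
        h = F.avg_pool2d(h, 2)
        # Shortcut path: pool first, then a 1x1 conv to match channels.
        sc = self.conv_sc(F.avg_pool2d(x, 2))
        return h + sc

# Quick shape check: halves spatial resolution and changes channels.
x = torch.randn(1, 64, 32, 32)
print(DBlockSketch(64, 128)(x).shape)  # torch.Size([1, 128, 16, 16])
```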
The TensorFlow implementation I referenced earlier may be a bit of a special case: that BigGAN implementation (e.g. https://github.com/taki0112/BigGAN-Tensorflow/blob/master/ops.py#L159) lets the convolutions themselves change the resolution, using a transposed convolution for upsampling and a strided convolution for downsampling in its ResNet blocks.
I know that BigGAN implementations in PyTorch use a combination of Pooling and Conv (e.g. https://github.com/ajbrock/BigGAN-PyTorch/blob/master/BigGAN.py#L341), but from my own experience I can't say for sure which of the two performs better.
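For clarity, the two approaches being compared could be sketched roughly like this (hypothetical layer choices for illustration, not code from either repository):

```python
import torch.nn as nn

# Variant A: change resolution with a separate resize op, then a stride-1 conv
# (the Pooling + Conv pattern, as in BigGAN-PyTorch / compare_gan).
down_pool_conv = nn.Sequential(
    nn.AvgPool2d(kernel_size=2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
)
up_resize_conv = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(128, 64, kernel_size=3, padding=1),
)

# Variant B: let the convolution itself change the resolution
# (strided conv for downsampling, transposed conv for upsampling).
down_strided_conv = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
up_transposed_conv = nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)
```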
In the future, would it be possible to flexibly select the operation used to change the resolution of the input feature maps?
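For example (just a sketch of a hypothetical API, not an existing feature of this library), the block could take an argument that selects how the resolution change is performed:

```python
import torch.nn as nn
import torch.nn.functional as F

class ConfigurableDBlock(nn.Module):
    """Hypothetical discriminator block whose downsampling op is selectable.

    downsample_mode='pool_conv' keeps stride-1 convs and downsamples with avg_pool;
    downsample_mode='strided_conv' lets the first conv change the resolution."""

    def __init__(self, in_ch, out_ch, downsample_mode="pool_conv"):
        super().__init__()
        self.downsample_mode = downsample_mode
        stride = 2 if downsample_mode == "strided_conv" else 1
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        h = self.conv1(F.relu(x))
        if self.downsample_mode == "pool_conv":
            h = F.avg_pool2d(h, 2)  # resolution change handled by pooling
        return self.conv2(F.relu(h))
```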