According to torchsummary, the default generator used in pix2pix in this repo has the architecture:

encoder: C64(no batchnorm)-C128-C256-C512-C512-C512-C512-C512
U-Net decoder: C512(no batchnorm)-C1024-CD1024-CD1024-CD1024-CD512-C256-C128

However, the original paper, "Image-to-Image Translation with Conditional Adversarial Networks", says:

Let Ck denote a Convolution-BatchNorm-ReLU layer with k filters. CDk denotes a Convolution-BatchNorm-Dropout-ReLU layer with a dropout rate of 50%. All convolutions are 4 × 4 spatial filters applied with stride 2. Convolutions in the encoder, and in the discriminator, downsample by a factor of 2, whereas in the decoder they upsample by a factor of 2. The encoder-decoder architecture consists of:

encoder: C64-C128-C256-C512-C512-C512-C512-C512
U-Net decoder: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128

Why does this repo differ from the paper, especially in the decoder, where the layer list is quite different? Does this repo provide a tweaked version?
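For reference, the paper's Ck notation above can be sketched in PyTorch. This is a minimal illustration of the notation only, not this repo's actual code; the helper name `C`, the 3-channel input, and the 256×256 input size are my assumptions.

```python
import torch
import torch.nn as nn

def C(in_ch, k, norm=True):
    """Ck from the paper: 4x4 conv, stride 2, BatchNorm, ReLU."""
    layers = [nn.Conv2d(in_ch, k, kernel_size=4, stride=2, padding=1)]
    if norm:
        layers.append(nn.BatchNorm2d(k))
    layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

# Paper encoder: C64-C128-C256-C512-C512-C512-C512-C512
widths = [64, 128, 256, 512, 512, 512, 512, 512]
encoder = nn.Sequential(
    *[C(i, k) for i, k in zip([3] + widths[:-1], widths)]
)

encoder.eval()  # avoid BatchNorm's train-mode error on a 1x1 activation
x = torch.randn(1, 3, 256, 256)
print(encoder(x).shape)  # torch.Size([1, 512, 1, 1]): eight stride-2 convs halve 256 down to 1
```

Under this sketch, the repo's reported first layer, "C64(no batchnorm)", would correspond to `C(3, 64, norm=False)`.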