junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

pix2pix architecture differs from original paper #1614

Open ming-hao-xu opened 10 months ago

ming-hao-xu commented 10 months ago

According to torchsummary, the default generator used in pix2pix has the following architecture (a sketch for reproducing this summary follows the table):

        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 128, 128]           3,072
         LeakyReLU-2         [-1, 64, 128, 128]               0
            Conv2d-3          [-1, 128, 64, 64]         131,072
       BatchNorm2d-4          [-1, 128, 64, 64]             256
         LeakyReLU-5          [-1, 128, 64, 64]               0
            Conv2d-6          [-1, 256, 32, 32]         524,288
       BatchNorm2d-7          [-1, 256, 32, 32]             512
         LeakyReLU-8          [-1, 256, 32, 32]               0
            Conv2d-9          [-1, 512, 16, 16]       2,097,152
      BatchNorm2d-10          [-1, 512, 16, 16]           1,024
        LeakyReLU-11          [-1, 512, 16, 16]               0
           Conv2d-12            [-1, 512, 8, 8]       4,194,304
      BatchNorm2d-13            [-1, 512, 8, 8]           1,024
        LeakyReLU-14            [-1, 512, 8, 8]               0
           Conv2d-15            [-1, 512, 4, 4]       4,194,304
      BatchNorm2d-16            [-1, 512, 4, 4]           1,024
        LeakyReLU-17            [-1, 512, 4, 4]               0
           Conv2d-18            [-1, 512, 2, 2]       4,194,304
      BatchNorm2d-19            [-1, 512, 2, 2]           1,024
        LeakyReLU-20            [-1, 512, 2, 2]               0
           Conv2d-21            [-1, 512, 1, 1]       4,194,304
             ReLU-22            [-1, 512, 1, 1]               0
  ConvTranspose2d-23            [-1, 512, 2, 2]       4,194,304
      BatchNorm2d-24            [-1, 512, 2, 2]           1,024
UnetSkipConnectionBlock-25           [-1, 1024, 2, 2]               0
             ReLU-26           [-1, 1024, 2, 2]               0
  ConvTranspose2d-27            [-1, 512, 4, 4]       8,388,608
      BatchNorm2d-28            [-1, 512, 4, 4]           1,024
          Dropout-29            [-1, 512, 4, 4]               0
UnetSkipConnectionBlock-30           [-1, 1024, 4, 4]               0
             ReLU-31           [-1, 1024, 4, 4]               0
  ConvTranspose2d-32            [-1, 512, 8, 8]       8,388,608
      BatchNorm2d-33            [-1, 512, 8, 8]           1,024
          Dropout-34            [-1, 512, 8, 8]               0
UnetSkipConnectionBlock-35           [-1, 1024, 8, 8]               0
             ReLU-36           [-1, 1024, 8, 8]               0
  ConvTranspose2d-37          [-1, 512, 16, 16]       8,388,608
      BatchNorm2d-38          [-1, 512, 16, 16]           1,024
          Dropout-39          [-1, 512, 16, 16]               0
UnetSkipConnectionBlock-40         [-1, 1024, 16, 16]               0
             ReLU-41         [-1, 1024, 16, 16]               0
  ConvTranspose2d-42          [-1, 256, 32, 32]       4,194,304
      BatchNorm2d-43          [-1, 256, 32, 32]             512
UnetSkipConnectionBlock-44          [-1, 512, 32, 32]               0
             ReLU-45          [-1, 512, 32, 32]               0
  ConvTranspose2d-46          [-1, 128, 64, 64]       1,048,576
      BatchNorm2d-47          [-1, 128, 64, 64]             256
UnetSkipConnectionBlock-48          [-1, 256, 64, 64]               0
             ReLU-49          [-1, 256, 64, 64]               0
  ConvTranspose2d-50         [-1, 64, 128, 128]         262,144
      BatchNorm2d-51         [-1, 64, 128, 128]             128
UnetSkipConnectionBlock-52        [-1, 128, 128, 128]               0
             ReLU-53        [-1, 128, 128, 128]               0
  ConvTranspose2d-54          [-1, 3, 256, 256]           6,147
             Tanh-55          [-1, 3, 256, 256]               0
UnetSkipConnectionBlock-56          [-1, 3, 256, 256]               0
    UnetGenerator-57          [-1, 3, 256, 256]               0
     DataParallel-58          [-1, 3, 256, 256]               0
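
For reference, this is roughly how such a summary can be reproduced (a sketch, assuming the script runs from the repo root with `torchsummary` installed and a CUDA device available; `define_G` is the repo's generator factory in `models/networks.py`):

```python
from torchsummary import summary

from models import networks  # the repo's models/networks.py

# Build the default pix2pix generator: 3-channel input/output, ngf=64,
# an 8-level U-Net for 256x256 images, batch norm, and dropout enabled.
# gpu_ids=[0] makes define_G wrap the net in DataParallel, matching the
# DataParallel-58 row above.
netG = networks.define_G(input_nc=3, output_nc=3, ngf=64, netG='unet_256',
                         norm='batch', use_dropout=True, gpu_ids=[0])

summary(netG, (3, 256, 256))
```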

However, the original paper, "Image-to-Image Translation with Conditional Adversarial Networks", says:

Let Ck denote a Convolution-BatchNorm-ReLU layer with k filters. CDk denotes a Convolution-BatchNorm-Dropout-ReLU layer with a dropout rate of 50%. All convolutions are 4 × 4 spatial filters applied with stride 2. Convolutions in the encoder, and in the discriminator, downsample by a factor of 2, whereas in the decoder they upsample by a factor of 2. The encoder-decoder architecture consists of:

encoder: C64-C128-C256-C512-C512-C512-C512-C512
U-Net decoder: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128
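
In that notation, the paper's Ck and CDk building blocks could be sketched as follows (hypothetical helpers `ck_down` and `cdk_up`, written per the quoted definition; not code from this repo):

```python
import torch.nn as nn

def ck_down(in_ch, k):
    # Ck in the encoder: 4x4 conv, stride 2 (downsample by 2), BatchNorm,
    # ReLU. Per the paper's appendix, encoder ReLUs are leaky (slope 0.2).
    return nn.Sequential(
        nn.Conv2d(in_ch, k, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(k),
        nn.LeakyReLU(0.2),
    )

def cdk_up(in_ch, k):
    # CDk in the decoder: 4x4 transposed conv, stride 2 (upsample by 2),
    # BatchNorm, 50% dropout, ReLU (non-leaky in the decoder).
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, k, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(k),
        nn.Dropout(0.5),
        nn.ReLU(),
    )
```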

Why, in this repo, is it:

encoder: C64(no batchnorm)-C128-C256-C512-C512-C512-C512-C512
U-Net decoder: C512(no batchnorm)-C1024-CD1024-CD1024-CD1024-CD512-C256-C128

The decoder, especially, is quite different. Does this repo provide a tweaked version?
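
One thing to keep in mind when reading the summary: the repo builds the U-Net recursively from `UnetSkipConnectionBlock` modules, and each non-outermost block concatenates its input with its submodule's output, so every `UnetSkipConnectionBlock-*` row reports the sum of the two channel counts (e.g. 512 + 512 = 1024 at UnetSkipConnectionBlock-25). A minimal sketch of that effect, mirroring the innermost stage of the table (`SkipBlockSketch` is a hypothetical stand-in, not the repo's class):

```python
import torch
import torch.nn as nn

class SkipBlockSketch(nn.Module):
    """Hypothetical stand-in for a non-outermost UnetSkipConnectionBlock."""
    def __init__(self, inner):
        super().__init__()
        self.model = inner  # down path -> submodule -> up path

    def forward(self, x):
        # The skip connection: concatenate input and submodule output along
        # the channel axis, doubling the reported channel count.
        return torch.cat([x, self.model(x)], 1)

# Mirror the innermost stage of the table: LeakyReLU-20 / Conv2d-21 down to
# 512ch at 1x1, then ReLU-22 / ConvTranspose2d-23 / BatchNorm2d-24 back up
# to 512ch at 2x2.
inner = nn.Sequential(
    nn.LeakyReLU(0.2),
    nn.Conv2d(512, 512, kernel_size=4, stride=2, padding=1),           # 2x2 -> 1x1
    nn.ReLU(),
    nn.ConvTranspose2d(512, 512, kernel_size=4, stride=2, padding=1),  # 1x1 -> 2x2
    nn.BatchNorm2d(512),
)

x = torch.randn(1, 512, 2, 2)
print(SkipBlockSketch(inner)(x).shape)  # torch.Size([1, 1024, 2, 2])
```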