sergeytulyakov / mocogan

MoCoGAN: Decomposing Motion and Content for Video Generation

Invariable Image Size #33

Closed jhawgs closed 3 years ago

jhawgs commented 3 years ago

I have been working with the model, and I am trying to generate images of size 128x128. I changed the --image_size option to 128. For reference, here is the full command.

$ python3 train.py  \
      --image_batch 32 \
      --video_batch 32 \
      --use_noise \
      --noise_sigma 0.1 \
      --image_discriminator PatchImageDiscriminator \
      --video_discriminator PatchVideoDiscriminator \
      --print_every 100 \
      --every_nth 2 \
      --dim_z_content 50 \
      --dim_z_motion 10 --image_size 128 \
      ../data/fb-128 ../logs/fb-2

The initial output, shown below, confirms that the option was picked up by the argument parser.

{'--batches': '100000',
 '--dim_z_category': '6',
 '--dim_z_content': '50',
 '--dim_z_motion': '10',
 '--every_nth': '2',
 '--image_batch': '32',
 '--image_dataset': '',
 '--image_discriminator': 'PatchImageDiscriminator',
 '--image_size': '128',
 '--n_channels': '3',
 '--noise_sigma': '0.1',
 '--print_every': '100',
 '--use_categories': False,
 '--use_infogan': False,
 '--use_noise': True,
 '--video_batch': '32',
 '--video_discriminator': 'PatchVideoDiscriminator',
 '--video_length': '16',
 '<dataset>': '../data/fb-128',
 '<log_folder>': '../logs/fb-2'}

The program then runs, but it still produces images of size 64x64 rather than 128x128. Additionally, the saved model checkpoints do not grow in size, even though a larger output resolution should mean more parameters. I have traced the issue to the model definitions, specifically the following lines.

self.main = nn.Sequential(
            # project the 1x1 latent vector to a 4x4 feature map
            nn.ConvTranspose2d(dim_z, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # 4x4 -> 8x8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # 8x8 -> 16x16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # 16x16 -> 32x32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # 32x32 -> 64x64 output image
            nn.ConvTranspose2d(ngf, self.n_channels, 4, 2, 1, bias=False),
            nn.Tanh()
        )

Note: this is only the generator definition; I would expect each of the discriminators to need an analogous change.

I have tried several fixes, including changing n_channels, but to no avail; I can't find any point where the requested size actually enters the model construction. I did notice that 64 happens to be the product of 8, 4, 2, and 1, the coefficients of ngf in the layer widths, but those multipliers set channel counts, not spatial dimensions, so I don't see how they could determine the final output size.
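For what it's worth, the spatial progression can be verified by pushing a dummy latent through a bare version of the same stack. The dim_z, ngf, and n_channels values below are illustrative stand-ins, not the exact ones from my config:

import torch
import torch.nn as nn

# Illustrative values; the real ones come from the training configuration.
dim_z, ngf, n_channels = 60, 64, 3

stack = nn.Sequential(
    nn.ConvTranspose2d(dim_z, ngf * 8, 4, 1, 0, bias=False),
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
    nn.ConvTranspose2d(ngf, n_channels, 4, 2, 1, bias=False),
)

x = torch.randn(1, dim_z, 1, 1)
for layer in stack:
    x = layer(x)
    # ConvTranspose2d: out = (in - 1) * stride - 2 * padding + kernel_size
    print(tuple(x.shape))  # spatial side goes 4, 8, 16, 32, 64

No matter what --image_size is set to, this stack ends at 64x64.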

Although I would love to see a fix, if anybody can point out where the resolution is meant to be configured, I can do more poking and probably find a solution myself.

sergeytulyakov commented 3 years ago

The image size is not set by that parameter; it is imposed by the generator and discriminator architectures. If you'd like to increase the image size, you need to subclass the generator and the discriminators and pass your versions in when the models are constructed. In this case that means adding another ConvTranspose2d block, though better results can be achieved with more recent architectures.
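For concreteness, here is a minimal sketch of what such a subclass could look like, with one extra stride-2 upsampling block on top of the five-block pattern above. The class name and argument defaults are illustrative, not code from this repository:

import torch.nn as nn

class ImageGenerator128(nn.Module):
    # Sketch of a 128x128 generator head: the 64x64 stack above plus one
    # more stride-2 block, so the spatial side doubles once more.
    def __init__(self, dim_z, n_channels=3, ngf=64):
        super().__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(dim_z, ngf * 16, 4, 1, 0, bias=False),    # 1x1 -> 4x4
            nn.BatchNorm2d(ngf * 16),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False),  # 4x4 -> 8x8
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),   # 8x8 -> 16x16
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),   # 16x16 -> 32x32
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),       # 32x32 -> 64x64
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf, n_channels, 4, 2, 1, bias=False),    # 64x64 -> 128x128
            nn.Tanh()
        )

    def forward(self, z):
        return self.main(z)

Each discriminator needs the mirrored change, one extra stride-2 Conv2d block, so that it downsamples the 128x128 input by the same overall factor.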