sefibk / KernelGAN


about Generator and Discriminator output size #71

Open ArsenalCheng opened 1 year ago

ArsenalCheng commented 1 year ago

Hi, thanks for your excellent work!

When I ran the training code, I found that the output size is 26x26 (the input size is 64x64). According to the paper, the output size should be half of the input size. When I checked the Generator's code, I found that no padding is set. If I set padding=3 in the first layer (kernel size = 7), and so on for the later layers, the output size of the Generator is 32x32. I wonder if my solution is right.
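Here is a minimal sketch of what I mean (assuming a layer structure of [7, 5, 3, 1, 1, 1] with stride 2 only in the last layer - this is my reading, the actual defaults in networks.py may differ):

```python
import torch
import torch.nn as nn

# Stand-in for the downscaling generator: unpadded conv layers whose kernel
# sizes add up to an effective 13x13 receptive field, stride 2 only at the end.
kernel_sizes = [7, 5, 3, 1, 1, 1]
channels = [3, 64, 64, 64, 64, 64]
layers = []
for i, k in enumerate(kernel_sizes[:-1]):
    layers.append(nn.Conv2d(channels[i], channels[i + 1], kernel_size=k, padding=0, bias=False))
layers.append(nn.Conv2d(channels[-1], 3, kernel_size=kernel_sizes[-1], stride=2, padding=0, bias=False))
generator = nn.Sequential(*layers)

x = torch.randn(1, 3, 64, 64)
print(generator(x).shape)  # torch.Size([1, 3, 26, 26]) -> (64 - 12 - 1) // 2 + 1 = 26
# With padding=3, 2, 1, 0, 0 per layer the size stays 64 until the strided
# layer, and the output becomes 32x32 as expected.
```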

Looking forward to your reply.

sefibk commented 1 year ago

Hi @Syllables01, not sure I follow, so LMK if I missed the point.

  1. The Generator has one convolution layer with stride 2 and all the others have no stride (=1) ==> the image is downscaled by 2 (except for boundary effects, but they should be minor; see the sketch below). I don't understand how you get such a small image. What happens for an image from the dataset (~1024x1024)?
  2. WDYM by

    The output size of Generator is 32*32

Are you referring to the image or the kernel?
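For a quick sanity check on the boundary effect in point 1 - assuming an effective receptive field of roughly 13x13 (based on the default kernel size) and a single stride-2 layer, the output is about (H - 12) / 2 rather than exactly H / 2:

```python
def unpadded_output_size(h, receptive_field=13, stride=2):
    """Spatial size after a stack of unpadded convs with one strided layer."""
    return (h - (receptive_field - 1) - 1) // stride + 1

print(unpadded_output_size(64))    # 26  -> far from 64 / 2 = 32
print(unpadded_output_size(1024))  # 506 -> close to 1024 / 2 = 512
```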

ArsenalCheng commented 1 year ago

I mean that the kernel generator should imitate downscaling by X2. However, in the original code, when sending an input image of shape 64x64, the output of the kernel generator is not 32x32. Should I alter the padding so that the kernel generator imitates downscaling by X2?

sefibk commented 1 year ago

No! The kernel is independent of the image size. The kernel is the function that downscales the image - it convolves the image, and since it does so with a stride, the resulting image is smaller. You choose how big a kernel you want. Previous research shows that 13x13 captures most of the downscaling effect at scale factor 2, so that is the default (physically, the kernel is large, but mathematically 13x13 should suffice).
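To illustrate - once the ~13x13 kernel is estimated, downscaling an image is just a strided convolution with it. A rough sketch (the uniform kernel and the padding choice here are placeholders, not what KernelGAN actually estimates or does):

```python
import torch
import torch.nn.functional as F

# Placeholder 13x13 kernel (uniform, just for shape); KernelGAN estimates an
# image-specific one instead.
kernel = torch.ones(1, 1, 13, 13) / (13 * 13)

img = torch.randn(1, 1, 1024, 1024)                 # single channel for simplicity
small = F.conv2d(img, kernel, stride=2, padding=6)  # stride 2 halves the image
print(small.shape)                                  # torch.Size([1, 1, 512, 512])
```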

ArsenalCheng commented 1 year ago

Thanks. I know that the kernel is independent of the image size. The function "train" in train.py estimates the image-specific kernel from chosen patches. However, when sending a 64x64 patch into the downscale generator, the output size is not 32x32. If the generator imitates the downscaling operation by a factor of 2, why does this happen? Looking forward to your reply!

sefibk commented 1 year ago

If you are referring to the training phase - as far as I remember, it trains on a crop and not the entire image, to reduce runtime (and since it doesn't need such a large number of patches for each forward-backward step). But you should verify what I said in the code.
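Roughly, the idea is something like the following (an illustration only - the actual crop selection in data.py may differ, e.g., as far as I remember it favors crops with more gradient content):

```python
import torch

def random_crop(img, size=64):
    """Pick a random size x size patch from a (C, H, W) tensor."""
    _, h, w = img.shape
    top = torch.randint(0, h - size + 1, (1,)).item()
    left = torch.randint(0, w - size + 1, (1,)).item()
    return img[:, top:top + size, left:left + size]

image = torch.randn(3, 1024, 1024)
patch = random_crop(image)   # one small crop per forward-backward step
print(patch.shape)           # torch.Size([3, 64, 64])
```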