Open dpelacani opened 3 years ago
Good question! I can't answer it exactly; my hypothesis is that it comes down to the distribution of the image domains. For instance, if your domain A has a green tone but your domain B has a red tone, then it's possible for the discriminator to learn even from 1x1 patches.
So the 1x1 patch analyses 3 channels and is essentially a 3-to-1 prediction? Do you have any idea how the code adapts to this when the input and output are set to a single channel?
I guess there's not much difference, due to the properties of convolution: a 1x1 convolution is just a linear map over the channel dimension applied at every pixel, so changing the channel count only changes the weight shape.
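A minimal sketch of that point, assuming nothing beyond NumPy (the function name `conv1x1` is hypothetical, not from the repo): a 1x1 convolution is a per-pixel dot product over channels, so the identical code handles 3-channel and 1-channel inputs.

```python
import numpy as np

def conv1x1(x, w, b):
    """x: (C_in, H, W), w: (C_out, C_in), b: (C_out,).
    Applies the same C_in -> C_out linear map independently at every pixel,
    which is exactly what a convolution with a 1x1 kernel computes."""
    return np.einsum('oc,chw->ohw', w, x) + b[:, None, None]

rng = np.random.default_rng(0)

# RGB input: the weight maps 3 channels to 1 score per pixel
# (the "3-to-1 prediction" discussed above).
x_rgb = rng.standard_normal((3, 4, 4))
w_rgb = rng.standard_normal((1, 3))
b = np.zeros(1)
scores_rgb = conv1x1(x_rgb, w_rgb, b)
assert scores_rgb.shape == (1, 4, 4)   # one score per pixel

# Single-channel input: only the weight shape changes, the code is identical.
x_gray = rng.standard_normal((1, 4, 4))
w_gray = rng.standard_normal((1, 1))
scores_gray = conv1x1(x_gray, w_gray, b)
assert scores_gray.shape == (1, 4, 4)

# Cross-check: the output at pixel (i, j) is a plain dot product of the
# weight row with that pixel's channel vector.
i, j = 2, 1
assert np.allclose(scores_rgb[0, i, j], w_rgb[0] @ x_rgb[:, i, j])
```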
As far as I understand, the Pixel Discriminator is equivalent to a PatchGAN with a 1x1 receptive field, meaning the discriminator tries to decide whether each individual pixel is real or fake. But without taking surrounding pixels into account, what information does it use to assign a label, and how is it still able to recover the high-frequency content of the images? I have been working on a project, and the only stable training I could get was with the pixel discriminator; with the PatchGAN, the discriminator started off too strong relative to the generator and the network couldn't learn. Hence my curiosity.
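To make the "1x1 PatchGAN" idea concrete, here is a hedged NumPy sketch of a pixel discriminator as a stack of 1x1 convolutions (the layer widths, `leaky_relu` slope, and function names are illustrative assumptions, not the repo's exact `PixelDiscriminator`). Because every operation acts pixel-wise, each pixel's logit depends only on that pixel's channel values, never on its neighbours:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # Standard leaky ReLU; slope 0.2 is a common GAN choice (assumption here).
    return np.where(x > 0, x, slope * x)

def conv1x1(x, w):
    # (C_out, C_in) weight applied at every pixel of a (C_in, H, W) map.
    return np.einsum('oc,chw->ohw', w, x)

def pixel_discriminator(img, ws):
    """Stack of 1x1 convs with nonlinearities in between:
    a per-pixel MLP over the channel vector, no spatial context."""
    h = img
    for w in ws[:-1]:
        h = leaky_relu(conv1x1(h, w))
    return conv1x1(h, ws[-1])   # (1, H, W): one real/fake logit per pixel

rng = np.random.default_rng(1)
ndf = 8   # hypothetical base width
ws = [rng.standard_normal((ndf, 3)) * 0.1,
      rng.standard_normal((2 * ndf, ndf)) * 0.1,
      rng.standard_normal((1, 2 * ndf)) * 0.1]

img = rng.standard_normal((3, 5, 5))
logits = pixel_discriminator(img, ws)
assert logits.shape == (1, 5, 5)

# Perturbing one pixel changes only that pixel's logit -- this is the sense
# in which the pixel discriminator uses no surrounding-pixel information.
img2 = img.copy()
img2[:, 0, 0] += 10.0
logits2 = pixel_discriminator(img2, ws)
changed = np.abs(logits2 - logits) > 1e-9
assert changed[0, 0, 0] and changed.sum() == 1
```

This locality is also one plausible reason the pixel discriminator trained more stably in your case: with no spatial receptive field it can only penalise per-pixel colour statistics, so it is a much weaker critic than a PatchGAN and is less likely to overpower the generator early on.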