penguinbing closed this issue 4 years ago
Hi, me neither :).
To be specific, the supplementary material of the paper mentions that c'' belongs to [-1.66, 1.66] in the extrapolation experiment (LSUN dataset, [N4, M4, S64]), but I don't understand the reason behind it (what is the meaning of 2 / (4-1)?). Also, there may be a typo in that description (the range of the micro coordinate is computed twice); you may want to take a look.
I hope the author can give us a relatively simple example of the entire coordinate system :).
Actually, you can define any coordinate system for the micro/macro coordinates (for example, we show that you can even use a cylindrical coordinate system), as long as the transformation between them is reasonable. It is actually quite hard for me to define precisely what counts as a reasonable design.
To me, I feel like I just chose the most straightforward one. But making it generic (supporting any N and M) makes the code look rather complex. In fact, each individual setting is quite simple (you may just print the coordinates out). Take the [N2, M2] setting as an example: we only use 16 constant micro coordinates and 9 constant macro coordinates, since the full image generation is split into 16 micro-patch generations, while each macro patch is formed by combining 2x2 micro patches, so there are 9 possible combinations. They look like the following:
Micro coordinate system:
(-1, -1), (-1, -0.33), (-1, 0.33), (-1, 1)
(-0.33, -1), (-0.33, -0.33), (-0.33, 0.33), (-0.33, 1)
(0.33, -1), (0.33, -0.33), (0.33, 0.33), (0.33, 1)
(1, -1), (1, -0.33), (1, 0.33), (1, 1)
Macro coordinate system:
(-1, -1), (-1, 0), (-1, 1)
(0, -1), (0, 0), (0, 1)
(1, -1), (1, 0), (1, 1)
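The two grids above are just evenly spaced points in [-1, 1]; for the [N2, M2] case they can be reproduced with a short NumPy sketch (the helper name `coord_grid` is mine, not from the repository):

```python
import numpy as np

def coord_grid(n):
    """Return the n x n grid of (y, x) coordinates, evenly spaced in [-1, 1]."""
    axis = np.linspace(-1.0, 1.0, n)
    return [(round(float(y), 2), round(float(x), 2)) for y in axis for x in axis]

micro = coord_grid(4)  # 16 micro coordinates: (-1.0, -1.0), (-1.0, -0.33), ...
macro = coord_grid(3)  # 9 macro coordinates:  (-1.0, -1.0), (-1.0, 0.0), ...
```

Printing `micro` and `macro` reproduces the two tables above (up to rounding of 1/3).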
And the transformation from micro to macro ensures that, after multiple micro patches are stitched together, the position of the newly formed macro patch matches an existing constant coordinate in the macro coordinate system.
e.g., after the micro patches at [(-1, -1), (-1, -0.33), (-0.33, -1), (-0.33, -0.33)] are generated and composed into a macro patch M, the macro coordinate of M must be at the top-left corner, which is (-1, -1).
In practice, I do the sampling in the macro coordinate system here, then map the sampled macro coordinates to micro coordinates here. Such an implementation ensures that I can do any sort of sampling in the macro coordinate system that I want during training.
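A minimal sketch of that macro-to-micro mapping for the [N2, M2] grids above (the function name and the nearest-index lookup are my assumptions, not the repository's actual implementation):

```python
import numpy as np

MACRO_AXIS = np.linspace(-1.0, 1.0, 3)  # 3 macro positions per axis ([N2, M2])
MICRO_AXIS = np.linspace(-1.0, 1.0, 4)  # 4 micro positions per axis

def macro_to_micro(macro_y, macro_x):
    """Return the 2x2 micro coordinates whose stitched result sits at
    (macro_y, macro_x); the macro coordinate matches its top-left micro patch."""
    iy = int(np.abs(MACRO_AXIS - macro_y).argmin())
    ix = int(np.abs(MACRO_AXIS - macro_x).argmin())
    return [(float(MICRO_AXIS[iy + dy]), float(MICRO_AXIS[ix + dx]))
            for dy in (0, 1) for dx in (0, 1)]
```

For the macro coordinate (-1, -1) this returns the four micro coordinates from the example above: (-1, -1), (-1, -0.33), (-0.33, -1), (-0.33, -0.33).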
@EliotChenKJ In the supplementary, the setting should be (N2, M2, S64); that is a typo that needs to be fixed. The logic behind the equation Y / Z = 2 / (4 - 1) is:
- Y is the length of the coordinate range [-1, 1], which is 2
- The full image resolution is 256
- S64
=> The micro patch resolution is 64
=> 4x4 micro patches form a full image
- The distance between two consecutive patches is defined by the distance between their centers.
=> When there are 4 patches per axis, the span between the left-most and the right-most patch centers contains (4-1) gaps [Z comes from here].
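Putting the numbers together, a worked check of the arithmetic (the extrapolated bound comes from extending the range by one patch spacing beyond [-1, 1]):

```python
coord_span = 2.0              # Y: the coordinate range [-1, 1] has length 2
patches_per_axis = 256 // 64  # S64 on a 256px image => 4 micro patches per axis
gaps = patches_per_axis - 1   # Z: 3 center-to-center gaps across the axis

spacing = coord_span / gaps   # 2 / (4 - 1) = 0.666..., the micro coordinate step
outer = 1.0 + spacing         # extrapolating one extra patch beyond the border
print(spacing, outer)         # ~0.667 and ~1.667 (the paper writes this as 1.66)
```

The spacing 0.67 matches the step in the micro coordinate table above (-1, -0.33, 0.33, 1), and the outer bound matches the [-1.66, 1.66] range from the supplementary.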
the range of micro coordinate was computed twice
Yes, thanks for the reminder; the latter one should be the macro coordinate.
@hubert0527 Hi~ thanks for your patient answers. It's very nice of you to explain the coordinate system in such detail; I can really understand it now. I still have some questions, and I hope you can help me through them:
Can I understand COCO-GAN as follows: for each coordinate, the generator learns the corresponding patches' manifold across all the data in the dataset? If that is true, is there an assumption that patches with the same coordinate across the dataset share some common features? (e.g., if the coordinate is [-1, -1], the top-left position, the training patches are all right-eye-related regions.)
What are the ground-truth macro patches in the "beyond-boundary generation" experiment, if the images in the LSUN dataset are all cropped to 256 x 256 resolution?
Lastly, thank you for presenting such amazing work.
Can I explain the COCOGAN like...
Yes and no. Actually, the CNN weights (i.e., the shared representations and features) are shared across all coordinates; the only two differences are (a) the input coordinates, and (b) the conditional batch norm parameters.
At first glance, people usually think that the generator only learns to generate different organs (in the human-faces example) for different coordinates. We show that this is not the case with experiments on the CelebA-syn dataset, in which the human faces are not aligned at all. Furthermore, the LSUN bedroom dataset is also not aligned.
Actually, it is awkward to think of the conditional distribution of each coordinate as significantly different from the others; the generator simply learns the conditional distribution of whatever is presented to it at each individual coordinate.
Alternatively, I prefer to explain the whole thing more like this: the generator still learns the conventional GAN mapping (from a latent variable to an image; this holds for COCO-GAN testing, right?), but with an additional conditional coordinate input that queries which part of the image to generate and train with.
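The "conventional GAN mapping plus a coordinate query" view can be sketched as a toy stand-in (a single linear layer and the concatenation strategy are illustrative assumptions; the real COCO-GAN generator is a deep CNN with conditional batch norm):

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_patch(z, coord, W):
    """Toy stand-in for the generator: one linear layer over [z; coord].
    This only illustrates the coordinate acting as an extra conditional
    input alongside the shared latent variable."""
    x = np.concatenate([z, coord])
    return np.tanh(W @ x)  # a fake 64x64 'patch', flattened, in [-1, 1]

z = rng.standard_normal(128)                   # one shared latent per image
W = 0.01 * rng.standard_normal((64 * 64, 130)) # shared weights for all coordinates
top_left     = generate_patch(z, np.array([-1.0, -1.0]), W)
bottom_right = generate_patch(z, np.array([ 1.0,  1.0]), W)
```

The same latent `z` and the same weights `W` produce different patches purely because the queried coordinate differs.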
What is the ground truth macro patches in the "beyond-boundary generation" experiment?
For those macro patches that exceed the image boundary (i.e., outside the 256x256 area), there is no ground truth. The discriminator implicitly learns the rule that there shouldn't be any visible seams between consecutive micro patches, and the post-training enforces the generator to obey that rule even outside the image boundary. And, to prevent the discriminator from forgetting the rule (consecutive patches must be continuous), the weights of most of the discriminator layers are frozen during the post-training.
Thus, since there is no ground truth, the post-training cannot continue forever; at some point the generator will start to exploit the discriminator. You have to stop the post-training at some point (which requires a heuristic decision).
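The "freeze most discriminator layers" step is ordinary parameter freezing; a framework-agnostic sketch (which layers stay trainable here is my guess for illustration, not the repository's actual choice):

```python
def set_trainable(layers, trainable_names):
    """Flip a per-layer 'trainable' flag (a stand-in for requires_grad /
    trainable attributes in an actual deep learning framework)."""
    for name, layer in layers.items():
        layer["trainable"] = name in trainable_names
    return layers

# A toy discriminator: five conv blocks plus a prediction head.
disc = {f"conv{i}": {"trainable": True} for i in range(5)}
disc["head"] = {"trainable": True}

# During beyond-boundary post-training, freeze most layers so the learned
# 'consecutive patches must be continuous' rule is not forgotten.
set_trainable(disc, trainable_names={"head"})
```

In an actual framework this would amount to turning off gradient updates for the frozen layers before the post-training loop starts.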
It's really clear, and I understand it much better now. Sincere thanks for answering these questions so quickly and clearly :).
I'm confused about the coordinate computation. Could you explain it more clearly? Thank you.