kkanshul / finegan

FineGAN: Unsupervised Hierarchical Disentanglement for Fine-grained Object Generation and Discovery
http://krsingh.cs.ucdavis.edu/krishna_files/papers/finegan/index.html
BSD 2-Clause "Simplified" License

bounding box to mask #3

Closed sitongsu closed 5 years ago

sitongsu commented 5 years ago

Hello, I'm interested in your work, but I'm confused about how you convert the bounding box into the bird mask. After reading your code, I still cannot figure out the process. What does "warped_bbox" mean in your code? Looking forward to your reply. The related code is shown below:

```python
# Coordinates of the (augmented) bounding box for image i
x1 = self.warped_bbox[0][i]
x2 = self.warped_bbox[2][i]
y1 = self.warped_bbox[1][i]
y2 = self.warped_bbox[3][i]

# Range of output members whose receptive field overlaps the
# bounding box along each axis
a1 = max(torch.tensor(0).float().cuda(), torch.ceil((x1 - self.recp_field) / self.patch_stride))
a2 = min(torch.tensor(self.n_out - 1).float().cuda(), torch.floor((self.n_out - 1) - ((126 - self.recp_field) - x2) / self.patch_stride)) + 1
b1 = max(torch.tensor(0).float().cuda(), torch.ceil((y1 - self.recp_field) / self.patch_stride))
b2 = min(torch.tensor(self.n_out - 1).float().cuda(), torch.floor((self.n_out - 1) - ((126 - self.recp_field) - y2) / self.patch_stride)) + 1

# Zero out the loss weights of the members that see the bounding box
if x1 != x2 and y1 != y2:
    weights_real[i, :, a1.type(torch.int):a2.type(torch.int), b1.type(torch.int):b2.type(torch.int)] = 0.0
```
utkarshojha commented 5 years ago

Hi, we don't convert the bounding box into a bird mask (and to be clear, we don't use any bird masks as supervision in our model). The bounding box coordinates are only used to fetch patches that lie outside the bounding box, i.e. patches that are not part of the foreground entity: background patches. These background patches are used as 'real' patches for training the generator/discriminator pair in the background stage, because we don't have access to complete background images to use as real images. So the discriminator network at the background stage operates at the patch level, i.e. it assigns a real/fake score to each patch instead of to the image as a whole (for more information about how discriminators are employed at the patch level, please see this discussion thread: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues/39).

In simple words, the discriminator at the background stage takes the complete real image as input and predicts an NxN matrix, where each member is the real/fake score of that member's receptive field in the input image. Since we only want supervision from patches that lie outside the bounding box (true background patches), we need to mask the members of the NxN matrix whose receptive fields contain regions inside the bounding box.
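
To make the patch-level idea concrete, here is a minimal sketch of a PatchGAN-style discriminator: a fully convolutional network whose NxN output assigns one logit per receptive field of the input. The layer sizes, and the 126x126 input size taken from the constant in the snippet above, are illustrative assumptions, not the actual FineGAN architecture:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Illustrative patch-level discriminator (not FineGAN's exact network)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, kernel_size=4, stride=2, padding=1),  # 1 logit per patch
        )

    def forward(self, x):
        # x: (B, 3, 126, 126) image -> (B, 1, N, N) grid of real/fake logits,
        # one per receptive field in the input
        return self.net(x)

scores = PatchDiscriminator()(torch.randn(1, 3, 126, 126))
print(scores.shape)  # torch.Size([1, 1, 15, 15]) -> here N = 15
```

For this particular stack, each output member has a receptive field of 22 pixels and the patch stride is 8; FineGAN's actual recp_field, patch_stride and n_out follow from its own discriminator architecture.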

In the code from the question, a1, a2, b1 and b2 do just that: all members of the NxN matrix from a1:a2 (along rows) and b1:b2 (along columns) are masked, and the loss is computed only from the remaining members.
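
To illustrate, here is a standalone version of that index arithmetic on plain Python numbers. The geometry values (recp_field, patch_stride, n_out) are made up for the example, and the constant 126 mirrors the snippet above. Along each axis, output member j sees input pixels starting at roughly j * patch_stride, so a1:a2 and b1:b2 bracket exactly the members whose receptive fields touch the box:

```python
import math

# Hypothetical discriminator geometry: member j sees input pixels
# [j * patch_stride, j * patch_stride + recp_field - 1]
recp_field, patch_stride, n_out = 31, 16, 7
x1, y1, x2, y2 = 30.0, 40.0, 90.0, 100.0  # example warped bounding box

# Same formulas as the snippet in the question, on plain floats
a1 = int(max(0, math.ceil((x1 - recp_field) / patch_stride)))
a2 = int(min(n_out - 1, math.floor((n_out - 1) - ((126 - recp_field) - x2) / patch_stride))) + 1
b1 = int(max(0, math.ceil((y1 - recp_field) / patch_stride)))
b2 = int(min(n_out - 1, math.floor((n_out - 1) - ((126 - recp_field) - y2) / patch_stride))) + 1

# Members [a1, a2) x [b1, b2) see part of the box, so their loss weight is zeroed
weights_real = [[1.0] * n_out for _ in range(n_out)]
for a in range(a1, a2):
    for b in range(b1, b2):
        weights_real[a][b] = 0.0

print(a1, a2, b1, b2)  # 0 6 1 7 with the values above
```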

As for 'warped_bbox', it's just the bounding box coordinates for the image after we've applied some transformations to it (e.g. resizing, random cropping, horizontal flipping).
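
For illustration only, here is a rough sketch of what that coordinate warping could look like; the helper name, arguments, and order of operations are hypothetical, and the actual pipeline lives in the repo's data loading code:

```python
def warp_bbox(x1, y1, x2, y2, scale, off_x, off_y, crop_size, flip):
    """Hypothetical helper: apply the image's resize/crop/flip to its bbox."""
    # 1. Resizing: coordinates scale with the image.
    x1, y1, x2, y2 = x1 * scale, y1 * scale, x2 * scale, y2 * scale
    # 2. Random crop: shift by the crop offset, then clamp to the crop window.
    x1 = min(max(x1 - off_x, 0), crop_size - 1)
    x2 = min(max(x2 - off_x, 0), crop_size - 1)
    y1 = min(max(y1 - off_y, 0), crop_size - 1)
    y2 = min(max(y2 - off_y, 0), crop_size - 1)
    # 3. Horizontal flip: mirror the coordinates along one axis and swap the
    #    endpoints (which coordinate is "horizontal" depends on the indexing
    #    convention used by the data loader).
    if flip:
        x1, x2 = crop_size - 1 - x2, crop_size - 1 - x1
    return x1, y1, x2, y2
```

One consequence: if the random crop removes the box entirely, the clamping collapses it to x1 == x2 (or y1 == y2), which is presumably why the snippet in the question only applies the mask when x1 != x2 and y1 != y2.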

Note: I know this is a complicated explanation, so let me know if some part didn't make sense.

sitongsu commented 5 years ago

Thanks for your detailed explanation. After reading your reply, I understand that the bounding box is used to exclude the foreground region when generating the background.

Since the code I showed above is a little confusing, I was wondering whether you could add some explanatory notes to it.

Thanks again for your reply.