knazeri / edge-connect

EdgeConnect: Structure Guided Image Inpainting using Edge Prediction, ICCV 2019 https://arxiv.org/abs/1901.00212
http://openaccess.thecvf.com/content_ICCVW_2019/html/AIM/Nazeri_EdgeConnect_Structure_Guided_Image_Inpainting_using_Edge_Prediction_ICCVW_2019_paper.html

Question about default image color values in hole area #42

Closed hi-zhengcheng closed 5 years ago

hi-zhengcheng commented 5 years ago

Hi @knazeri, the code for computing images_masked is as follows:
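```python
# InpaintingModel.forward in src/models.py; masks == 1 inside the hole
images_masked = (images * (1 - masks).float()) + masks
```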

The hole areas in the image are filled with 1. I have also read some other inpainting code and found that some implementations fill the holes with 0. Is the inpainting result affected by filling the hole areas with 1 versus 0?

knazeri commented 5 years ago

@hi-zhengcheng Honestly, our motivation for filling the hole with 1 was visualization and ease of use at inference. I don't believe one could expect much difference between using 1 or 0. Having said that, when the missing region is filled with 0, then for any region of zeros that completely covers a convolution filter the output is zero, and hence the weight gradient is zero; convolving over 1s, in principle, still produces some gradient for training.
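As a quick sanity check of that intuition (a toy sketch, not code from this repo):

```python
import torch
import torch.nn as nn

# Toy check: a bias-free conv sees a patch that is all zeros vs. all ones.
# With an all-zero patch the weight gradient is exactly zero; with all ones
# it is not, so the layer still receives a training signal.
conv = nn.Conv2d(3, 8, kernel_size=3, bias=False)

for fill in (0.0, 1.0):
    conv.zero_grad()
    patch = torch.full((1, 3, 3, 3), fill)  # one filter-sized "hole" patch
    conv(patch).sum().backward()
    print(fill, conv.weight.grad.abs().sum().item())
# fill=0.0 -> gradient 0.0; fill=1.0 -> nonzero gradient
```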

hi-zhengcheng commented 5 years ago

@knazeri, thanks for your reply. I have another question, about the discriminator's input. For example, in the InpaintingModel:

```python
outputs = self(images, edges, masks)
...
# discriminator step: detach so gradients don't flow into the generator
dis_input_fake = outputs.detach()
dis_fake, _ = self.discriminator(dis_input_fake)
...
# generator step: no detach, the adversarial gradient trains the generator
gen_input_fake = outputs
gen_fake, _ = self.discriminator(gen_input_fake)
```

Here outputs represents the fake images created by the generator, and it covers the whole image (both the masked area and everything else). The discriminator therefore guides the generator to reproduce the ground-truth pixel values everywhere. I am not sure whether it would be better to have the discriminator guide the generator only in the masked area, or whether producing good pixel values over the whole image helps it produce good pixel values in the masked area.

So my question is: for the discriminator, would it be better to make the input

```python
discriminator(outputs * masks + images * (1 - masks))
```
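In context, that would look something like this (a hypothetical variant, not the repo's code; variable names follow the snippet above):

```python
# Hypothetical: the discriminator sees only the inpainted pixels, with the
# known region pasted back from the ground truth (masks == 1 inside the hole).
composited = outputs * masks + images * (1 - masks)
dis_input_fake = composited.detach()
dis_fake, _ = self.discriminator(dis_input_fake)
```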

Could you share your thoughts on this?

knazeri commented 5 years ago

@hi-zhengcheng This is something we discussed a lot when first designing the model. In principle, it makes sense to have the discriminator discriminate only the pixel values in the masked area. In practice, however, this makes the discriminator too powerful! One explanation is that there may be a subtle discontinuity along the seam when the output is merged with its ground-truth surroundings. The discriminator then learns to pick up on the edges around the mask border, and learning stalls.

We decided to go with the actual output of the generator instead. Also, in case you haven't noticed, our generator is also penalized by a perceptual loss over the entire image, which makes its output in the non-masked areas very similar to the ground truth, so the input to the discriminator ends up very close to what you propose anyway!
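For reference, here is a minimal sketch of a VGG-feature perceptual loss computed over the full image; the layer choice is illustrative, not the repo's exact implementation:

```python
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class PerceptualLoss(nn.Module):
    """L1 distance between deep VGG features of fake and real full images."""

    def __init__(self):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
        self.slice = nn.Sequential(*list(vgg.children())[:16])  # up to relu3_3
        for p in self.slice.parameters():
            p.requires_grad = False  # fixed feature extractor

    def forward(self, outputs, images):
        # Both arguments are full images: no masking before the feature loss,
        # so non-hole regions are pulled toward the ground truth as well.
        return F.l1_loss(self.slice(outputs), self.slice(images))
```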

hi-zhengcheng commented 5 years ago

@knazeri Thanks for your reply. Great work!