knazeri / edge-connect

EdgeConnect: Structure Guided Image Inpainting using Edge Prediction, ICCV 2019 https://arxiv.org/abs/1901.00212
http://openaccess.thecvf.com/content_ICCVW_2019/html/AIM/Nazeri_EdgeConnect_Structure_Guided_Image_Inpainting_using_Edge_Prediction_ICCVW_2019_paper.html

About the results of CelebA: Some repaired images are similar to the original one #45

Open lsongx opened 5 years ago

lsongx commented 5 years ago

Hello @knazeri , thanks for your code!

In your paper, some inpainting results are shown. They are really cool, but the results have different styles: some are nearly the same as the original picture, while others are not (though still visually convincing). For example, consider the following results cropped from Fig. 15: [image]. In all three rows, the eyes (or the mouth) are missing. In the first row, the hole is recovered with a synthesized face, while in the second and third rows the recovered images are essentially the same as the originals. My questions are:

Q1: Are the images selected from a test (validation) set that is not used for training? Or are only the masks separated into training/val/test splits?

Q2: Is there any overlap of persons between the train/val/test sets? Intuitively, the results in the second and third rows could be caused by the same person appearing in both the training and test sets.

Thanks!

knazeri commented 5 years ago

@LcDog With CelebA dataset, we used the standard train/val/test sets. That means there was no overlap between training and test sets! That is, of course, if the standard train/val/test sets are correctly separated and to the best of my knowledge, the CelebA dataset does not contain duplicate images!

As to why the images look very similar, one explanation is that the majority of images in the CelebA dataset are frontal human faces, centered in the image and smiling! In a sense, CelebA is not a very challenging dataset to generalize on! Many other papers have published similar results. Our focus here, however, is to find the structure of the image first (via edges) and use that to guide the inpainting procedure!

xskyz commented 5 years ago

@knazeri Hello, I want to ask how I can get the edge dataset. Do I multiply the image and the mask?

lsongx commented 5 years ago

@knazeri Thanks for your kind and quick reply! This work and the results are really cool!

It is true that in CelebA the images are mostly frontal human faces, centered in the image and smiling, so the data is highly structured.

But I still do not get the point of "generalize". In machine learning, generalization usually refers to the ability of an algorithm to be effective across a range of inputs and applications (I just googled the definition of generalization and found this).

Thus, I guess that generalization in the context of inpainting means that the results are visually appealing, rather than exactly the same as the original. More importantly, as human beings, we are not able to fill the missing area with exactly the same structures, e.g. the eyes. What we can do is simply produce a perfectly natural-looking image, not the original image, just like the first row that I cropped from your paper. However, such similar image pairs are very common in your results (and perhaps in other related works, as you suggested). Why does this happen for inpainting algorithms?

Any comments would be highly appreciated!!

lsongx commented 5 years ago

@xskyz I think maybe you should refer to these lines: https://github.com/knazeri/edge-connect/blob/826f2b8d29d9dfcdbcc501bff36c0cd23b2cb30a/src/dataset.py#L102
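For readers who can't follow the link, the idea in that part of the code is roughly this: an edge map is computed from the grayscale image, and edges inside the masked (missing) region are removed so the edge generator has to predict them. The sketch below is illustrative only; it uses a simple gradient-magnitude threshold as a stand-in for the Canny detector the repo uses, and the function name `make_edge_map` is made up for this example.

```python
import numpy as np

def make_edge_map(gray, mask, thresh=0.2):
    """gray: HxW float image in [0, 1]; mask: HxW, 1 = missing pixel.

    Returns a binary edge map with edges inside the hole zeroed out,
    mimicking how an inpainting pipeline hides edge information that
    the model is supposed to predict.
    """
    # Gradient-magnitude edges (a simplified stand-in for Canny).
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    edges = (mag > thresh).astype(np.float32)
    # Remove edges inside the missing region.
    return edges * (1.0 - mask)

# Toy usage: a vertical intensity step with a square hole over part of it.
gray = np.zeros((8, 8)); gray[:, 4:] = 1.0
mask = np.zeros((8, 8)); mask[2:6, 2:6] = 1.0
edge = make_edge_map(gray, mask)
```

The key point is the final multiplication: the edge map is not "image times mask" but rather "edges times the inverse of the mask", so known-region edges survive while hole-region edges are blanked.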

Or you could open a new issue, since your problem is different from this issue.

knazeri commented 5 years ago

@LcDog

I guess that generalization in the context of inpainting means that the results are visually appealing

That's true, and that's what I meant! I don't believe the intent of any supervised machine learning algorithm is to reproduce its training set exactly, but rather to find a mapping from observations to the preferred outputs, with the goal that this mapping is a good representation of the underlying distribution. The basic assumptions here are that 1) there exists a true distribution in the data, and 2) the training set is a good representative of that distribution.

Now, back to our discussion: the CelebA dataset is well structured, and its training set truly represents the data distribution. In that sense, finding a mapping from input to output that is close to the 'true' mapping is very much reasonable for a high-capacity model like a deep convolutional network! However, I don't believe, by any means, that our model is replicating the input! The images in our paper are very similar to the ground truth, but there are subtle differences between any two images: for example, the eye color and the shape of the eyebrows are different most of the time; wrinkles, freckles, and moles are almost nonexistent; and the results may deteriorate a little for corner cases of the distribution, like side faces. Besides, our model (and others) still struggles to effectively model more challenging datasets like Places!

Beyond that, it is still a good and open question why generative models are able to model very complex distributions so effectively. Honestly, having worked with neural networks for some time, I should say these models never cease to surprise me with their unique ability to handle an exceptionally wide range of tasks!

lsongx commented 5 years ago

@knazeri Thanks for your helpful comments!

Following your helpful replies, I have tested more images with your released code and model (thanks again for this!). Now I have some more observations and new thoughts on the existence of similar pairs.

First of all, as you said,

The images in our paper are very similar to the ground truth, but there are subtle differences between any two images...

I totally agree, after running more experiments myself. However, some distributions (such as shapes) of facial features, e.g. noses or mouths, are frequently recovered just like the original images. This is surprising and hard to understand at first glance. After roughly going through the images in CelebA, I now guess the reasons are:

  1. The CelebA images are collected from celebrities, so the photos are usually of good-looking people;
  2. There may be some underlying distribution of being good-looking;
  3. In other words, there are underlying distributions of the facial features and connections among the facial parts;
  4. So, given the known parts, the inpainting model tries to recover an image that is not only visually natural but also good-looking;
  5. Therefore, the constraints of being good-looking largely determine the distributions of the other facial components. As a result, the recovered images are similar to the original ones.

My analysis may seem like nonsense to experts in inpainting or generative models, but I believe the phenomenon of generating very similar results needs more explanation, as it is really hard to form an intuition for it.

Thanks for your work and help, Dr. Nazeri. I think my questions are solved in this issue. I am wondering whether I should close the issue, since leaving it open may attract some experts who can give more thorough explanations.

knazeri commented 5 years ago

@LcDog Selection bias is a very prevalent problem in machine learning, and it can lead to bias against minorities in decision algorithms; the Times had a good article about this a while ago. There's also an illustration by Google on why and how we expose our own biases to machines!

The truth is, none of these datasets is universally applicable to every problem. In reality, the population distribution is multimodal, but a dataset is merely an observation of the true population; when it cannot cover every mode of the true distribution, the machine learning algorithm treats the observed modes as the bulk of the data distribution and imposes that bias throughout the model!

Let's keep the issue open.

anshen666 commented 4 years ago

Hello, I have also been running this code recently. May I contact you to discuss it? My WeChat: loveanshen. My QQ: 519838354. My email: 519838354@qq.com. Looking forward to your reply, despite your busy schedule.