switchablenorms / CelebAMask-HQ

A large-scale face dataset for face parsing, recognition, generation and editing.

Question about MaskGAN #64

Open 2502128021 opened 3 years ago

2502128021 commented 3 years ago

I reproduced MaskGAN in TensorFlow and trained it, but the result is a little weird, especially the hair region and face details like iris color, freckles, etc. Have you met these problems in your training stage? What do you suppose is wrong in my project?

steven413d commented 3 years ago

Hi, the facial details sometimes may not be preserved due to the influence of the data distribution. There are two probable solutions, as follows:

  1. Image composition: blend the unedited region with the edited region.
  2. Add a local region loss (separate local regions by masks in the training stage) on the eye part or the skin part; see the sketch below.
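
For example, a rough sketch of such a masked loss (PyTorch-style, illustrative names only, not the exact code used in this repo) could look like this:

```python
import torch

def local_region_loss(fake, real, region_mask):
    """L1 loss restricted to one semantic region (e.g. eyes or skin).

    fake, real:  (N, 3, H, W) generated and ground-truth images
    region_mask: (N, 1, H, W) binary mask selecting the region
    """
    # Mask out everything outside the region, then normalize by the
    # region area so small regions still contribute a meaningful loss.
    diff = torch.abs(fake - real) * region_mask
    return diff.sum() / (region_mask.sum() * fake.size(1) + 1e-8)
```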
2502128021 commented 3 years ago

Here is the testing result in my training stage; I'll try your advice! Besides, I've tested your pretrained model: when I give an image and its corresponding mask as input, your model cannot recover the original image. It may be improved by your advice 1, but since the missing details are almost everywhere, you cannot replace the artifacts completely. That is to say, your model cannot preserve the ID information well, so we cannot edit a real image. What is your advice for improving this? Or maybe I used it in a wrong manner?

[image: train_TC_00095]

steven413d commented 3 years ago

I think you need to check the label IDs of the parsing label. Maybe the label IDs used in training are different from those used in testing. The problem may happen in the data loader.

2502128021 commented 3 years ago

The pic I show is the result on my testing set; my training set and testing set are split from the CelebAMask-HQ dataset. They are made in the following steps:

  1. Run g_mask.py to make the one-channel mask whose values are in the range [0, 18].
  2. Make a 19-channel mask according to the values in the one-channel mask; the value of each channel is in the range [0, 1] (see the sketch below).
  3. Split the dataset into a training set and a testing set. As a matter of fact, I have only 36 testing pics, shown in the result pic above.

As you can see, the hair region is quite weird and face details are missing, so the labels are maybe not different, but I'll check my data loader again since it is processed in TF tensors. In addition, my LSGAN loss does not seem to converge, but that makes sense since the result is quite different from the real image. Have you met these problems in your training process? I'll try your second piece of advice first on the hair region to see whether the result improves. Thanks!
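
For reference, step 2 in my pipeline is roughly the following (a simplified NumPy sketch, not my exact TF code):

```python
import numpy as np

def to_one_hot(label_map, num_classes=19):
    """Convert a one-channel parsing label (values 0..18) into a
    19-channel binary mask of shape (num_classes, H, W)."""
    one_hot = np.zeros((num_classes,) + label_map.shape, dtype=np.float32)
    for c in range(num_classes):
        one_hot[c] = (label_map == c).astype(np.float32)
    return one_hot
```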

steven413d commented 3 years ago

In my experience, the blurry hair problem is caused by the process of normalizing the parsing label (to the range [0, 1]) and denormalizing it (to the range [0, 255]). The label IDs can change when the wrong method is used for denormalization. You can follow the method used in face_parsing.
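
For example, resizing the label with nearest-neighbor interpolation and rounding (rather than truncating) when denormalizing keeps the label IDs intact (a simplified sketch, not the exact face_parsing code):

```python
import numpy as np
from PIL import Image

def load_parsing_label(path, size):
    """Load a parsing label and resize it without corrupting label IDs."""
    label = Image.open(path)
    # NEAREST resampling never blends neighbouring label IDs.
    label = label.resize((size, size), Image.NEAREST)
    return np.array(label, dtype=np.uint8)

def denormalize_label(norm_label):
    """Undo a [0, 1] normalization of the label map.

    Rounding avoids off-by-one label IDs from floating-point error,
    e.g. (13 / 255) * 255 may come back as 12.999..., which would
    truncate to label 12.
    """
    return np.rint(norm_label * 255.0).astype(np.uint8)
```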

2502128021 commented 3 years ago

I've found the cause of the bad result: the GAN loss weight was too small to have an effect; when I increase the weight of the GAN loss, the result becomes sharp and realistic.

[image: train_TC_00018]

You once suggested "Add local region loss (separate local regions by masks in the training stage) on the eye part or the skin part." How should it be added? I suspect that a direct L1 loss between result * source_mask and target * target_mask may introduce an incorrect spatial mapping relationship, since the source_mask will change at inference time. Furthermore, I think it is inevitable to lose some ID information in your algorithm; the probable cause, in my opinion, is the global average pooling in your Style Feature Transfer layer, which obviously causes a great loss of the original information. Do you think so? Or do you have other methods to keep the ID of the original pic? (Keeping the ID is an essential requirement in face manipulation, in my opinion.)

steven413d commented 3 years ago
  1. Actually, I haven't tried adding the local loss (I don't have training resources recently). It is similar to the way used in BeautyGAN.
  2. Yes, the usage of GAP may lose some information. In the case of 256 x 256, I think it is possible to extend a stronger style encoder without GAP, or to add an ID loss (see the sketch below).
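
A rough sketch of what an ID loss could look like (assuming a frozen, pretrained face-recognition network `id_encoder`; the name and model are placeholders, not something shipped with this repo):

```python
import torch
import torch.nn.functional as F

def id_loss(id_encoder, fake, real):
    """Cosine-distance ID loss between face embeddings.

    id_encoder: frozen, pretrained face-recognition network
                (e.g. an ArcFace-style model) returning (N, D) embeddings.
    """
    with torch.no_grad():
        real_emb = id_encoder(real)   # target identity, no gradients
    fake_emb = id_encoder(fake)       # gradients flow back to the generator
    # 1 - cosine similarity: 0 when identities match, up to 2 when opposite.
    return (1.0 - F.cosine_similarity(fake_emb, real_emb, dim=1)).mean()
```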