Thank you for your excellent work. I wonder what D_0 and D_noise are for, and why you generate two sets of random images?
Hi rainsoulsrx, please refer to Sec 3.2 in the paper.
The e4e encoder maps a real image into its latent code w+. A w+ code consists of a series of latent vectors w with low variance, each close to the distribution of StyleGAN's W latent space. To apply our method to the latent codes of real images produced by e4e, we sample two sets of w+ latent codes: D0 (codes lying exactly in the W latent space) and Dnoise (codes with noise added to each layer, so they remain close to the distribution of the W latent space).
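For concreteness, here is a minimal sketch of how two such sets could be sampled, assuming a StyleGAN2-style mapping network; `mapping_network`, `num_layers`, and `noise_std` are illustrative placeholders, not the exact values or code used in the paper.

```python
import torch

def sample_training_sets(mapping_network, num_samples, num_layers=18, noise_std=0.2):
    # Sample z ~ N(0, I) and map it into W with the generator's mapping network.
    z = torch.randn(num_samples, 512)
    w = mapping_network(z)                        # (N, 512)
    # D0: repeat each w across all layers, giving a w+ code that lies
    # exactly in the W latent space.
    d0 = w.unsqueeze(1).repeat(1, num_layers, 1)  # (N, num_layers, 512)
    # Dnoise: perturb each layer's code independently so it stays close to,
    # but no longer exactly on, the W distribution (mimicking e4e outputs).
    d_noise = d0 + noise_std * torch.randn_like(d0)
    return d0, d_noise
```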
Thanks for your quick reply. Do you mean you train on both D0 and Dnoise separately, in order to make the model robust on both StyleGAN-generated (fake) images and real images?
And I ran the code and found some failures like the following:
Is this right?
Yes, you are right.
The failures are acceptable. During training, the hair-separation boundary cannot completely remove the hair in some images (about 5%); that is also why we use the paired latent codes to train a hair mapper (these failures do not affect the training of the hair mapper).
There are also some images in my own training dataset that still have hair; they did not affect the training of the hair mapper.
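As a rough illustration of the step being discussed, the hair-separation boundary can be thought of as the unit normal of a hyperplane in latent space, and a paired code is obtained by moving along it; `boundary` and `alpha` below are hypothetical names for a sketch, not the released training code.

```python
import torch

def make_bald_pair(w_plus, boundary, alpha=4.0):
    # Move the latent code along the boundary normal toward the "no hair" side.
    # For roughly 5% of samples this fails to remove all hair; such pairs are
    # tolerated (or filtered) when training the hair mapper.
    w_bald = w_plus + alpha * boundary
    return w_plus, w_bald
```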
Got it!! Thank you for your kind and detailed reply~~
Hi, I have another question. After you get the edited image, you optimize the result using L1 and VGG losses. For the L1 loss you multiply by the hair mask, which I think is reasonable. But when calculating the VGG loss, you do not apply the mask. Why?
We use the VGG (perceptual) loss to penalize the high-level (geometric) feature difference between x and x_rec; we want x_rec to maintain the geometric features of the bald head and face in x (the edited image). Please refer to Sec 3.4 of Coarse-to-Fine: Facial Structure Editing of Portrait Images via Latent Space Classifications, where we explain the details of the diffusion step.
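A minimal sketch of the loss being described, assuming a hypothetical `vgg_features` extractor (e.g. built from torchvision's pretrained VGG16) and a `hair_mask` that is 1 inside the original hair region and 0 elsewhere; the weights are placeholders.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(x_rec, x_edit, hair_mask, vgg_features, w_l1=1.0, w_vgg=1.0):
    # Masked L1: pixel-level agreement with the edited image is only
    # enforced where the mask is on.
    l1 = ((x_rec - x_edit).abs() * hair_mask).mean()
    # Unmasked perceptual loss: compare deep VGG features of the whole
    # images, so x_rec keeps the high-level (geometric) structure of the
    # bald head and face in x_edit.
    perceptual = F.mse_loss(vgg_features(x_rec), vgg_features(x_edit))
    return w_l1 * l1 + w_vgg * perceptual
```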
Oh oh oh, I got it, thanks!!!