microsoft / Deep3DFaceReconstruction

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019)
MIT License

[Reconstruction Result] wrong direction #87

Closed tomguluson92 closed 3 years ago

tomguluson92 commented 4 years ago

Dear Deng, I have implemented a PyTorch version of your repo (including tf_mesh_renderer). But during training (I only use the photometric loss, perceptual loss, and coefficient regularization loss), the reconstructed faces come out facing the wrong direction. Any ideas on solving this? I really appreciate your response!

YuDeng commented 4 years ago

Hi, in our experiments we find it necessary to use a landmark loss to constrain the face pose; otherwise the training process may fail in the way you describe. The reason lies in an inherent property of tf_mesh_renderer: its backpropagated gradients may not give the correct direction for pose alignment. By contrast, the photometric loss proposed by MoFA (Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction) is sensitive to pose misalignment, because it considers per-vertex photometric error instead of per-pixel error. So if you decide not to use a landmark loss as a constraint, you can add a MoFA-like photometric loss to help correct the pose. On the other hand, MoFA's photometric loss is very sensitive to boundaries between differently colored regions (for example, bangs) and tends to deform the 3D shape to fit those boundaries, which may result in irregular face shapes. You should therefore carefully balance the two kinds of photometric loss during training to get the best performance.
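A minimal numpy sketch of the two kinds of supervision discussed above (function names and shapes are illustrative, not the repo's actual code; the same logic carries over to torch tensors):

```python
import numpy as np

def landmark_loss(pred_lms, gt_lms):
    # Mean squared 2D distance between projected and detected landmarks.
    # pred_lms, gt_lms: (68, 2) arrays of image-space coordinates.
    return np.mean(np.sum((pred_lms - gt_lms) ** 2, axis=-1))

def vertexwise_photo_loss(image, verts_2d, vert_colors):
    # MoFA-style loss: sample the input image at each projected vertex
    # and compare against the rendered per-vertex color. Pose errors move
    # the sampling locations, so the gradient carries a pose-alignment
    # signal (unlike a purely per-pixel error through the rasterizer).
    h, w = image.shape[:2]
    xs = np.clip(verts_2d[:, 0].astype(int), 0, w - 1)
    ys = np.clip(verts_2d[:, 1].astype(int), 0, h - 1)
    sampled = image[ys, xs]  # (N, 3) image colors under each vertex
    return np.mean(np.sum((sampled - vert_colors) ** 2, axis=-1))
```

In practice the two losses are summed with weights, and the balance between them is what the answer above warns about.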

tomguluson92 commented 4 years ago

Thanks a lot! I found it may have been caused by not using a pretrained ResNet-50 at first.

Does this kind of result seem reasonable? By the way, which normalization do you use for the data: [0, 1] or [-1, 1]? I am using [0, 1] now, but I wonder whether [-1, 1] is the choice you made?

Results with two ways of normalizing to [0, 1]:

- `(x - xmin) / (xmax - xmin)`
- `x.clamp(0, 255) / 255.`

YuDeng commented 4 years ago

Hi, we do not normalize the input images to the network, so the color range is 0-255, in BGR order. However, for loss computation we normalize both the input images and the rendered images to [0, 1].
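The convention described above can be sketched as follows (a hedged illustration in numpy; function names are hypothetical):

```python
import numpy as np

def to_network_input(img_rgb):
    # Per the answer above: the network consumes raw 0-255 values in BGR
    # order, with no normalization applied.
    return img_rgb[..., ::-1].astype(np.float32)  # RGB -> BGR

def photometric_loss(input_img, rendered_img):
    # For loss computation only, both images are scaled to [0, 1] first.
    a = input_img / 255.0
    b = rendered_img / 255.0
    return np.mean((a - b) ** 2)
```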

tomguluson92 commented 4 years ago

Thank you very much!

tomguluson92 commented 4 years ago

Dear Dr. Deng, I wonder if this result is reasonable (at this first stage of training, I only use 100 pictures as the dataset). Also, could you give me the specific meaning of R(x) in Equation 6? Would it have a big effect on the final output?

The loss:

The result:

YuDeng commented 4 years ago

R(x) is a pre-defined region on the face mesh used to compute the texture variance loss, which aims to eliminate lighting effects from the predicted face texture (albedo). I think you should check whether your color channel order is correct. If you use input images in BGR order, you have to make sure the texture basis (for example, of shape 80xNx3) has the same channel order. Otherwise you end up using the R-channel texture basis to reconstruct the B channel of the input image, which will give bad results.
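The channel-order pitfall can be made concrete with a small numpy sketch of the linear texture model (shapes follow the 80xNx3 basis mentioned above; the function name and the `input_is_bgr` flag are hypothetical):

```python
import numpy as np

def reconstruct_texture(mean_tex, tex_basis, coeffs, input_is_bgr=True):
    # Linear texture model: T = mean + sum_k coeffs[k] * basis[k].
    # mean_tex: (N, 3), tex_basis: (80, N, 3), coeffs: (80,).
    tex = mean_tex + np.tensordot(coeffs, tex_basis, axes=1)  # (N, 3)
    if input_is_bgr:
        # The BFM basis is typically stored in RGB order; flip the last
        # axis so channel k of the reconstruction matches channel k of
        # a BGR input image.
        tex = tex[..., ::-1]
    return tex
```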

tomguluson92 commented 4 years ago

Thank you so much, I will check that and examine the results.

tomguluson92 commented 4 years ago

Hi @YuDeng, is the R(x) used in the texture variance loss the skinmask in BFM_model_front.mat? Since I always get a mean face, I think maybe L_tex is essential for a pleasing result?


YuDeng commented 4 years ago

Yes, the skinmask is R(x), the skin region. Actually, even without L_tex the network can still give reasonable results, so your mean-face result probably does not come from the absence of L_tex. I suggest adding a landmark loss to your training process to see whether the result improves.
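For reference, a texture variance loss over the skin region R(x) could be sketched like this in numpy (an illustration under the definition above, not the repo's implementation):

```python
import numpy as np

def texture_variance_loss(tex, skin_mask):
    # tex: (N, 3) predicted per-vertex albedo.
    # skin_mask: (N,) 0/1 mask selecting the pre-defined skin region R(x).
    # Penalizing albedo variance inside R(x) pushes shading/lighting
    # effects out of the predicted texture.
    skin = tex[skin_mask.astype(bool)]  # (M, 3) skin-region albedo
    mean = skin.mean(axis=0, keepdims=True)
    return np.mean(np.sum((skin - mean) ** 2, axis=-1))
```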

tomguluson92 commented 4 years ago

Thank you very much! The problem I found was that I needed to initialize the FC layer of the ResNet-50 to zero (both weight and bias). Then it works well.

But I don't know why initialization matters so much in this experiment; I don't get the point ~

YuDeng commented 4 years ago

Initializing the last FC layer to zero means the initial prediction is a face in canonical view with neutral expression. This puts the network's initial output closer to the position of faces in the input images and helps the network avoid bad local minima.
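A minimal PyTorch sketch of this zero-initialization (assuming a 2048-d ResNet-50 feature and the paper's 257 coefficients: 80 identity + 64 expression + 80 texture + 3 rotation + 27 SH lighting + 3 translation; with zero weight and bias, every coefficient starts at 0, i.e. the mean face in canonical pose):

```python
import torch
import torch.nn as nn

# Hypothetical coefficient head replacing ResNet-50's classifier.
fc = nn.Linear(2048, 257)
nn.init.zeros_(fc.weight)
nn.init.zeros_(fc.bias)
```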

v-prgmr commented 3 years ago

@tomguluson92 , is your Pytorch port available?