Hi, in our experiment we find that it is necessary to use a landmark loss to constrain the face pose; otherwise the training process may fail as you have observed. The reason lies in an inherent property of tf_mesh_renderer: its backprop gradients may not give the correct direction for pose alignment. By contrast, the photometric loss proposed by MoFA (Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction) is sensitive to pose misalignment, because it considers vertex-wise photometric error instead of pixel-wise error. Therefore, if you decide not to use a landmark loss as a constraint, you can add a MoFA-like photometric loss to help correct the pose. On the other hand, MoFA's photometric loss is very sensitive to boundaries between different color regions (for example, bangs) and is likely to deform the 3D shape to fit those boundaries, which may result in irregular face shapes. So you should carefully balance these two kinds of photometric loss during training to get the best performance.
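For reference, here is a minimal sketch of what a MoFA-like vertex-wise photometric loss could look like in PyTorch. All tensor names and shapes are my own assumptions for illustration, not this repo's API: each visible mesh vertex is projected into the image, the input image is sampled at that location with `grid_sample`, and the sampled color is compared to the predicted per-vertex color, so gradients flow back through the projected positions and hence through the pose:

```python
import torch
import torch.nn.functional as F

# Assumed shapes (illustrative only):
#   image:       (B, 3, H, W) input image, values in [0, 1]
#   proj_verts:  (B, N, 2) projected vertex positions in pixel coordinates
#   vert_colors: (B, N, 3) per-vertex colors predicted by the model
#   vert_mask:   (B, N) 1 for visible vertices, 0 otherwise
def vertexwise_photo_loss(image, proj_verts, vert_colors, vert_mask):
    B, _, H, W = image.shape
    # Convert pixel coordinates to the [-1, 1] range expected by grid_sample.
    grid = torch.stack([
        proj_verts[..., 0] / (W - 1) * 2 - 1,
        proj_verts[..., 1] / (H - 1) * 2 - 1,
    ], dim=-1).unsqueeze(1)                                   # (B, 1, N, 2)
    # Sample the input image at each projected vertex; gradients flow back
    # through the sampling coordinates, i.e. through the pose.
    sampled = F.grid_sample(image, grid, align_corners=True)  # (B, 3, 1, N)
    sampled = sampled.squeeze(2).permute(0, 2, 1)             # (B, N, 3)
    diff = (sampled - vert_colors).norm(dim=-1)               # (B, N)
    return (diff * vert_mask).sum() / vert_mask.sum().clamp(min=1)
```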
Thanks a lot! I found it may have been caused by not using a pretrained ResNet-50 at first.
Does this kind of result seem reasonable? By the way, which normalization technique do you use?
Do you normalize the data to `0 to 1` (i.e. `(x - xmin) / (xmax - xmin)` or `x.clamp(0, 255) / 255.`) or to `-1 to +1`? I am using `0 to 1` now, but I just wonder whether `-1 to +1` is the choice you made?
Hi, we do not normalize input images to the network, so the range for color is 0-255, in BGR order. However, for loss computation, we normalize both input images and rendered images to 0-1.
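Concretely, that scheme could look like the following sketch; the tensor names and the masked-L2 form of the loss are my assumptions for illustration, not the repo's actual code:

```python
import torch

# The network itself consumes raw 0-255 BGR images (no normalization);
# only the loss compares images rescaled to 0-1.
def photometric_loss(input_bgr_255, rendered_bgr_255, mask):
    # input_bgr_255, rendered_bgr_255: (B, 3, H, W), BGR order, range 0-255
    # mask: (B, H, W), 1 on valid (e.g. skin) pixels
    x = input_bgr_255 / 255.0
    y = rendered_bgr_255 / 255.0
    per_pixel = (x - y).norm(dim=1)  # L2 over color channels
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1)
```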
Thank you very much!
Dear Dr. Deng,
I just wonder if the result is reasonable (at this first stage of training, I only use 100 images as the dataset).
Besides, would you mind giving me the specific meaning of R(x) in Equation 6? Does it have a big effect on the final output?
The loss
The result
R(x) is a pre-defined region on the face mesh used to compute the texture variance loss, which aims to eliminate lighting effects in the predicted face texture (albedo). I think you should check whether the color channel order is correct. If you use input images in BGR order, you have to make sure that the texture basis (for example, of shape 80xNx3) has the same channel order. Otherwise you end up using the R-channel texture basis to reconstruct the B-channel of the input image, and this will give you bad results.
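As an illustration, a texture variance loss over R(x) could look like the sketch below; the tensor names and shapes are my assumptions, not the repo's actual code:

```python
import torch

# albedo:    (B, N, 3) predicted per-vertex texture
# skin_mask: (N,) binary, 1 for vertices inside R(x)
#            (e.g. the skinmask stored in BFM_model_front.mat)
def texture_variance_loss(albedo, skin_mask):
    mask = skin_mask.view(1, -1, 1)                        # (1, N, 1)
    n = skin_mask.sum()
    mean = (albedo * mask).sum(dim=1, keepdim=True) / n    # mean color over R(x)
    var = (((albedo - mean) ** 2) * mask).sum(dim=1) / n   # per-channel variance
    return var.sum(dim=-1).mean()                          # penalize non-flat albedo
```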
Thank you so much, I will do that and examine it.
Hi @YuDeng,
is R(x) in the texture variance loss the `skinmask` in `BFM_model_front.mat`? Since I always get a mean face, I think maybe L_tex is essential for a good result?
Yes. The skinmask is R(x), the skin region. Actually, even without L_tex the network can still give reasonable results, so your mean-face result might not come from the absence of L_tex. I suggest adding a landmark loss to your training process to see whether the result improves.
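For completeness, a basic 2D landmark loss could look like this sketch; the names and shapes are my assumptions, and the optional per-landmark weights follow the common practice of up-weighting certain points (e.g. mouth and nose), which you should check against the paper:

```python
import torch

# pred_lms, gt_lms: (B, 68, 2) landmark positions in image coordinates
# weights: optional (68,) per-landmark weights
def landmark_loss(pred_lms, gt_lms, weights=None):
    sq_dist = ((pred_lms - gt_lms) ** 2).sum(dim=-1)  # (B, 68) squared distances
    if weights is not None:
        sq_dist = sq_dist * weights.view(1, -1)
    return sq_dist.mean()
```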
Thank you very much. The fix I found is to initialize the FC layer of the ResNet-50 to zero (both `weight` and `bias`); then it works well.
But I don't know why initialization matters so much in this experiment; I don't quite get the point ~
Initializing the last FC layer to zero means setting the face to the canonical view with a neutral expression. This makes the network's initial prediction closer to the position of the faces in the input images and helps the network avoid bad local minima.
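In PyTorch terms, this zero initialization is just the following sketch; the 257 output dimension assumes the coefficient layout used by this repo (80 identity + 64 expression + 80 texture + 3 rotation + 27 lighting + 3 translation):

```python
import torch.nn as nn
import torchvision.models as models

model = models.resnet50(pretrained=True)
# Replace the classification head with a coefficient regressor and zero it,
# so the initial prediction is the mean face in canonical pose.
model.fc = nn.Linear(model.fc.in_features, 257)
nn.init.zeros_(model.fc.weight)
nn.init.zeros_(model.fc.bias)
```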
@tomguluson92, is your PyTorch port available?
Dear Deng, I have implemented a PyTorch version of your repo (including `tf_mesh_renderer`). But during training (I only use the photometric loss, perceptual loss, and coefficient regularization loss), the reconstructed faces come out facing the wrong direction. Any ideas on how to solve this? Your response would be really appreciated!