yfeng95 / PRNet

Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network (ECCV 2018)
http://openaccess.thecvf.com/content_ECCV_2018/papers/Yao_Feng_Joint_3D_Face_ECCV_2018_paper.pdf
MIT License

Training Code #46

Closed. ForestWang closed this issue 5 years ago.

ForestWang commented 6 years ago

Hi YadiraF: Will the training source code be released? If so, when?

Thank you very much!

BouOus commented 6 years ago

+1

muyiben commented 6 years ago

+1

MandyMo commented 6 years ago

+1. It has been two, maybe three, months already.

developer-mayuan commented 6 years ago

Hi all: Please be patient. The author doesn't owe you anything. In fact, if you really need the training code, you can just email the author directly. He/she should be able to send you some raw code and guide you to finish it. Thanks.

sunjunlishi commented 6 years ago

Using depthwise convolutions or Inception-style layers would give a smaller model.

developer-mayuan commented 6 years ago

@sunjunlishi Basically you can switch the backbone part of the encoder-decoder network to a lightweight model, such as MobileNet v1/v2.
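
For illustration, a rough sketch of what that backbone swap might look like in tf.keras. This is not the repo's resfcn256 architecture; the MobileNetV2 choice and the decoder depths are assumptions.

```python
import tensorflow as tf

# Swap the ResNet-style encoder for MobileNetV2, keeping a transposed-convolution
# decoder that regresses the 256x256x3 position map.
inputs = tf.keras.Input(shape=(256, 256, 3))
encoder = tf.keras.applications.MobileNetV2(
    input_tensor=inputs, include_top=False, weights=None)   # 8x8 feature map out
x = encoder.output
for filters in (512, 256, 128, 64, 32):                     # upsample 8 -> 256
    x = tf.keras.layers.Conv2DTranspose(
        filters, 4, strides=2, padding='same', activation='relu')(x)
outputs = tf.keras.layers.Conv2D(3, 3, padding='same', activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```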

wungemach commented 6 years ago

Has anyone tried to implement the training code for this themselves? I am currently unable to get the model to overfit when training a single example. Just curious if there is something wrong on my end or if there is something subtle going on in the implementation.

developer-mayuan commented 6 years ago

@wungemach What do you mean by "unable to get the model to overfit when training on a single example"?

wungemach commented 6 years ago

I am trying to test my training pipeline by just training the network on a single image/target uv-map pair. If everything is working correctly, then the loss for this one image should go to zero after many iterations, but it doesn’t. It goes to like 10^4. Basically I am saying my first principles implementation can’t even memorize one training example, let alone generalize to unseen inputs.
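
For context, a minimal sketch of this kind of sanity check. The tiny stand-in model and the random arrays are placeholders; swap in the real encoder-decoder network and a real image/position-map pair.

```python
import numpy as np
import tensorflow as tf

# One (image, position map) pair; random arrays stand in for a real sample,
# both assumed to be scaled to [0, 1].
image = np.random.rand(1, 256, 256, 3).astype(np.float32)
uv_map = np.random.rand(1, 256, 256, 3).astype(np.float32)

# Stand-in model: replace with the actual PRNet-style encoder-decoder.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu',
                           input_shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(3, 3, padding='same', activation='sigmoid'),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss='mse')

# Fit the single pair for many steps. The loss should drop toward ~0; if it
# plateaus at a large value, the pipeline (not the data volume) is at fault.
history = model.fit(image, uv_map, epochs=2000, verbose=0)
print(history.history['loss'][-1])
```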

BouOus commented 6 years ago

Hi @wungemach, can you share what you managed to do?

developer-mayuan commented 6 years ago

@wungemach I think you must have done something wrong in your ground truth generation or training code. Here are my training results. (I cannot share the training code with you, sorry about that.)

[Three screenshots of training results from 2018-07-13 omitted]

wungemach commented 6 years ago

@developer-mayuan Those results look great! Could you let me know how you put your training data together? They don't go into a lot of detail in the paper. An issue that I have been having is that the coordinates for the facial meshes in 300W_LP are non-integers and sometimes non-positive, so you can't save them in the RGB channels of an image (I just put them in a numpy array). They also use a submesh of the Basel Face Model which has fewer vertices, but they don't give the full correspondence to get a good uv_map. Am I missing something simple here? Thanks for any help you can offer!

wungemach commented 6 years ago

@developer-mayuan I'm also happy to discuss this over email, if you prefer.

sunjunlishi commented 6 years ago

@developer-mayuan Where is the training data? That is the important part.

BouOus commented 6 years ago

Hi @developer-mayuan, can you please tell me how you generated the third (z) component for the BFM_UV (UV_vertices) model for rendering?
Using the file (https://github.com/anilbas/3DMMasSTN/blob/master/util/BFM_UV.mat) we only have 53490 x 2 coordinates; how do you select the right vertices, and how do you generate the z component?

Thank you

developer-mayuan commented 6 years ago

@BouOus You can follow 3DMMasSTN's resample code to generate the z component. The basic idea is that you redo the triangulation and interpolate the new vertices from the old ones. I haven't written the rendering code, since I only want to detect the landmarks.
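
One crude way to realize the "redo the triangulation and interpolate" idea with SciPy, assuming you already have per-vertex 3D positions matching the vertices that BFM_UV.mat indexes. The .npy file name and the 'UV' field name are assumptions, and this differs from face3d's actual renderer, which rasterizes triangles.

```python
import numpy as np
import scipy.io as sio
from scipy.interpolate import LinearNDInterpolator

# Per-vertex UV coordinates (assumed to lie in [0, 1]) and the matching
# per-vertex 3D positions for one face (hypothetical input file).
uv = sio.loadmat('BFM_UV.mat')['UV']          # (53490, 2)
vertices = np.load('sample_vertices.npy')      # (53490, 3)

# SciPy triangulates the scattered UV points internally and linearly
# interpolates x, y, z onto a regular 256x256 UV grid; the third channel of
# the result is the z component asked about above.
size = 256
grid_u, grid_v = np.meshgrid(np.linspace(0, 1, size), np.linspace(0, 1, size))
interp = LinearNDInterpolator(uv, vertices, fill_value=0.0)
pos_map = interp(np.stack([grid_u.ravel(), grid_v.ravel()], axis=1))
pos_map = pos_map.reshape(size, size, 3)
```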

wungemach commented 6 years ago

@developer-mayuan how many epochs did your model take to converge, if you don't mind me asking? Did you follow their training advice of cutting the learning rate in half every 5 epochs?

developer-mayuan commented 6 years ago

@wungemach I totally rewrote the training code myself, so my training strategy may be a little different from the paper's. I trained the network for 25 epochs and halved the learning rate every 50k steps. I didn't try to reproduce the paper's experiment, so I didn't tune those hyperparameters very well. I think the most important part is the ground truth generation. If you can generate the UV ground truth data correctly, it shouldn't be very hard to train the network.
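
As an illustration of the "halve every 50k steps" schedule, using the tf.keras schedule API. The initial learning rate of 1e-4 matches the paper; everything else here is an assumption.

```python
import tensorflow as tf

# Halve the learning rate every 50k optimizer steps; staircase=True makes the
# decay happen in discrete jumps rather than continuously.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=50_000,
    decay_rate=0.5,
    staircase=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```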

wungemach commented 6 years ago

@developer-mayuan right that makes sense. I just feel like I (finally) have the correct ground truth but the model is having trouble converging. Just to be clear: the use of the images as the ground truth is just for visualization right? You can only store integer values in the pixels and I was getting horrible results with that. I am using numpy arrays now.

developer-mayuan commented 6 years ago

@wungemach Basically the answer is yes; I just used tf.summary to show the output tensor. I didn't use numpy or other visualization libraries to visualize the result.

marvin521 commented 6 years ago

@developer-mayuan We don't need to train a 3DMMasSTN model on the 300W-LP dataset, right? Just use the UV coordinates? Looking forward to your reply. Thanks.

developer-mayuan commented 6 years ago

@marvin521 Yes, you are right.

marvin521 commented 6 years ago

@developer-mayuan Thanks for your quick reply. One more question: could you tell me which function (or script) in 3DMMasSTN I should use? Thanks.

DamiAlesh commented 6 years ago

@wungemach Did you manage to get the correct ground truth, and if so, could you please tell me which function from 3DMMasSTN you used? I have been trying to generate the smooth color map since this repo came out but have not managed to. Any help will be appreciated. Thanks! I can also discuss this with you over email if you prefer.

yfeng95 commented 6 years ago

Hi all, so sorry for the late release of the training code. You can now see https://github.com/YadiraF/PRNet#training for details. And @developer-mayuan, I really appreciate your help answering the questions!

wungemach commented 6 years ago

@DamiAlesh Maybe it's too late to be helpful now, but the main things to use are the BFM_UV.mat file along with trimIndex. The trimIndex file tells you which vertices from the full Basel Face Model are being used in the mesh used by 300W_LP. The lack of documentation makes all of this kind of hard to follow.
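
A small sketch of what using those two files together might look like. The file and field names here are guesses; inspect the actual .mat contents with scipy.io.whosmat() before relying on them.

```python
import scipy.io as sio

# Field and file names are assumptions -- check them with sio.whosmat() first.
uv = sio.loadmat('BFM_UV.mat')['UV']                  # (53490, 2) UV coords, full BFM
trim = sio.loadmat('trimIndex.mat')['trimIndex']      # vertex indices kept by 300W_LP

trim = trim.flatten().astype(int) - 1   # MATLAB indices are 1-based
uv_trimmed = uv[trim]                    # UV coords for the trimmed mesh only
print(uv_trimmed.shape)
```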

wungemach commented 6 years ago

@YadiraF Thanks for uploading the training code! Right now I am running into an issue because 8_generate_posmap_300WLP.py outputs numpy arrays with (possibly) negative coordinates. This creates two issues: (1) it breaks 8_generate_posmap_300WLP.py when it tries to display the images, and (2) it makes the results unusable as training data, since your network ends with a sigmoid activation (whose output lies in (0, 1)). Do we need to add a line translating the data so that the output here is positive, or am I missing something?

yfeng95 commented 6 years ago

@wungemach

  1. Did you display it with OpenCV? You can simply comment that part out, or scale the values to 0-1 so they display correctly.
  2. I assumed that normalizing the labels is a general technique everyone would apply. When you use sigmoid as the activation, you need to normalize your data to the range 0-1; when you use tanh, to the range -1 to 1. For the position map, if all parts of the face are contained in the 2D image, the range should be 0 to image_size. However, I randomly perturb the cropping of the face, so the corresponding range is about (0 - image_size*0.1) to (image_size + image_size*0.1). Then just normalize over that range.
     It's also OK to simply divide the values by image_size, since most faces in the training data lie within the image plane. It's really easy to make the model converge (from others' feedback), just try it! ^^
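
A sketch of the normalization described above. The input array is assumed to be the output of 8_generate_posmap_300WLP.py (the file name is a placeholder), and the 10% margin matches the crop perturbation mentioned, but treat the exact constants as an assumption.

```python
import numpy as np

image_size = 256
uv_position_map = np.load('sample_posmap.npy')   # (256, 256, 3), in pixel coordinates

# Random crop perturbation means coordinates can fall slightly outside
# [0, image_size]; map the extended range to [0, 1] for a sigmoid output.
margin = 0.1 * image_size
uv_label = (uv_position_map + margin) / (image_size + 2 * margin)

# Simpler alternative mentioned above: just divide by image_size and accept
# that a few values fall slightly outside [0, 1].
# uv_label = uv_position_map / image_size

# At test time, invert the same transform to recover pixel coordinates:
# uv_position_map = uv_label * (image_size + 2 * margin) - margin
```
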
wungemach commented 6 years ago

@YadiraF Thanks for getting back to me. I just wanted to check that, after the code you have posted, the only thing left to do is the obvious translation and rescaling. I'll give it a try and let you know if I run into any more trouble!

wungemach commented 6 years ago

@YadiraF Sorry to nag, but could you describe this normalization more explicitly? It doesn't seem to me that a naive rescaling and translation is necessarily a good idea. (To be clear, I am imagining taking each uv_position_map, shifting by its minimum, and rescaling by its maximum so that the entries lie in [0, 256).) For applications like projecting the mask onto the image to work correctly, the x, y coordinates need to be exactly right so they overlap the face in the image, and it's not clear to me that this rescaling ensures that.

wungemach commented 6 years ago

If anyone is interested, I realized what the issue is. In the training code, the bounding box used to crop around a face sometimes misses part of the face. This can result in coordinates outside the range (0, image_size), which throws an error. You can simply enlarge the bounding box to resolve this.
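
A sketch of the kind of fix meant here, computing a looser square crop around the 68 landmarks. The scale factor and this helper are illustrative; the face3d generation script has its own cropping logic.

```python
import numpy as np

def enlarged_bbox(kpt, scale=1.6):
    """Square crop box around 2D landmarks, enlarged by `scale`.

    kpt: (2, 68) or (68, 2) array of landmark coordinates. The default scale
    is an assumption, not the value used by the generation script.
    """
    kpt = kpt if kpt.shape[0] == 2 else kpt.T        # normalize to (2, 68)
    left, right = kpt[0].min(), kpt[0].max()
    top, bottom = kpt[1].min(), kpt[1].max()
    center = np.array([(left + right) / 2.0, (top + bottom) / 2.0])
    size = max(right - left, bottom - top) * scale   # bigger scale -> looser crop
    return center, size
```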

jhgfkdj commented 6 years ago

@wungemach Hi, could you please tell me which .py file is the training code? I just downloaded the author's trained model and used demo.py to generate .obj files from my own pictures. After reading the author's training section I still don't know how to train; can you tell me how to train my own model with my data?

wungemach commented 6 years ago

@jhgfkdj Sorry for the late reply. The authors didn't publish a full training pipeline (yet, at least), but they have provided all of the tools to build one. Some of the details are outlined in the readme, but the main difficulty is the data preparation. Luckily, the authors DID publish the code on that, which is linked in the readme. The trickiest part is a renderer to smooth the uv_map that comes with the Basel Face Model. Even with that it is still a bit involved. The outline is as follows:

1. Clone and set up the face3d repository (there are instructions on how to do that in its README). This requires downloading the Basel Face Model and code from a couple of other papers.
2. Run 8_generate_posmap_300WLP.py, the file linked from the training description here; it takes a sample .jpg and .mat file from 300W_LP and generates the corresponding ground-truth position map.
3. Use that script as a starting point for generating all of the ground-truth data in 300W_LP (a sketch of this step follows below).
4. Build a standard training pipeline that loads and batches the data, etc.
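
A sketch of step 3, looping the per-sample logic over the whole dataset. Here generate_posmap is a hypothetical function wrapping 8_generate_posmap_300WLP.py's per-sample code, and the directory layout is an assumption.

```python
import glob
import os

import numpy as np

# generate_posmap(jpg_path, mat_path) is assumed to return a (256, 256, 3)
# float position map, i.e. the per-sample logic of 8_generate_posmap_300WLP.py.
root = '300W_LP'
out_dir = 'uv_maps'
os.makedirs(out_dir, exist_ok=True)

for mat_path in glob.glob(os.path.join(root, '*', '*.mat')):
    jpg_path = mat_path.replace('.mat', '.jpg')
    if not os.path.exists(jpg_path):
        continue                                      # skip .mat files without an image
    uv_map = generate_posmap(jpg_path, mat_path)
    name = os.path.splitext(os.path.basename(mat_path))[0]
    np.save(os.path.join(out_dir, name + '.npy'), uv_map)
```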

Good luck!

jhgfkdj commented 6 years ago

@wungemach Thanks very much for your reply. Now I want to know how to generate the training ground truth. As the author said, 8_generate_posmap_300WLP.py is used to generate a position map, but how do I generate the .mat files used by this script? Would you please offer me another way to communicate, such as email, QQ, or WeChat?

wungemach commented 6 years ago

@jhgfkdj The input .mat files are in the 300W_LP dataset, which can be found here.

jhgfkdj commented 6 years ago

@wungemach Thank you for all your help. After working through the tricky parts you mentioned, I have finished generating several position maps from the 300W-LP dataset. Now I wonder: can I use my own data to generate position maps for training, and how do I make a .mat file correspond to a .jpg just as 300W-LP did? Many thanks!

wungemach commented 6 years ago

@jhgfkdj Great! Once you can get their script to produce a few outputs, you just need to modify it to run through all of the .mat files in 300W_LP and produce a corresponding .npy file for each. A point of confusion for me initially was with the jpgs you mention: in the paper, they show the target position maps as jpg color gradients, but you actually want to generate target .npy files, since these don't have the arbitrary constraint that the entries be integers. (Technically this might not matter depending on how you save your images, but I think the best thing to do is save numpy arrays to avoid confusion.) This will give you a collection of training pairs, which you then pull into a standard training pipeline to train the model. I'm sure you can find good examples of that if you haven't done it before.

jhgfkdj commented 6 years ago

@wungemach Thanks for patiently explaining this to me. My confusion comes from not really understanding what the training input data is. For example, if I want to train an MTCNN or SSD, the training data must be jpg/label(txt) pairs, and we compute the loss between the ground truth in the label and the value predicted by the network. You said the jpg/npy pairs form a collection of training pairs to pull into a training pipeline, but I don't know what to do with the numpy file after I load it. Can you please give me some suggestions or working examples? Sorry for continuously asking novice questions.

jhgfkdj commented 6 years ago

@wungemach Another point of confusion is how to make .mat files the way 300W_LP did, which I mentioned before. I have read several papers and they all use this dataset. What is stored in the .mat file? If I want to use my own jpgs to generate position maps for training, how do I make the .mat files?

wungemach commented 6 years ago

@jhgfkdj

(1) The jpg/npy pairs from 300W_LP form labeled training data in the following way: you take your input jpg, load it as a numpy array, and feed it through your network to get a predicted output position map. You then compare this to the label npy position map using the loss function described in the paper, which is just a weighted mean-squared error (see the sketch after this list).

(2) The label data in 300W_LP is supposed to describe a 3D mesh corresponding to the position and shape of the face in the corresponding jpg. There are different ways of specifying a 3D mesh; the naive way is just to store the xyz coordinates for each vertex in the mesh and remember how the edges and faces connect them. This is the format we need the label in for PRNet. A more sophisticated and sometimes more useful way to store this information is with what is called a 3D Morphable Model. You can think of a morphable model as a lower-dimensional encoding of the xyz coordinates of the vertices, similar to PCA. The script we run to generate training labels from the face3d repository converts from morphable-model coordinates to the xyz coordinates we need.

(3) I don't really have a good answer for how you would go about creating your own examples for training. I would guess that the labeling of the examples in 300W was done partially automatically but also involved some work by hand; the full 300W_LP was then synthesized from that dataset. It's possible that other deep neural networks could generate labels for you (but they are just much slower and more complicated than PRNet, which is its appeal), and that seems like poor practice to me anyway.
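
A sketch of that weighted MSE in TensorFlow. The weight mask here is assumed to be a 256x256 array you build yourself and save to a hypothetical .npy file; the paper weights the 68 landmarks, the eye/nose/mouth region, the rest of the face, and the neck roughly in the ratio 16:4:3:0.

```python
import numpy as np
import tensorflow as tf

# 256x256 weight map over UV space (hypothetical file; build it from the UV
# face regions with the ratios above, or adapt the mask shipped with the repo).
weight_mask = np.load('uv_weight_mask.npy').astype(np.float32)      # (256, 256)
weight_mask = tf.constant(weight_mask[None, :, :, None])            # broadcast to NHWC

def weighted_mse(y_true, y_pred):
    # Mean over batch, UV pixels, and xyz channels of the weighted squared error.
    return tf.reduce_mean(weight_mask * tf.square(y_true - y_pred))
```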

marvin521 commented 6 years ago

Does anyone know how the BFM_UV.mat used by 3DMMasSTN is obtained? I want to know the detailed way to derive it. Any reply is appreciated.

wungemach commented 6 years ago

@marvin521 You should be able to download it from here. Hopefully that will work!

marvin521 commented 6 years ago

@wungemach Yes, I can find that file; you are very kind. But I want to know how to obtain or derive this .mat file, i.e., the detailed procedure or method. 3DMMasSTN does not give the detailed algorithm.

wungemach commented 6 years ago

@marvin521 ah - sorry. They say that these coordinates are found by computing the Tutte embedding with conformal Laplacian weights for this graph, which I think is just a linear algebra computation. If I recall correctly, they do this in their code.

marvin521 commented 6 years ago

@wungemach I tried to find the procedure or code snippet in which BFM_UV.mat is saved, but there is no related code snippet. I am curious about this .mat file: if I knew how to map the UV coordinates, I could train a model using my own 3D mesh data.

wungemach commented 6 years ago

@marvin521 right. They do have code for computing the Laplacian. I think you just use the entries of that Laplacian matrix to solve the system outlined here at the beginning of section three, for example. This system will have a unique solution, so you just have to invert the coefficient matrix. I thought they did this explicitly in their code, but I can't seem to find it either.

The only difficulty I see is that most of the theorems about Tutte embeddings seem to be about three-regular graphs, and this graph isn't three-regular. It might be that the linear algebra just works out in this case to give a solution.
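
For reference, here is the standard linear-algebra formulation of a Tutte embedding with a pinned convex boundary, assuming you already have the (conformal or uniform) graph Laplacian as a SciPy sparse matrix. 3DMMasSTN's exact construction may differ, so treat this as a sketch of the computation being discussed.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def tutte_embedding(L, boundary_idx, boundary_uv):
    """Planar embedding from a graph Laplacian L with a fixed convex boundary.

    L            : (N, N) sparse Laplacian of the mesh graph.
    boundary_idx : indices of boundary vertices, pinned to a convex outline.
    boundary_uv  : (len(boundary_idx), 2) fixed positions for those vertices.
    Interior positions solve L_II @ x_I = -L_IB @ x_B for each coordinate.
    """
    n = L.shape[0]
    interior_idx = np.setdiff1d(np.arange(n), boundary_idx)
    L = L.tocsr()
    L_II = L[interior_idx][:, interior_idx]
    L_IB = L[interior_idx][:, boundary_idx]

    uv = np.zeros((n, 2))
    uv[boundary_idx] = boundary_uv
    for c in range(2):                                   # solve once per coordinate
        uv[interior_idx, c] = spsolve(L_II, -L_IB @ boundary_uv[:, c])
    return uv
```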

marvin521 commented 6 years ago

@wungemach Many thanks. I will read that paper (Orbifold Tutte Embeddings) . Your reply is very helpful.

speculaas commented 5 years ago

Dear All, Thanks for your help!

I am studying the paper and this GitHub repo. A quick question before I finish reading everything: is the network definition (the encoder-decoder structure) available?

BR, JimmyYS

speculaas commented 5 years ago

Dear All,

Thanks for your help! I found the network definition in PRNet\predictor.py: self.network = resfcn256()

BR, Jimmy

BigDataHa commented 5 years ago

@developer-mayuan Hi, I wrote the training code, but it doesn't work. Could I ask you some questions?