vitoralbiero / img2pose

The official PyTorch implementation of img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation - CVPR 2021

Question about acquiring 300W-LP labels #61

Closed FunkyKoki closed 2 years ago

FunkyKoki commented 2 years ago

Thanks for your work and help. I have now trained a lightweight model with MobileNetV3-Small as the backbone. It achieves the same pose evaluation performance on AFLW2000 as your model (both without fine-tuning on 300W-LP).

Now I am focused on fine-tuning. In your paper, you said:

Training pose rotation labels are obtained by converting the 300W-LP ground-truth Euler angles to rotation vectors, and pose translation labels are created using the ground-truth landmarks, using standard means.

I open this issue to confirm several things, and I will be very grateful if you can help.

Here are my questions:

  1. How did you define the face bounding box for each image in 300W-LP, given that some images contain two or more detectable faces? Did you use only the bounding box of the face that has landmark annotations? How did you obtain that bounding box? Did you use a face detector, such as InsightFace?
  2. Since only one face per image in 300W-LP is annotated with 68 points, can I directly use the labeled landmarks to build the JSON files, and use self.threed_68_points in the code here to generate the LMDB file?

That's all. Thank you so much.

eugeneYz commented 2 years ago

Have you ever found the predicted x, y, z translation to be wrong (especially z, when I use my webcam for detection)?

FunkyKoki commented 2 years ago

Have you ever found the predicted x, y, z translation to be wrong (especially z, when I use my webcam for detection)?

@eugeneYz

What is the model you are using? Are you using the fine-tuned model?

By the way, your question is unrelated to this topic; please ask @vitoralbiero in a new issue.

FunkyKoki commented 2 years ago

Hi @vitoralbiero, why haven't you released the 300W-LP annotations? This seems strange to me. Is it because of copyright?

I used InsightFace to detect the faces and the corresponding landmarks (3D, 68 points) for each image in 300W-LP, keeping only the face closest to the center of the image as ground truth (saved in .json format). Then I used convert_json_list_to_lmdb.py to convert these into an LMDB file for training.
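A minimal sketch of the "keep only the face closest to the image center" step described above (the bbox format `[x1, y1, x2, y2]` is an assumption about the detector output; this is not code from the img2pose repo):

```python
import numpy as np

def closest_face_to_center(bboxes, img_w, img_h):
    """Return the index of the detection whose box center is nearest the image center.

    bboxes: iterable of [x1, y1, x2, y2] boxes, e.g. from a face detector
            such as InsightFace (exact output format is an assumption here).
    """
    boxes = np.asarray(bboxes, dtype=np.float64)
    # Box centers: midpoint of each box's corners.
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                        (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    img_center = np.array([img_w / 2.0, img_h / 2.0])
    dists = np.linalg.norm(centers - img_center, axis=1)
    return int(np.argmin(dists))
```

The selected detection's landmarks would then be written to the per-image JSON before running convert_json_list_to_lmdb.py.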

But the evaluation performance on AFLW2000 is not boosted at all.

vitoralbiero commented 2 years ago

Hello @FunkyKoki,

I don't have time right now to release the annotations. But we may do so in the future.

300W-LP comes with Euler angles and landmarks, so we converted the provided Euler angles to get rotation vectors, and used the landmarks to get translation vectors.
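The Euler-to-rotation-vector conversion mentioned here can be sketched with SciPy as below. Note the axis order and intrinsic/extrinsic convention is an assumption on my part; 300W-LP stores pose as pitch, yaw, roll, but the exact convention img2pose uses should be verified against the repo:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def euler_to_rotvec(pitch, yaw, roll):
    """Convert Euler angles (radians) to an axis-angle rotation vector.

    ASSUMPTION: 'xyz' extrinsic order with (pitch, yaw, roll) mapped to
    (x, y, z); confirm against the dataset/repo conventions before use.
    """
    return Rotation.from_euler('xyz', [pitch, yaw, roll]).as_rotvec()
```

For a rotation about a single axis, the rotation vector is simply that axis scaled by the angle, which makes the conversion easy to sanity-check.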

To fine-tune on 300W-LP, please follow section 4.1 of our paper, which describes the training and fine-tuning steps.

Hope this helps.

FunkyKoki commented 2 years ago

Hi there, thanks for your answer, but I still cannot figure out how you defined the ground-truth face bounding box for each image.

vitoralbiero commented 2 years ago

We used the provided landmarks to get a bounding box. You can use this function to get a bbox using landmarks.
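The idea of deriving a box from landmarks can be sketched as below (the optional `expand` margin is my own addition for illustration, not necessarily what the linked function in the repo does):

```python
import numpy as np

def bbox_from_landmarks(landmarks, expand=0.0):
    """Tight [x1, y1, x2, y2] box around 2D landmarks.

    landmarks: (N, 2) array of x, y points (e.g. the 68 annotated points).
    expand: optional fractional margin added on each side (assumption,
            not taken from the img2pose implementation).
    """
    pts = np.asarray(landmarks, dtype=np.float64)
    x1, y1 = pts.min(axis=0)
    x2, y2 = pts.max(axis=0)
    if expand:
        w, h = x2 - x1, y2 - y1
        x1, x2 = x1 - expand * w, x2 + expand * w
        y1, y2 = y1 - expand * h, y2 + expand * h
    return np.array([x1, y1, x2, y2])
```

Since every 300W-LP image has exactly one set of annotated landmarks, this also resolves the multiple-faces question: the box comes from the annotated face only, so no face detector is needed.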