open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0

A question about training on 3D keypoints datasets #1325

Open · Indigo6 opened this issue 2 years ago

Indigo6 commented 2 years ago

I'm new to 3D keypoint detection. While preparing the Human3.6M dataset, I found that the structure of the preprocessed data in mmpose differs from that of PoseNet or RLE. Could someone tell me what the difference is, and whether there is any way to convert between them (since PoseNet provides parsed data)?

mmpose:

```
data
└── h36m
    ├── annotation_body3d
    └── images
        ├── S1
        │   ├── S1_Directions_1.54138969
        │   │   ├── S1_Directions_1.54138969_00001.jpg
        │   │   ├── S1_Directions_1.54138969_00002.jpg
        │   │   └── ...
        │   └── ...
        └── ...
```

PoseNet:

```
h36m
|-- annotations
|   |-- Sample_trainmin_train_Human36M_protocol_2.json
|   `-- Sample_64_test_Human36M_protocol_2.json
`-- images
    |-- s_01_act_02_subact_01_ca_01
```

Indigo6 commented 2 years ago

Also, the MPI-INF-3DHP dataset preparation is listed under mesh recovery, but its evaluation appears under 3D keypoints. I'm confused.

  1. Can I evaluate pretrained keypoint detection models on MPI-INF-3DHP?
  2. If so, is the folder structure the same as that in the mesh recovery dataset preparation?

ly015 commented 2 years ago

The Human3.6M dataset is used for two different tasks in MMPose, namely 3D keypoint detection and mesh recovery, each with its own annotation structure and preparation process. For 3D keypoint detection, the data is parsed from the raw data downloaded from the official website with this script. Please refer to the docs for details. For 3D mesh recovery, please refer to here for data preparation.

The MPI-INF-3DHP dataset is only used for 3D keypoint detection in MMPose. The data preparation guide is also for this task but is misplaced in the docs, which we will fix soon. The data parsing script is here.

Also, please note that algorithms and features related to 3D mesh recovery in MMPose are being deprecated and no longer maintained. Please check out our new codebase MMHuman3D for human pose and shape recovery with parametric models.

Indigo6 commented 2 years ago

Thank you for your reply! Now I know how to prepare the MPI-INF-3DHP dataset for keypoint detection.

About the Human3.6M dataset, I know there are two tasks, each with its own folder structure. My question is: for keypoint detection, why are there two preprocessing methods with different outputs? One, from anibali/h36m-fetch, is used by MMPose here. The other, CHUNYUWANG/H36M-Toolbox, is built on top of the former and used by PoseNet and RLE. I used to work on 2D keypoints and am new to 3D.

Indigo6 commented 2 years ago

> The Human3.6M dataset is used for two different tasks in MMPose, namely 3D keypoint detection and mesh recovery [...] Please refer to the docs for details.

I tried the preprocess_h36m script in MMPose to get the fps10 and fps50 structure, as described in the documentation. The final data takes more than 322 GB, while the processed data for RLE takes only about 100 GB. Has anyone else tried the preprocess_h36m script? I wonder where the difference comes from, and why not adopt the CHUNYUWANG/H36M-Toolbox preprocessing?

ly015 commented 2 years ago

I am not sure why there is such a large difference in data size. Maybe it's due to the video-to-image extraction approach? We use OpenCV, while CHUNYUWANG/H36M-Toolbox uses FFmpeg directly.

ly015 commented 2 years ago

> The MPI-INF-3DHP dataset is only used for 3D keypoint detection in MMPose. [...]

Correction on the MPI-INF-3DHP dataset: it is actually used for both mesh recovery and 3D keypoint detection, but the data preprocessing guide for the 3D keypoint task is missing from the docs. We will add it soon.

Indigo6 commented 2 years ago

> The Human3.6M dataset is used for two different tasks in MMPose, namely 3D keypoint detection and mesh recovery [...]
>
> Correction on the MPI-INF-3DHP dataset: it is actually used for both mesh recovery and 3D keypoint detection [...]

Thank you for your reply and excellent project!

Indigo6 commented 2 years ago

> I am not sure why there is such a large difference in data size. Maybe it's due to the video-to-image extraction approach? We use OpenCV, while CHUNYUWANG/H36M-Toolbox uses FFmpeg directly.

Thanks for the clue. I'll look into the preprocessing scripts and try to find the difference(s).

Indigo6 commented 2 years ago

> I am not sure why there is such a large difference in data size. Maybe it's due to the video-to-image extraction approach? We use OpenCV, while CHUNYUWANG/H36M-Toolbox uses FFmpeg directly.

You are right: the images extracted by FFmpeg with qscale:v set to 3 are much smaller than those extracted by OpenCV. With the same number of extracted frames, the total preprocessed data size of h36m-fetch is about 3 times that of H36M-Toolbox. For example, for the 11980 frames of S1_Act2, h36m-fetch produces 2.2 GB while H36M-Toolbox produces 760 MB. I wonder whether, and by how much, the image extraction method influences the results?
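
For reference, a minimal, untested sketch of the two extraction routes (the video path and output names here are hypothetical; OpenCV's cv2.imwrite defaults to JPEG quality 95, while FFmpeg's -qscale:v 3 compresses more aggressively, which would explain the roughly 3x size gap):

```python
import subprocess

import cv2

# Hypothetical input video and output naming; both routes extract every frame.
video = "S1_Directions_1.54138969.mp4"

# OpenCV route (as in the h36m-fetch-style preprocessing):
# cv2.imwrite writes JPEGs at quality 95 by default, which yields larger files.
cap = cv2.VideoCapture(video)
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    idx += 1
    # Pass [cv2.IMWRITE_JPEG_QUALITY, q] with a lower q to shrink the output.
    cv2.imwrite(f"opencv_{idx:05d}.jpg", frame)
cap.release()

# FFmpeg route (as in CHUNYUWANG/H36M-Toolbox):
# -qscale:v 3 trades a little JPEG quality for much smaller files.
subprocess.run(
    ["ffmpeg", "-i", video, "-qscale:v", "3", "ffmpeg_%05d.jpg"],
    check=True,
)
```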

ly015 commented 2 years ago

> You are right: the images extracted by FFmpeg with qscale:v set to 3 are much smaller than those extracted by OpenCV. [...] I wonder whether, and by how much, the image extraction method influences the results?

So far, the Human3.6M dataset is only used for SimpleBaseline3D and VideoPose3D in MMPose, both of which are 2D-to-3D lifting algorithms. The images are therefore not actually used, so we don't know how much the extraction quality would affect the results of RGB-based methods.
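
To illustrate why image quality is irrelevant here, a minimal sketch (hypothetical sizes, not MMPose's actual implementation) of what a 2D-to-3D lifting model consumes: 2D keypoints in, 3D keypoints out, so the stored JPEGs never enter the forward pass.

```python
import torch
import torch.nn as nn

# Minimal sketch of a 2D-to-3D lifting network in the spirit of
# SimpleBaseline3D: it consumes 2D keypoints only, never images.
class Lifter(nn.Module):
    def __init__(self, num_joints=17, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_joints * 3),
        )

    def forward(self, kpts_2d):
        # kpts_2d: (batch, num_joints, 2) -> (batch, num_joints, 3)
        out = self.net(kpts_2d.flatten(1))
        return out.view(kpts_2d.shape[0], -1, 3)

pose_3d = Lifter()(torch.randn(4, 17, 2))  # no image data involved anywhere
```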

ly015 commented 2 years ago

@Indigo6 BTW, would you be interested in an internship in OpenMMLab? If so please reach me via liyining0712@gmail.com :)

Indigo6 commented 2 years ago

> So far, the Human3.6M dataset is only used for SimpleBaseline3D and VideoPose3D in MMPose, both of which are 2D-to-3D lifting algorithms. [...]

OK, I'll try to implement some methods not based on 2D-to-3D lifting and test the difference then.

Indigo6 commented 2 years ago

> @Indigo6 BTW, would you be interested in an internship in OpenMMLab? [...]

Thank you sincerely for the invitation! I am quite interested in an internship at OpenMMLab and really appreciate the opportunity. Sadly, however, my mentor does not allow internships.

Indigo6 commented 2 years ago

I found there were PRs for the H36M one-stage dataset, but they are closed: https://github.com/open-mmlab/mmpose/pull/868 and https://github.com/open-mmlab/mmpose/pull/975. May I ask the reason?

ly015 commented 2 years ago

There were two reasons: 1) the developer was an intern on the mmpose team, and he left for an exchange opportunity before these PRs were ready to merge; 2) coarse-to-fine is a rather old work (CVPR 2017), and we are reconsidering which algorithms in this category to support in mmpose.

Indigo6 commented 2 years ago

I'd like to help support direct 3D pose methods, since my mentor assigned me a national project on this topic, but I'm totally new to 3D pose datasets and transforms. What's your plan for direct 3D pose, and how can I help? Could we support direct 3D pose with a simple regression head first?

ly015 commented 2 years ago

That would be great and thank you very much! @jin-s13 Could you please give some suggestions here?

jin-s13 commented 2 years ago

@Indigo6 For now, we do not have enough manpower to support all these awesome algorithms. Your contribution is really helpful! We appreciate it very much.

> Could we support direct 3D pose with a simple regression head first?

Yes, I think it is OK. We may start from this simple baseline. One minor concern is that it may not work very well.

If you need a better model and are still interested, we suggest also considering implementing this.
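
As a starting point, a minimal, hypothetical sketch of such a direct regression head (the class name and interface are made up for illustration and are not MMPose's actual head API):

```python
import torch
import torch.nn as nn

# Hypothetical minimal direct-regression head: pooled backbone features in,
# (x, y, z) coordinates per joint out. Illustrative only.
class Simple3DRegressionHead(nn.Module):
    def __init__(self, in_channels=2048, num_joints=17):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_joints * 3)

    def forward(self, feats):
        # feats: (batch, C, H, W) backbone feature map
        x = self.pool(feats).flatten(1)                 # (batch, C)
        return self.fc(x).view(feats.shape[0], -1, 3)   # (batch, J, 3)

# Regress 3D joints directly from image features in one shot.
coords = Simple3DRegressionHead()(torch.randn(2, 2048, 8, 8))  # (2, 17, 3)
```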

Indigo6 commented 2 years ago

Integral and soft-argmax variants are fine by me. My main concern is how to implement the Dataset class and the data pipelines. Do you have any suggestions on building them from scratch versus adapting them from Integral/RLE?
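
For reference, a minimal soft-argmax (integral regression) sketch in plain PyTorch, independent of any particular repo's implementation:

```python
import torch
import torch.nn.functional as F

# Minimal 2D soft-argmax (integral regression): turn per-joint heatmaps into
# differentiable (x, y) coordinates via a softmax-weighted expectation over
# pixel positions. Plain PyTorch, not tied to Integral/RLE's code.
def soft_argmax_2d(heatmaps):
    # heatmaps: (batch, joints, H, W)
    b, j, h, w = heatmaps.shape
    probs = F.softmax(heatmaps.flatten(2), dim=-1).view(b, j, h, w)
    xs = torch.arange(w, dtype=heatmaps.dtype, device=heatmaps.device)
    ys = torch.arange(h, dtype=heatmaps.dtype, device=heatmaps.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)  # expectation along the x axis
    y = (probs.sum(dim=3) * ys).sum(dim=-1)  # expectation along the y axis
    return torch.stack([x, y], dim=-1)       # (batch, joints, 2)

coords = soft_argmax_2d(torch.randn(2, 17, 64, 48))  # (2, 17, 2)
```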