mks0601 / 3DMPPE_POSENET_RELEASE

Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019
MIT License

Problems reproducing the paper #127

Closed yunshangyue71 closed 1 year ago

yunshangyue71 commented 1 year ago

Thanks for your open-source project. When I tried to reproduce your paper, accuracy was consistently low on the Human3.6M dataset. I use HRNet-W32 as my backbone, without any extra 2D datasets, and MPJPE is 200-300mm. You described in the paper that MPII was used when training on Human3.6M, but the project README seems to say that only Human3.6M was used and the result was within 50mm.

Does this gap suggest anything to you?
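
For reference, the metric I am comparing, as a minimal numpy sketch (the alignment convention here is my assumption and depends on the evaluation protocol):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error in mm.

    pred, gt: float arrays of shape (num_frames, num_joints, 3),
    already root-aligned (each pose translated so its root joint sits
    at the origin; the exact alignment depends on the protocol).
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()
```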

mks0601 commented 1 year ago

I used H36M and MPII for the training. Could you point me to the README section that states only H36M is used?

yunshangyue71 commented 1 year ago

In the "Human3.6M dataset using protocol 2" table. You said the methods marked with * used additional datasets, but your paper's entry is not marked with *. My native language is Chinese, so if the tone or emotional expression is off, please blame Google Translate. I am asking for advice with an open mind.

mks0601 commented 1 year ago

No problem with your English, and sorry for the confusion :) The extra dataset means more data in addition to H36M and MPII. As far as I remember, the methods with * used some synthetic datasets.

yunshangyue71 commented 1 year ago

Could you roughly estimate, if your project doesn't use MPII and trains only on Human3.6M, what MPJPE it can reach? I'd like to use that as a reference.

I would also like to discuss with you how Z_root is obtained. You use the camera intrinsics and the area of the bounding box to calculate k as an input. I think this is not very flexible, because the human detector may change in actual use, for example because of computing-power limits. Could the network also learn the area of this box? That would decouple it from the upstream detector.
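
As I understand it from the paper, k = sqrt(f_x * f_y * A_real / A_img), with A_real a constant 2000mm x 2000mm human area; a minimal sketch (the function name and defaults are mine):

```python
import math

def compute_k(fx, fy, bbox_w, bbox_h, real_w=2000.0, real_h=2000.0):
    """Approximate absolute root depth k (mm) from the camera focal
    lengths (pixels) and the detected bounding-box size (pixels),
    following k = sqrt(fx * fy * A_real / A_img). A_real is the
    paper's constant real-world human area of 2000mm x 2000mm;
    RootNet then predicts a correction factor on top of this k."""
    a_real = real_w * real_h      # constant real-world area (mm^2)
    a_img = bbox_w * bbox_h       # bounding-box area (pixel^2)
    return math.sqrt(fx * fy * a_real / a_img)
```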

mks0601 commented 1 year ago
  1. Without MPII, I remember the performance was really bad. The network easily overfits to the monotonous appearances of H36M images. As I remember, this applies to many other methods as well (see the sketch after this list).

  2. I can't clearly get your point. Could you elaborate on it a bit more?
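
To illustrate point 1, a minimal PyTorch sketch of mixed-dataset training, with stub loaders standing in for the real H36M/MPII dataset classes (the stubs and sizes are placeholders, not this repo's code):

```python
from torch.utils.data import Dataset, ConcatDataset, DataLoader

class StubDataset(Dataset):
    """Stand-in for the repo's Human36M / MPII dataset classes."""
    def __init__(self, name, n):
        self.name, self.n = name, n
    def __len__(self):
        return self.n
    def __getitem__(self, i):
        # Real loaders return (image patch, 3D joint targets, ...).
        return self.name, i

# Shuffled concatenation: each batch mixes H36M's monotonous studio
# appearances with MPII's diverse in-the-wild ones, which counters
# overfitting to H36M.
h36m, mpii = StubDataset('h36m', 1000), StubDataset('mpii', 1000)
loader = DataLoader(ConcatDataset([h36m, mpii]), batch_size=32, shuffle=True)
```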

yunshangyue71 commented 1 year ago
  1. Let me give you an example. If I use everything exactly as in your project, it works great. But if my computing power is tight, I may want to replace the YOLOv4 detector with a lightweight one, call it AnyDetector. Suppose the box detected by AnyDetector is 15% larger than YOLOv4's. Will this affect your results? (See the back-of-the-envelope after this list.)
  2. When I was visualizing the dataset, I found some problems: subject 9, acts 5/10/13 have some errors.
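
A back-of-the-envelope for point 1, using the k formula sketched above (my own estimate, not a number from the thread): since k scales as 1/sqrt(A_img), a larger detected box directly shifts the depth estimate.

```python
# k scales as 1/sqrt(A_img), so a larger detected box lowers the depth
# estimate unless RootNet's learned correction factor absorbs the bias.
print(1 / 1.15 ** 0.5)  # 15% more box area    -> k x ~0.93 (~7% closer)
print(1 / 1.15)         # 15% larger per side  -> k x ~0.87 (~13% closer)
```
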
mks0601 commented 1 year ago
  1. These days, human detectors are very robust, so the sizes of the detected boxes would not differ much.
  2. Please give a visualized example.
yunshangyue71 commented 1 year ago

subject9_act5_subact_2

yunshangyue71 commented 1 year ago

I have been following your work for a long time. You have papers on both methods based on the Max Planck SMPL body model and methods based on skeletons. If I want to drive an avatar, which of the two is better? I think SMPL parameters are harder for a network to learn, so with the same network the skeleton-based approach should work better. What do you think?

mks0601 commented 1 year ago

Regarding the dataset error, I'm not sure, as I exactly followed their data parsing; maybe the original data has some errors. To drive avatars, you need rotations, but skeleton-based methods, like this repo's paper, only output 3D coordinates without rotations. If you can get 3D rotations from the 3D coordinates with inverse kinematics quickly and robustly, then you can use skeleton-based methods. Otherwise, you should use SMPL-based methods.
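
To make that IK step concrete, a minimal numpy sketch of its simplest piece, recovering the swing rotation of a single bone from joint positions (my own illustration, not code from either paper):

```python
import numpy as np

def bone_swing_rotation(rest_dir, posed_dir):
    """Rotation matrix taking a rest-pose bone direction to the posed
    direction (Rodrigues' formula). Joint positions only constrain this
    swing; twist about the bone axis stays unobservable, which is one
    reason coordinate-only outputs are awkward for driving avatars."""
    a = np.asarray(rest_dir, float)
    b = np.asarray(posed_dir, float)
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    axis = np.cross(a, b)
    s, c = np.linalg.norm(axis), np.dot(a, b)    # sin / cos of the angle
    if s < 1e-8:
        if c > 0:
            return np.eye(3)                     # already aligned
        # Anti-parallel: rotate 180 deg about any axis perpendicular to a.
        v = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(v) < 1e-8:
            v = np.cross(a, [0.0, 1.0, 0.0])
        v /= np.linalg.norm(v)
        return 2.0 * np.outer(v, v) - np.eye(3)
    axis /= s
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)
```

Full-body IK also has to handle twist, joint limits, and bone-length consistency, which is where SMPL-based methods have the advantage.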

yunshangyue71 commented 1 year ago

Yes, yes, there is a problem with the original data.

Thank you for your reply.

mks0601 commented 1 year ago

Good luck with your projects :)