jfzhang95 / PoseAug

[CVPR 2021] PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation, (Oral, Best Paper Award Finalist)
MIT License
368 stars 57 forks

About raising pose from 2d to 3d #34

Closed CheungBH closed 2 years ago

CheungBH commented 2 years ago

Hi. I am running your code to lift 2D poses to 3D. However, in the pre-processing step I found that camera parameters are needed. Can I use the models to lift a 2D pose to 3D without camera calibration? For example, I have an image of people whose camera parameters are unknown, and I want to apply the lifting model after obtaining the 2D pose.

Garfield-kh commented 2 years ago

Hi, thank you for your interest! As long as you have the image width and height, you can use normalize_screen_coordinates to normalize the 2D keypoints. The distortion factors can be ignored unless it is a fisheye camera. You may refer to issue #22 as an example.

Hope this helps. Thank you~
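
For concreteness, a minimal sketch of that normalization (matching the VideoPose3D-style `normalize_screen_coordinates` used in this repo's `utils/data_utils.py`; the example image size below is just an assumption):

```python
import numpy as np

def normalize_screen_coordinates(X, w, h):
    # Map pixel coordinates so that x in [0, w] -> [-1, 1],
    # scaling y by the same factor to preserve the aspect ratio.
    assert X.shape[-1] == 2
    return X / w * 2 - np.array([1, h / w])

# A detected keypoint at the center of a 1080x720 image maps to (0, 0)
kpt = np.array([[540.0, 360.0]])
print(normalize_screen_coordinates(kpt, w=1080, h=720))  # [[0. 0.]]
```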

CheungBH commented 2 years ago

> Hi, thank you for your interest! As long as you have the image width and height, you can use normalize_screen_coordinates to normalize the 2D keypoints. The distortion factors can be ignored unless it is a fisheye camera. You may refer to issue #22 as an example.
>
> Hope this helps. Thank you~

Does that mean I just need to normalize the detected 2D keypoint values and feed them into the network, as in VideoPose3D? There is no calibration like the one you apply in the data processing at https://github.com/jfzhang95/PoseAug/blob/f3f5c4e916ebf7529b873ec1c14c1ce0bf0f5cb1/data/prepare_data_h36m.py#L124. Will that affect the lifting result?

CheungBH commented 2 years ago

Thanks for your reply. I realize I made a mistake: the 2D pose pre-processing code is actually at https://github.com/jfzhang95/PoseAug/blob/f3f5c4e916ebf7529b873ec1c14c1ce0bf0f5cb1/utils/data_utils.py#L17. The code I mentioned above is for obtaining 2D keypoint pixel coordinates from the 3D keypoints.
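
For reference, an inference-only sketch of the route discussed here: normalize the detected 2D keypoints using only the image size and feed them to a trained lifting model. `lifter` is a hypothetical stand-in for whichever 2D-to-3D checkpoint is evaluated, and the flattened input shape is an assumption that depends on the backbone used:

```python
import numpy as np
import torch

def normalize_screen_coordinates(X, w, h):
    # Same normalization as in utils/data_utils.py: only image width/height needed
    assert X.shape[-1] == 2
    return X / w * 2 - np.array([1, h / w])

# keypoints_2d: (num_joints, 2) pixel coordinates from any off-the-shelf 2D detector
keypoints_2d = np.random.rand(16, 2) * np.array([1080, 720])  # dummy detections, 1080x720 image

kpts_norm = normalize_screen_coordinates(keypoints_2d, w=1080, h=720)

# `lifter` is a placeholder for a trained 2D-to-3D model; the expected input
# shape (flattened here) depends on which backbone/checkpoint you use.
inp = torch.from_numpy(kpts_norm.astype(np.float32)).reshape(1, -1)
with torch.no_grad():
    pose_3d = lifter(inp)  # root-relative 3D pose; no camera calibration involved
```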

CheungBH commented 2 years ago

> Hi, thank you for your interest! As long as you have the image width and height, you can use normalize_screen_coordinates to normalize the 2D keypoints. The distortion factors can be ignored unless it is a fisheye camera. You may refer to issue #22 as an example.
>
> Hope this helps. Thank you~

I noticed that the normalized coordinates depend on the image size. In that case, even if people are performing the same actions, someone standing on the left, in the middle, or on the right will produce different normalized values. Will this introduce additional noise? Is it because the Human3.6M images are almost square (1000x1002) that this is not an issue? If my images are rectangular, e.g. 1080x720, could the result be affected?

Garfield-kh commented 2 years ago

> I noticed that the normalized coordinates depend on the image size. In that case, even if people are performing the same actions, someone standing on the left, in the middle, or on the right will produce different normalized values. Will this introduce additional noise?

The different normalized values will not affect the estimation, since our training data also contains people standing on the left, in the middle, and on the right. Here is a paper related to this observation; maybe you can have a look at it.

> Is it because the Human3.6M images are almost square (1000x1002) that this is not an issue? If my images are rectangular, e.g. 1080x720, could the result be affected?

You can use a padding trick: pad the image on both sides of its shorter edge, e.g. from 1080x720 to 1080x1080, and you will see that the normalized values are the same with and without padding.
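
A quick numerical check of this padding trick, assuming the VideoPose3D-style normalization sketched above: padding a 1080x720 frame to 1080x1080 adds 180 px on the top and bottom, so the keypoint y coordinates shift by 180, and the normalized values come out identical:

```python
import numpy as np

def normalize_screen_coordinates(X, w, h):
    assert X.shape[-1] == 2
    return X / w * 2 - np.array([1, h / w])

kpt = np.array([[300.0, 500.0]])  # a keypoint in the original 1080x720 frame

no_pad = normalize_screen_coordinates(kpt, w=1080, h=720)

# Pad 180 px on top and bottom -> shift y by 180, normalize with the square size
with_pad = normalize_screen_coordinates(kpt + np.array([0.0, 180.0]), w=1080, h=1080)

print(no_pad)    # [[-0.44444444  0.25925926]]
print(with_pad)  # [[-0.44444444  0.25925926]]
```

The values match because the normalization divides both terms by the image width, so symmetric padding of the shorter edge cancels out of the y coordinate.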