xingyizhou / pytorch-pose-hg-3d

PyTorch implementation for 3D human pose estimation
GNU General Public License v3.0

Q: scaling in 3D #5

Closed bloodymeli closed 6 years ago

bloodymeli commented 6 years ago

Hi,

I was wondering if you can please help me understand the reasoning behind this code:

```python
pts_3d = pts_3d - pts_3d[self.root]
s2d, s3d = 0, 0
for e in ref.edges:
  s2d += ((pts[e[0]] - pts[e[1]]) ** 2).sum() ** 0.5
  s3d += ((pts_3d[e[0], :2] - pts_3d[e[1], :2]) ** 2).sum() ** 0.5
scale = s2d / s3d

for j in range(ref.nJoints):
  pts_3d[j, 0] = pts_3d[j, 0] * scale + pts[self.root, 0]
  pts_3d[j, 1] = pts_3d[j, 1] * scale + pts[self.root, 1]
  pts_3d[j, 2] = pts_3d[j, 2] * scale + ref.h36mImgSize / 2
```

A) If I understand correctly, the goal is that all coordinates in `pts_3d` end up at the same order of magnitude. Am I correct?
B) What is the reason behind multiplying by the scale factor in the last three lines?
C) Why is the root location not multiplied by the same factor?
D) Why is `ref.h36mImgSize / 2` the offset for the z coordinate?

xingyizhou commented 6 years ago

Hi, thanks for reading the code.

A) Yes, keeping the 3D aspect ratio is all this segment of code is about.
B) Note that `pts` is the GT 2D keypoint annotation. The goal of the last three lines is to align the weak-perspective-projected 3D joints with the 2D annotation. I.e., if you plot `pts_3d[:, :2]` on the image, you will find it is very close to `pts`.
C) I don't understand the question. See the \hat{Y} equation in the middle of the right column on page 5 of the paper.
D) To keep z in the range (0 ~ 256), like (x, y). Actually it is not necessary, since z is re-normalized to -1 ~ 1 in https://github.com/xingyizhou/pytorch-pose-hg-3d/blob/master/src/datasets/h36m.py#L76; you can remove both lines for simplicity (I haven't tried).
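To make the alignment in (B) concrete, here is a small self-contained sketch of the same scale-and-translate step. The skeleton, edge list, and root index are toy values made up for illustration, not the H3.6M definitions from the repository:

```python
import numpy as np

# Toy skeleton: 3 joints in a chain (hypothetical, not H3.6M).
edges = [(0, 1), (1, 2)]
root = 0
h36m_img_size = 224  # stand-in for ref.h36mImgSize

pts = np.array([[100.0, 100.0], [130.0, 100.0], [130.0, 140.0]])        # GT 2D (pixels)
pts_3d = np.array([[0.0, 0.0, 0.0], [0.3, 0.0, 0.1], [0.3, 0.4, 0.2]])  # metric 3D

# Center the 3D pose at the root joint.
pts_3d = pts_3d - pts_3d[root]

# Sum of bone lengths: 2D in pixels vs. 3D (x, y only) in metric units.
s2d = sum(np.linalg.norm(pts[a] - pts[b]) for a, b in edges)
s3d = sum(np.linalg.norm(pts_3d[a, :2] - pts_3d[b, :2]) for a, b in edges)
scale = s2d / s3d  # pixels per metric unit

# Weak-perspective alignment: scale uniformly, translate (x, y) to the
# 2D root location, shift z into the image-size range.
aligned = pts_3d * scale
aligned[:, 0] += pts[root, 0]
aligned[:, 1] += pts[root, 1]
aligned[:, 2] += h36m_img_size / 2

print(aligned[:, :2])  # matches pts when the 2D/3D poses are consistent
```

With these toy values the projected `aligned[:, :2]` lands exactly on `pts`, which is the alignment property described above; on real data the match is close rather than exact because annotations are not perfectly weak-perspective.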

bloodymeli commented 6 years ago

Hi,

Thanks for the prompt response. I have to admit that I'm confused by lines 58 and 76 in h36m.py, and line 38 in FusionCriterion.py. If x,y,z are the true coordinates, is the output of the network trained to be 2 * (x,y,z) / output resolution? Why use this calibration choice?

xingyizhou commented 6 years ago

Hi, A very detailed explanation of why/how to use this calibration can be found in Section 3.2 of https://arxiv.org/pdf/1803.09331.pdf .

FANG-Xiaolin commented 6 years ago

Hi Xingyi, I think neither https://github.com/xingyizhou/pytorch-pose-hg-3d/blob/master/src/datasets/h36m.py#L76 nor https://github.com/xingyizhou/pytorch-pose-hg-3d/blob/master/src/datasets/h36m.py#L58 will keep z in a specific range like (0 ~ 256) or (-1 ~ 1). As an extreme example, consider two points that share the same x and y values while one of them is extremely far away and has a very large z value. But this does no harm to the final result, since the ratio is preserved all the way.
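This observation can be checked numerically: the uniform scale preserves the 2D/3D aspect ratio but places no bound on z. A toy sketch with made-up joints (not repository code):

```python
import numpy as np

# One hypothetical bone; 2D length is 100 px.
edges = [(0, 1)]
root = 0
pts = np.array([[0.0, 0.0], [100.0, 0.0]])

# Two 3D poses with identical (x, y) but very different depth.
shallow = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.5]])
deep = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 50.0]])

def align(pts_3d):
    """Apply the same root-centering and uniform scaling as the dataset code."""
    pts_3d = pts_3d - pts_3d[root]
    s2d = sum(np.linalg.norm(pts[a] - pts[b]) for a, b in edges)
    s3d = sum(np.linalg.norm(pts_3d[a, :2] - pts_3d[b, :2]) for a, b in edges)
    return pts_3d * (s2d / s3d)

print(align(shallow)[1, 2])  # 50.0   -> within a 256-wide range
print(align(deep)[1, 2])     # 5000.0 -> far outside it
```

The (x, y) coordinates come out identical in both cases, while z grows without bound as depth grows; as noted, only the preserved ratio matters for the final result.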