princeton-vl / pytorch_stacked_hourglass

PyTorch implementation of the ECCV 2016 paper "Stacked Hourglass Networks for Human Pose Estimation"
BSD 3-Clause "New" or "Revised" License
465 stars, 94 forks

Is the flip augmentation correct? #43

Closed: Nightmare4214 closed this issue 1 year ago

Nightmare4214 commented 1 year ago

In `data/MPII/dp.py`, the flip augmentation is implemented as follows:

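    if np.random.randint(2) == 0:  # flip with probability 0.5
        inp = self.preprocess(inp)
        inp = inp[:, ::-1]  # mirror horizontally: reverse axis 1 (the width axis of an H x W x C image)
        # swap left/right joint labels, then map each x to output_res - x
        keypoints = keypoints[:, ds.flipped_parts['mpii']]
        keypoints[:, :, 0] = self.output_res - keypoints[:, :, 0]
        # same two steps for the keypoints in input-resolution coordinates
        orig_keypoints = orig_keypoints[:, ds.flipped_parts['mpii']]
        orig_keypoints[:, :, 0] = self.input_res - orig_keypoints[:, :, 0]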
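To illustrate what `keypoints[:, ds.flipped_parts['mpii']]` does, here is a small standalone sketch. It assumes the standard MPII 16-joint ordering (consistent with right ankle = 0 and left ankle = 5 as mentioned in this issue); `FLIPPED_PARTS_MPII` is a hypothetical stand-in, so check the real contents of `ds.flipped_parts['mpii']` in the repository. The keypoint layout `(num_people, num_joints, 3)` with x in channel 0 is also an assumption.

```python
import numpy as np

# Assumed MPII joint order (standard 16-joint layout):
# 0 r_ankle, 1 r_knee, 2 r_hip, 3 l_hip, 4 l_knee, 5 l_ankle,
# 6 pelvis, 7 thorax, 8 upper_neck, 9 head_top,
# 10 r_wrist, 11 r_elbow, 12 r_shoulder, 13 l_shoulder, 14 l_elbow, 15 l_wrist
# Hypothetical stand-in for ds.flipped_parts['mpii']: each joint maps to its mirror partner,
# centre joints (6-9) map to themselves.
FLIPPED_PARTS_MPII = [5, 4, 3, 2, 1, 0, 6, 7, 8, 9, 15, 14, 13, 12, 11, 10]

# Assumed layout: (num_people, num_joints, 3), with x in channel 0
keypoints = np.zeros((1, 16, 3))
keypoints[0, :, 0] = np.arange(16)  # tag each joint's x with its own index

swapped = keypoints[:, FLIPPED_PARTS_MPII]
print(swapped[0, :6, 0])  # [5. 4. 3. 2. 1. 0.]: slot 0 (right ankle) now holds the old left ankle

# The permutation is its own inverse: swapping twice restores the original order
assert np.array_equal(swapped[:, FLIPPED_PARTS_MPII], keypoints)
```

Because the permutation only exchanges left/right partners, flipping an already flipped sample gives back the original labels, which is what you want from a mirror operation.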

It first swaps the left and right joints (e.g. the right ankle, index 0, and the left ankle, index 5) and then flips the x-coordinates. My question is: why not just flip directly? Would the network still predict the joints correctly?

crockwell commented 1 year ago

Two things happen to the keypoints when the image is flipped.

(1) Each joint at location x moves to location W - x, where W is the image width. (2) Left joints become right joints and vice versa.
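The two steps can be sketched as a small standalone function. This is an illustration under assumptions, not the repository's code: `flip_keypoints` is a hypothetical helper, the keypoint layout (x in channel 0 of the last axis) is assumed to match the snippet above, and the 3-joint toy skeleton in the usage lines is made up. The two steps commute, so applying the swap first (as the repository does) gives the same result as mirroring first.

```python
import numpy as np

def flip_keypoints(keypoints: np.ndarray, width: int, flip_perm) -> np.ndarray:
    """Mirror keypoints to match a horizontally flipped image.

    keypoints: array of shape (num_people, num_joints, k) with x in channel 0.
    width:     image width W in the same coordinate frame as the keypoints.
    flip_perm: flip_perm[i] is the index of joint i's left/right partner
               (centre joints map to themselves).
    """
    # Step (2): relabel left joints as right joints and vice versa.
    # Indexing with a list returns a copy, so the caller's array is not modified.
    flipped = keypoints[:, flip_perm]
    # Step (1): move each joint from x to W - x; y and any other channels are unchanged.
    flipped[:, :, 0] = width - flipped[:, :, 0]
    return flipped

# Toy skeleton: joints [right_hip, pelvis, left_hip], so the partner permutation is [2, 1, 0]
kps = np.array([[[30.0, 10.0], [50.0, 20.0], [60.0, 30.0]]])  # (x, y) per joint
out = flip_keypoints(kps, width=100, flip_perm=[2, 1, 0])
print(out[0, :, 0])  # [40. 50. 70.]
```

In the toy example the old right hip at x = 30 is mirrored to x = 70, and in the flipped image that point is the person's left hip, so it ends up in the left-hip slot.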

The second step is counterintuitive but necessary. Consider a person facing away from the camera. Their left joint x_l then appears to the left of their right joint x_r in the image (x_l < x_r). After flipping, W - x_l > W - x_r, so without relabeling, the left joint would lie to the right of the right joint. However, the person in the flipped image is still facing away from the camera, so their left joint should still be to the left of their right joint. Swapping the left and right labels fixes this.
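A quick numeric check of this argument, using made-up coordinates: a person facing away from the camera in an image of width W = 100, with the left joint at x_l = 30 and the right joint at x_r = 70.

```python
W = 100
x_l, x_r = 30, 70  # person facing away: left joint is on the image's left (x_l < x_r)

# Step (1) only: mirror the coordinates but keep the labels
left_no_swap, right_no_swap = W - x_l, W - x_r  # 70, 30
# The "left" label now sits to the right of the "right" label,
# which contradicts a person who is still facing away from the camera.
print(left_no_swap < right_no_swap)  # False

# Steps (1) and (2): mirror, then swap the left/right labels
left_swap, right_swap = W - x_r, W - x_l  # 30, 70
print(left_swap < right_swap)  # True: the left joint is on the left again
```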

Nightmare4214 commented 1 year ago

Can I understand it this way? After flipping, the person's left joint should still appear to the left of their right joint, so we need the second operation (left joints become right joints and vice versa).