Closed Nightmare4214 closed 1 year ago
So two things are going on here with keypoints when the image is flipped.
(1) Each joint at location x changes to location W - x. (2) Left joints are changed to right joints and vice versa.
The second step is counterintuitive, but it is necessary. Consider a person facing away from the camera. Their left joint x_l appears to the left of their right joint x_r in the image, so x_l < x_r. After flipping, W - x_l > W - x_r, which means the joint labeled left now appears to the right of the joint labeled right. However, the person in the flipped image is still facing away from the camera, so their left joint should still appear to the left of their right joint. Swapping the left and right joint labels fixes this problem.
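To make the two steps concrete, here is a minimal sketch in NumPy. It is not the code from the repo: the function name `flip_keypoints` and the `FLIP_PAIRS` table are hypothetical, and the pairs assume the standard MPII joint ordering (0/5 ankles, 1/4 knees, 2/3 hips, 10/15 wrists, 11/14 elbows, 12/13 shoulders), so check them against your own dataset.

```python
import numpy as np

# Assumed MPII (right, left) index pairs; hypothetical table for illustration.
FLIP_PAIRS = [(0, 5), (1, 4), (2, 3), (10, 15), (11, 14), (12, 13)]


def flip_keypoints(keypoints, width, pairs=FLIP_PAIRS):
    """Horizontally flip an array of (x, y) keypoints of shape (num_joints, 2)."""
    flipped = np.array(keypoints, dtype=float)  # copy, so the input is not modified

    # Step (1): mirror every x coordinate, x -> W - x.
    # (With integer pixel indices, some code bases use W - 1 - x instead.)
    flipped[:, 0] = width - flipped[:, 0]

    # Step (2): swap the left/right rows so index 0 is again the person's
    # right ankle, index 5 the person's left ankle, and so on.
    for right, left in pairs:
        flipped[[right, left]] = flipped[[left, right]]

    return flipped


# Person facing the camera: their right ankle (idx 0) is on the image left.
kp = np.zeros((16, 2))
kp[0] = (10, 50)   # right ankle
kp[5] = (30, 50)   # left ankle
out = flip_keypoints(kp, width=100)
print(out[0], out[5])
```

In the flipped image the ankle at x = 70 came from the original left ankle, but anatomically it is now the mirrored person's right ankle, so it has to sit at index 0. The two steps act on different things (coordinates versus row order), so doing the swap before or after the mirroring gives the same result.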
Can I understand it this way? After flipping, it looks like the left joint is still to the left of the right joint, so we should do the second operation (left joints are changed to right joints and vice versa).
In data/MPII/dp.py
the flip augmentation is below
First it swaps the joints (for example, the right ankle (idx 0) and the left ankle (idx 5)), and then it flips. My question is: why not just flip directly? Will the network still predict correctly?