zmurez / MediaPipePyTorch

Port of MediaPipe tflite models to PyTorch

Bug fix: ill pose results for horizontal hands #7

Open Krankheit opened 5 months ago

Krankheit commented 5 months ago

Issue

Hand pose estimation yields distorted results that deviate from the correct hand pose (screenshot: github01).

Reason

The HPE process follows a two-stage pipeline:

  1. The palm detector yields a bounding box for the palm, which serves as the basis for pose regression;
  2. The palm bbox is expanded to a hand bbox, and pose regression is run on that region.

The problem lies in how the bbox is expanded.

(figure "palm": the palm bbox in red, with the shifted hand-bbox center marked as a pink circle)

In detection2roi(), the center point (the pink circle) of the hand bbox is the center of the palm bbox (the red box) plus an offset along the y-axis only. As a result, a vertical offset is applied to horizontal hands, where a horizontal offset is needed.
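As a concrete illustration (a minimal sketch with made-up numbers, not values from the repo): for a hand lying on its side, the palm-to-hand offset should point along the hand axis, but a y-only shift still moves the center straight down.

```python
import math

# Hypothetical numbers: palm-box center, offset magnitude, and hand angle.
xc, yc = 0.5, 0.5
dl = 0.1                 # dy * scale
theta = math.pi / 2      # hand lying on its side (rotated 90 degrees)

# Current behaviour: the shift is always vertical,
# regardless of hand orientation.
yc_orig = yc + dl        # center moves down even though the hand axis
                         # is horizontal

# Desired behaviour: rotate the shift to follow the hand direction.
xc_fix = xc + dl * math.sin(-theta)   # horizontal shift
yc_fix = yc + dl * math.cos(-theta)   # ~0, vertical position unchanged
```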

def detection2roi(self, detection):
    if self.detection2roi_method == 'box':
        # compute box center and scale
        # use mediapipe/calculators/util/detections_to_rects_calculator.cc
        xc = (detection[:,1] + detection[:,3]) / 2
        yc = (detection[:,0] + detection[:,2]) / 2
        scale = (detection[:,3] - detection[:,1])  # assumes square boxes
        ...
    yc += self.dy * scale
    scale *= self.dscale
    ...
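For reference, the indexing above implies each detection row is laid out as [ymin, xmin, ymax, xmax, kp0_x, kp0_y, ...]. A minimal sketch with made-up values (not data from the repo) decoding the center and scale:

```python
import torch

# Layout inferred from the indexing in detection2roi:
# columns 0-3 are the box (ymin, xmin, ymax, xmax),
# followed by keypoint (x, y) pairs starting at column 4.
det = torch.tensor([[0.2, 0.3, 0.6, 0.7,   # box: ymin, xmin, ymax, xmax
                     0.5, 0.6,             # keypoint 0 (x, y)
                     0.5, 0.5,             # keypoint 1
                     0.5, 0.2]])           # keypoint 2

xc = (det[:, 1] + det[:, 3]) / 2   # box center x
yc = (det[:, 0] + det[:, 2]) / 2   # box center y
scale = det[:, 3] - det[:, 1]      # box width (assumes square boxes)
```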

Solution

Use the hand direction as guidance for the offset direction. This change may affect other models, so be careful to apply it only to the hand model. Replace detection2roi() with the code below:

def detection2roi(self, detection):
    """Convert detections from the detector to an oriented bounding box.

    Adapted from:
    mediapipe/modules/face_landmark/face_detection_front_detection_to_roi.pbtxt

    The center and size of the box are calculated from the center
    of the detected box. Rotation is calculated from the vector
    between kp1 and kp2 relative to theta0. The box is scaled by
    dscale and shifted by dy.
    """
    # compute box center and scale
    # use mediapipe/calculators/util/detections_to_rects_calculator.cc
    xc = (detection[:,1] + detection[:,3]) / 2
    yc = (detection[:,0] + detection[:,2]) / 2
    scale = (detection[:,3] - detection[:,1])  # assumes square boxes

    # compute box rotation
    # for the palm model: kp1 = 0, kp2 = 2, i.e. keypoints 0 & 2
    x0 = detection[:,4+2*self.kp1]
    y0 = detection[:,4+2*self.kp1+1]
    x1 = detection[:,4+2*self.kp2]
    y1 = detection[:,4+2*self.kp2+1]
    theta = torch.atan2(y0-y1, x0-x1) - self.theta0

    # modified: rotate the center shift to follow the hand orientation
    dl = self.dy * scale
    dl_x = dl * torch.sin(-theta)
    dl_y = dl * torch.cos(-theta)
    xc += dl_x
    yc += dl_y

    scale *= self.dscale * 1.04  # slight extra margin
    return xc, yc, scale, theta
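A quick sanity check of the rotation term (a sketch with made-up coordinates; it assumes theta0 = pi/2 and that kp1 = 0 is the wrist and kp2 = 2 is the middle-finger base, per the palm config mentioned above). An upright hand should give theta near 0, so the shift stays vertical; a horizontal hand should give theta near -pi/2, so the shift becomes horizontal.

```python
import math
import torch

theta0 = math.pi / 2  # assumed palm-model value

def hand_angle(x0, y0, x1, y1):
    # same formula as in the patched detection2roi
    return torch.atan2(y0 - y1, x0 - x1) - theta0

# Upright hand: wrist directly below the middle-finger base
# (image y grows downward).
up = hand_angle(torch.tensor(0.5), torch.tensor(0.8),
                torch.tensor(0.5), torch.tensor(0.2))

# Horizontal hand: wrist to the right of the middle-finger base.
side = hand_angle(torch.tensor(0.8), torch.tensor(0.5),
                  torch.tensor(0.2), torch.tensor(0.5))
```

With theta = 0 the rotated shift (dl*sin(-theta), dl*cos(-theta)) reduces to the original pure-y shift, and with theta = -pi/2 it becomes purely horizontal, which is exactly the fix.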

Here is the corrected HPE result (screenshot: github2).

littlefrog121 commented 4 months ago

brilliant!