mkocabas / VIBE

Official implementation of CVPR2020 paper "VIBE: Video Inference for Human Body Pose and Shape Estimation"
https://arxiv.org/abs/1912.05656

I really couldn't understand the boxes here #235

Open lucasjinreal opened 2 years ago

lucasjinreal commented 2 years ago

The box sent to VIBE is clearly in x1y1x2y2 format.

So why is it read as (cx, cy, h) inside this function?

import numpy as np

def convert_crop_cam_to_orig_img(cam, bbox, img_width, img_height):
    '''
    Convert predicted camera from cropped image coordinates
    to original image coordinates
    :param cam (ndarray, shape=(3,)): weak perspective camera in cropped img coordinates
    :param bbox (ndarray, shape=(4,)): bbox coordinates (c_x, c_y, h)
    :param img_width (int): original image width
    :param img_height (int): original image height
    :return:
    '''
    cx, cy, h = bbox[:,0], bbox[:,1], bbox[:,2]
    hw, hh = img_width / 2., img_height / 2.
    sx = cam[:,0] * (1. / (img_width / h))
    sy = cam[:,0] * (1. / (img_height / h))
    tx = ((cx - hw) / hw / sx) + cam[:,1]
    ty = ((cy - hh) / hh / sy) + cam[:,2]
    orig_cam = np.stack([sx, sy, tx, ty]).T
    return orig_cam
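
To see which bbox columns the function actually consumes, here is a small worked example. It is only a sketch: it assumes batched inputs (cam of shape (N, 3), bbox of shape (N, 4)), which is what the `[:, i]` indexing implies, and it assumes the bbox is laid out as [cx, cy, w, h]. The name `crop_cam_to_orig` and all numbers are made up for illustration.

```python
import numpy as np

def crop_cam_to_orig(cam, bbox, img_width, img_height):
    """Same arithmetic as the function quoted above, repeated so this runs alone."""
    cx, cy, h = bbox[:, 0], bbox[:, 1], bbox[:, 2]   # only columns 0..2 are read
    hw, hh = img_width / 2.0, img_height / 2.0
    sx = cam[:, 0] * h / img_width                   # scale relative to image width
    sy = cam[:, 0] * h / img_height                  # scale relative to image height
    tx = (cx - hw) / hw / sx + cam[:, 1]
    ty = (cy - hh) / hh / sy + cam[:, 2]
    return np.stack([sx, sy, tx, ty]).T

# One frame, 400x200 image, box centred at (300, 50) with size 200 (assumed values)
cam = np.array([[1.0, 0.0, 0.0]])                    # (s, tx, ty) in crop coordinates
bbox = np.array([[300.0, 50.0, 200.0, 200.0]])       # assumed [cx, cy, w, h]
orig_cam = crop_cam_to_orig(cam, bbox, 400, 200)
print(orig_cam)                                      # [[ 0.5  1.   1.  -0.5]]

# Changing the fourth column does not change the result, because it is never read
bbox_alt = bbox.copy()
bbox_alt[:, 3] = 999.0
same = np.array_equal(orig_cam, crop_cam_to_orig(cam, bbox_alt, 400, 200))
print(same)                                          # True
```

So whatever sits in the fourth column is ignored here; the function only gives sensible results if column 2 already holds the size of the crop.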

abcliguanxi commented 1 year ago

The bbox coordinates are [cx, cy, w, h]; see line 156 in mpt.py. What I can't understand is this line:

cx, cy, h = bbox[:,0], bbox[:,1], bbox[:,2]

Why not:

cx, cy, w, h = bbox[:,0], bbox[:,1], bbox[:,2], bbox[:,3]

I see: h = w in bbox, so the box is square and bbox[:,2] can serve as h.
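
For anyone else landing here, a minimal sketch of how an x1y1x2y2 box could end up as a square [cx, cy, w, h] box with h = w. The exact code at line 156 of mpt.py is not reproduced in this thread, so this is an assumption about the idea rather than a copy of it; the helper name `xyxy_to_square_cxcywh` and the choice of the longer side as the square size are hypothetical.

```python
import numpy as np

def xyxy_to_square_cxcywh(boxes):
    """Hypothetical helper: (N, 4) corner boxes -> (N, 4) square [cx, cy, s, s]."""
    boxes = np.asarray(boxes, dtype=float)
    w = boxes[:, 2] - boxes[:, 0]          # x2 - x1
    h = boxes[:, 3] - boxes[:, 1]          # y2 - y1
    cx = boxes[:, 0] + w / 2.0             # box centre, x
    cy = boxes[:, 1] + h / 2.0             # box centre, y
    side = np.maximum(w, h)                # assumed: square side is the longer edge
    return np.stack([cx, cy, side, side], axis=1)

boxes_xyxy = np.array([[100.0, 50.0, 200.0, 350.0]])   # made-up detection
square = xyxy_to_square_cxcywh(boxes_xyxy)
print(square)                                          # [[150. 200. 300. 300.]]
```

With boxes stored this way, columns 2 and 3 are identical, which would explain why reading only bbox[:,2] is enough.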