What does pixel_std for?

microsoft / human-pose-estimation.pytorch

The project is an official implementation of our ECCV 2018 paper "Simple Baselines for Human Pose Estimation and Tracking" (https://arxiv.org/abs/1804.06208)

MIT License


What does pixel_std for? #94

Open PaTricksStar opened 5 years ago

PaTricksStar commented 5 years ago

https://github.com/Microsoft/human-pose-estimation.pytorch/blob/c3a30c0e1f83e73b3038b1a443becf6b4a19cf1f/lib/dataset/JointsDataset.py#L31 I reviewed the code and found that pixel_std seems to represent the std of the human bbox area, right? But why do we need to normalize the bbox scale, and why is it set to 200?

rafikg commented 5 years ago

@PaTricksStar, I have the same question. Also, what about the scale = scale * 1.25 in this function?

def _xywh2cs(self, x, y, w, h):
        center = np.zeros((2), dtype=np.float32)
        center[0] = x + w * 0.5
        center[1] = y + h * 0.5

        if w > self.aspect_ratio * h:
            h = w * 1.0 / self.aspect_ratio
        elif w < self.aspect_ratio * h:
            w = h * self.aspect_ratio
        scale = np.array(
            [w * 1.0 / self.pixel_std, h * 1.0 / self.pixel_std],
            dtype=np.float32)
        if center[0] != -1:
            scale = scale * 1.25

        return center, scale

wanghao14 commented 5 years ago

@Gouiaa This is also what confuses me. @leoxiaobin Could you please answer our questions?

annopackage commented 4 years ago

@PaTricksStar @leoxiaobin @wanghao14 @rafikg Have you solved this? I am confused about it.

PaTricksStar commented 4 years ago

I think it is just a hyperparameter representing the default w/h of the bounding box. Just leave it alone, or you can try emailing the author to verify.

lqduc commented 4 years ago

I think it is just the way they store the bbox h and w. They divide h and w by 200 (pixel_std), and then recover h and w in get_affine_transform by multiplying scale by 200. It is just a hyperparameter, and you could choose another number.
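
To make the round trip concrete, here is a minimal sketch of the idea described above. It is not the repository's code: the helper names box_to_scale and scale_to_box are hypothetical, and the only thing taken from the thread is that the box size is divided by 200 when stored and multiplied by 200 when used.

```python
import numpy as np

PIXEL_STD = 200  # the constant discussed in this thread


def box_to_scale(w, h, pixel_std=PIXEL_STD):
    """Hypothetical helper: store the bbox size in units of pixel_std."""
    return np.array([w / pixel_std, h / pixel_std], dtype=np.float32)


def scale_to_box(scale, pixel_std=PIXEL_STD):
    """Hypothetical helper: recover the bbox size in pixels from scale."""
    return scale * pixel_std


# A 150 x 200 pixel box survives the round trip unchanged.
scale = box_to_scale(150.0, 200.0)
recovered = scale_to_box(scale)

# Any other positive constant gives the same pixel size back,
# as long as the same value is used on both sides.
recovered_100 = scale_to_box(box_to_scale(150.0, 200.0, 100), 100)
```

The stored scale values change with the constant, but the recovered pixel size does not, which is why the exact number should not matter provided both the dividing and the multiplying side agree on it.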

@rafikg As I said above, scale is just another representation of the bbox h and w. I think they multiply scale by 1.25 to expand the bbox, in case the bbox fits the human body too tightly, which would lead to information loss.
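
A small standalone sketch of that expansion, adapted from the _xywh2cs snippet quoted earlier in this thread. The function name xywh_to_center_scale and the default aspect_ratio of 0.75 (a 192 x 256 input, for example) are assumptions for illustration, not values confirmed here.

```python
import numpy as np


def xywh_to_center_scale(x, y, w, h, aspect_ratio=0.75, pixel_std=200,
                         expand=1.25):
    """Standalone version of the quoted _xywh2cs (defaults are assumptions)."""
    center = np.array([x + w * 0.5, y + h * 0.5], dtype=np.float32)

    # Pad the shorter side so the box matches the target aspect ratio.
    if w > aspect_ratio * h:
        h = w / aspect_ratio
    elif w < aspect_ratio * h:
        w = h * aspect_ratio

    # Store the padded size in units of pixel_std.
    scale = np.array([w / pixel_std, h / pixel_std], dtype=np.float32)

    # Enlarge the box so the crop keeps some context around the person.
    if center[0] != -1:
        scale = scale * expand
    return center, scale


# A tight 60 x 160 box with its top-left corner at (100, 50).
center, scale = xywh_to_center_scale(100, 50, 60, 160)
crop_size = scale * 200  # size of the cropped region in pixels
```

With these assumed defaults, the 60 x 160 box is first widened to 120 x 160 to match the aspect ratio and then grown by 1.25 to a crop of about 150 x 200 pixels around the same center, so limbs near the box edge are less likely to be cut off.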