princeton-vl / pytorch_stacked_hourglass

Pytorch implementation of the ECCV 2016 paper "Stacked Hourglass Networks for Human Pose Estimation"
BSD 3-Clause "New" or "Revised" License

Confused about 'MaxPool2d' in 'posenet'-'pre' #4

Closed zenghy96 closed 4 years ago

zenghy96 commented 4 years ago

class PoseNet(nn.Module):
    def __init__(self, nstack, inp_dim, oup_dim, bn=False, increase=0, **kwargs):
        super(PoseNet, self).__init__()
        self.nstack = nstack
        self.pre = nn.Sequential(
            Conv(3, 64, 7, 2, bn=True, relu=True),
            Residual(64, 128),
            Pool(2, 2),
            Residual(128, 128),
            Residual(128, inp_dim)
        )

If MaxPool2d(2, 2) is used here, the predicted heatmaps after every hourglass will be [H/2, W/2]. So should we generate ground-truth heatmaps of size [H/2, W/2] (not [H, W]) from the labels when calculating loss(heatmaps, combined_preds)? If so, is a corresponding operation needed to extract the predicted points from combined_preds? Thanks!

crockwell commented 4 years ago

The ground-truth heatmap is actually of size H/4, W/4, because the first Conv layer also uses a stride of 2, so together with Pool(2, 2) the input is downsampled by a factor of 4. See https://github.com/princeton-vl/pytorch_stacked_hourglass/blob/master/data/MPII/dp.py#L55 for the operations that transform ground-truth keypoints into lower-resolution heatmaps.
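
To make the factor of 4 concrete, here is a rough sketch in plain Python that follows one spatial dimension through the 'pre' block and maps a heatmap peak back to input-image pixels. It is only an illustration, not code from the repo: the helper names (conv_out_size, stem_out_size, heatmap_peak_to_input) are made up, the 256-pixel input is an assumed example, the 7x7 conv is assumed to use padding 3, and the Residual blocks are assumed to keep the spatial size unchanged.

```python
def conv_out_size(size, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_out_size(size):
    """Follow one spatial dimension through the 'pre' block."""
    # Conv(3, 64, 7, 2): 7x7 kernel, stride 2; padding (7 - 1) // 2 = 3 is assumed.
    size = conv_out_size(size, kernel=7, stride=2, padding=3)
    # Residual blocks are assumed to leave the spatial size unchanged.
    # Pool(2, 2): 2x2 max pooling with stride 2.
    size = conv_out_size(size, kernel=2, stride=2)
    return size


def heatmap_peak_to_input(heatmap, scale):
    """Return (x, y) of the heatmap maximum, scaled to input-image pixels."""
    best_y, best_x = max(
        ((y, x) for y in range(len(heatmap)) for x in range(len(heatmap[0]))),
        key=lambda yx: heatmap[yx[0]][yx[1]],
    )
    return best_x * scale, best_y * scale


input_res = 256  # assumed example input resolution
output_res = stem_out_size(input_res)
scale = input_res // output_res
print(output_res, scale)  # prints "64 4"
```

So under these assumptions the ground truth would be drawn at 64x64 for a 256x256 input, and a predicted keypoint can be recovered by taking the heatmap argmax and multiplying its coordinates by 4. This simple scaling ignores any sub-pixel refinement.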