princeton-vl / pytorch_stacked_hourglass

Pytorch implementation of the ECCV 2016 paper "Stacked Hourglass Networks for Human Pose Estimation"
BSD 3-Clause "New" or "Revised" License
465 stars 94 forks

How do you balance positive and negative annotations (foreground Gaussian and background zeros)? #10

Closed ethanyanjiali closed 4 years ago

ethanyanjiali commented 4 years ago

In the predicted heatmap, say 64x64, you have 4096 pixels in total, but only about 7x7 pixels (or 9x9, depending on the implementation) carry the Gaussian foreground, and all the remaining pixels are just zeros. Without any technique to balance them, the network could very quickly learn the trivial solution of producing an all-zero heatmap. I don't see anything addressing this in either the code or the paper; would you mind sharing some thoughts on this? Thanks.
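For illustration, a target heatmap of that kind can be rendered as in the minimal sketch below (not taken from this repo; the helper name, the 64x64 size, and sigma=1 are assumptions):

```python
import numpy as np

def render_gaussian_heatmap(center, size=64, sigma=1.0):
    """Render a single-keypoint target: a small Gaussian bump on a zero grid.

    Hypothetical helper for illustration only. Truncating at 3*sigma leaves a
    roughly 7x7 foreground footprint; all other pixels of the 64x64 = 4096
    grid stay exactly zero.
    """
    cx, cy = center
    xs = np.arange(size)
    ys = np.arange(size)[:, None]
    dist2 = (xs - cx) ** 2 + (ys - cy) ** 2
    heatmap = np.exp(-dist2 / (2 * sigma ** 2))
    heatmap[dist2 > (3 * sigma) ** 2] = 0.0  # truncate the Gaussian tails
    return heatmap

hm = render_gaussian_heatmap(center=(20, 32))
print(hm.shape, int((hm > 0).sum()), hm.size)  # (64, 64) 29 4096
```

Only a few dozen of the 4096 pixels end up non-zero, which is the imbalance being asked about.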

crockwell commented 4 years ago

This is true -- the supervision is somewhat "sparse". I think, empirically, you do see most outputs being very close to 0 most of the time, especially early in training. As training progresses, optimizing the loss requires getting predictions closer and closer to 1 on confident pixels. If it is helpful, you can think of classification on ImageNet as having some similarity: 1000-way classification has only one correct class, yet as training progresses networks learn to assign very high confidence to the correct predictions!
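As a point of reference, an unweighted per-pixel MSE heatmap loss of the kind being discussed might look like the sketch below (an illustrative sketch, not the repo's exact implementation; the tensor shapes and the block-shaped stand-in target are assumptions):

```python
import torch

def heatmap_mse_loss(pred, target):
    """Plain per-pixel MSE between predicted and target heatmaps.

    pred, target: tensors of shape (batch, num_joints, H, W).
    Every pixel contributes equally, so an all-zero prediction already
    achieves a fairly low loss on the mostly-zero targets; further progress
    requires pushing the few foreground pixels toward the Gaussian peak.
    (Illustrative sketch only.)
    """
    return ((pred - target) ** 2).mean()

# Example: random predictions vs. a sparse target
pred = torch.rand(2, 16, 64, 64)
target = torch.zeros(2, 16, 64, 64)
target[..., 30:37, 30:37] = 0.5  # stand-in for small Gaussian blobs
print(heatmap_mse_loss(pred, target).item())
print(heatmap_mse_loss(torch.zeros_like(target), target).item())  # the "trivial" all-zero prediction
```

The gap between the two printed values shows how little the background pixels penalize an all-zero output, which is why most predictions sit near 0 early in training.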

This paper was not the first to use the heatmap supervision approach, by the way. Perhaps earlier work explains it in more detail.

ethanyanjiali commented 4 years ago

Thanks for the reply! It might be easier to converge on multi-class classification because every class gets some activation from at least some images, and cross-entropy punishes misclassification more harshly. The MSE used in the paper doesn't seem to be as sensitive here.

For my part, I ended up assigning more weight to the foreground pixels and scaling up the Gaussian values. Did you see the loss drop consistently without adding any tricks to the loss function?
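For concreteness, the balancing described here (up-weighting foreground pixels and scaling up the Gaussian values) could be sketched as follows; `fg_weight` and `peak_scale` are made-up example values, not numbers from this thread or this repo:

```python
import torch

def weighted_heatmap_mse(pred, target, fg_weight=10.0, peak_scale=2.0):
    """Weighted MSE: up-weight foreground (Gaussian) pixels and scale the peak.

    Hypothetical sketch of the balancing described above; fg_weight and
    peak_scale are arbitrary example values, not tuned numbers.
    """
    target = target * peak_scale                             # scale up the Gaussian values
    weights = torch.where(target > 0,
                          torch.full_like(target, fg_weight),  # foreground pixels
                          torch.ones_like(target))             # background pixels
    return (weights * (pred - target) ** 2).mean()
```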

crockwell commented 4 years ago

No problem. "cross-entropy punishes misclassification more harshly" -- I meant this more as an analogy, and yes, it isn't a perfect comparison. Nevertheless, MSE does still penalize errors in both directions.

This loss function did train well as-is. Of course, it is always possible it could be modified to train faster!

ethanyanjiali commented 4 years ago

Thanks. Maybe I should try running it for longer with vanilla MSE as the loss. I'm closing this issue now.