princeton-vl / pytorch_stacked_hourglass

Pytorch implementation of the ECCV 2016 paper "Stacked Hourglass Networks for Human Pose Estimation"
BSD 3-Clause "New" or "Revised" License

Input dimensions and preprocessing #7

Closed david-wb closed 4 years ago

david-wb commented 4 years ago

I'd like to use this same model on a different set of images. What image sizes does the network accept, and what image preprocessing steps are required or advised? Thanks

david-wb commented 4 years ago

Also, it looks like the use of nn.Upsample in the Hourglass layer causes an error when image dimensions aren't powers of 2. Maybe replace it with nn.functional.interpolate?
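To make the size mismatch concrete, here is a minimal sketch in plain Python (no PyTorch needed) that traces one spatial dimension through an hourglass. It assumes each level halves the feature map with a stride-2 pool and doubles it again with a fixed 2x upsample, and that the module is 4 levels deep. The function names and the `depth=4` default are illustrative assumptions, not taken from the repository.

```python
def trace_hourglass_sizes(size, depth=4):
    """Trace one spatial dimension through `depth` levels of 2x pooling
    followed by 2x upsampling.

    Returns (skip_size, upsampled_size) pairs, outermost level first.
    The two numbers must be equal for the skip branch and the upsampled
    branch to be added together.
    """
    sizes = [size]
    for _ in range(depth):
        # Stride-2 pooling floors odd sizes, e.g. 25 -> 12.
        sizes.append(sizes[-1] // 2)

    pairs = []
    for level in range(depth):
        # A fixed 2x upsample can only produce twice the pooled size.
        pairs.append((sizes[level], 2 * sizes[level + 1]))
    return pairs


def is_compatible(size, depth=4):
    """True if every level's skip and upsampled sizes match."""
    return all(skip == up for skip, up in trace_hourglass_sizes(size, depth))


if __name__ == "__main__":
    for s in (64, 48, 50):
        print(s, is_compatible(s), trace_hourglass_sizes(s))
```

Under these assumptions the condition looks like "divisible by 2**depth at the hourglass input" rather than strictly "a power of 2": 48 passes while 50 fails at the second level (25 vs. 24). Calling `torch.nn.functional.interpolate` with an explicit `size=` taken from the skip branch, instead of a fixed scale factor, would be one way to make the two branches line up for arbitrary sizes.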

crockwell commented 4 years ago

Hi David, I hope the below is helpful:

  1. The given code takes images of variable size and transforms them to (256, 256); see data/MPII/dp.py. It is somewhat tailored to MPII, though, since it takes scale and center parameters to perform the crop. You could change it to not crop on another dataset if desired. Just be aware that the pretrained models were trained with people centered and at a fairly uniform scale, though there is no reason the architecture could not learn other cases.
     1b. In other words, anything bigger than (256, 256) shouldn't be a problem. If you're taking in smaller images you may want to pad, though that could be tough.
  2. Preprocessing currently takes place in data/MPII/dp.py and consists of random scaling and rotation during cropping, plus color jitter. These should help a small amount during training.
  3. Since some smart preprocessing should let you avoid changing network details, I wouldn't really mess with the 256 size too much. If you do, you'll have to change layers like the first Conv layer and then retrain, to be consistent with the rest of the network. If you do want to change it, for example because you have smaller images, I think you would have to use something like "interpolate".
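
As a rough illustration of the "resize or pad to 256" idea in point 1b, the sketch below computes the scale and padding needed to fit an arbitrary image into a 256x256 square without distorting it, and maps a predicted point back to the original image. This is not what data/MPII/dp.py does (that code crops around a person using center and scale); `letterbox_params` and `to_original` are hypothetical helpers, and the actual pixel resizing would be done with whatever image library you already use.

```python
def letterbox_params(height, width, target=256):
    """Compute how to fit a (height, width) image into a target x target
    square while keeping its aspect ratio.

    Returns (scale, (new_h, new_w), (pad_top, pad_bottom, pad_left, pad_right)).
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")

    # Scale so that the longer side becomes exactly `target` pixels.
    scale = target / max(height, width)
    new_h = min(target, max(1, round(height * scale)))
    new_w = min(target, max(1, round(width * scale)))

    # Split the remaining space evenly so the image stays centered.
    pad_top = (target - new_h) // 2
    pad_left = (target - new_w) // 2
    pad_bottom = target - new_h - pad_top
    pad_right = target - new_w - pad_left
    return scale, (new_h, new_w), (pad_top, pad_bottom, pad_left, pad_right)


def to_original(x, y, scale, pads):
    """Map a point from the padded square back to original image coordinates."""
    pad_top, _, pad_left, _ = pads
    return (x - pad_left) / scale, (y - pad_top) / scale


if __name__ == "__main__":
    scale, new_size, pads = letterbox_params(480, 640)
    print(scale, new_size, pads)
    print(to_original(128, 128, scale, pads))
```

Keep in mind the caveat from point 1: the pretrained weights saw people centered and at a fairly uniform scale, so a whole-image letterbox like this may still need fine-tuning or a person crop in front of it.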
david-wb commented 4 years ago

That helps. Thanks for the thorough reply!