Closed: matteorr closed this issue 3 years ago
While I can't speak exactly to the details of your problem, here's the way I would approach it:
If you look in models/layers.py, the Hourglass class has an argument "n" which determines how many recursive calls are used in the hourglass, where each call operates at a lower input resolution. You may want to decrease this if you are using a resolution smaller than 256x256.
Pooling may behave a little oddly if your resolution is not a multiple of 16, which could also cause problems; certain output dimensions might be off depending on the input dimension. You'd just have to play with it if this were the problem. Slightly changing the size of the image is probably the best solution, as you've guessed :)
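To make the resolution constraint concrete, here is a small sketch (my own helper, not part of the repo) that traces the spatial size at each hourglass level. It assumes the usual Newell-style stem (stride-2 conv plus max pool) reduces the input by 4x before the hourglass; that factor is an assumption about this implementation.

```python
def hourglass_sizes(input_size, n, stem_reduce=4):
    """Trace spatial sizes through an hourglass of depth n.

    stem_reduce is the assumed 4x reduction before the hourglass
    (stride-2 conv + max pool, as in the standard implementation).
    """
    size = input_size // stem_reduce
    sizes = [size]
    for _ in range(n):
        size = size // 2  # each recursive level max-pools by 2
        sizes.append(size)
    return sizes

print(hourglass_sizes(256, 4))  # [64, 32, 16, 8, 4] -- clean halving
print(hourglass_sizes(96, 4))   # [24, 12, 6, 3, 1] -- the odd 3 breaks 2x upsampling
```

With a 256 input every level halves cleanly, while a 96 input hits an odd size (3) partway down, which is exactly where the mismatch appears.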
Thanks for the quick reply! Brief follow-ups:
Thanks a lot again! Feel free to close after your reply.
Yes, that parameter. By decreasing the number of recursive calls, you decrease the "depth" of the hourglass. In other words, looking at Fig. 3 in the paper, the middle (lowest resolution) of the hourglass would no longer be used. You can think of each recursive call as another layer in this diagram. Does that make sense?
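The recursion can be illustrated with a toy sketch (my own, tracking only spatial size rather than real tensors): each level pools by 2, hands off to the inner hourglass, then upsamples by 2 on the way back up. Class and method names here are illustrative, not the repo's actual code.

```python
class ToyHourglass:
    """Size-only model of an hourglass: each recursive call adds one
    pool/upsample level, mirroring the structure in models/layers.py."""

    def __init__(self, n):
        self.n = n
        self.inner = ToyHourglass(n - 1) if n > 1 else None

    def forward(self, size):
        low = size // 2                 # "max pool": halve (floor division)
        if self.inner is not None:
            low = self.inner.forward(low)
        return low * 2                  # "upsample": double back up

print(ToyHourglass(4).forward(64))  # 64 -- round-trips cleanly
print(ToyHourglass(4).forward(24))  # 16 -- odd sizes lose pixels on the way down
```

When the innermost level hits an odd size, the floor division loses a pixel that the 2x upsample cannot restore, so the output no longer matches the skip branch.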
I don't think there are ablations of this in the paper; using less than the full hourglass can probably be assumed to be less effective at high resolution. In your case, though, it may be the best alternative.
Makes sense! I'll post updates here if I find out something interesting. Thanks again!
When training a network from scratch with images of input size 96, I get the following error trace:
Here is a small snippet to reproduce the error:
If my understanding is correct, this is due to the fact that with input size 96 and depth 4 the max-pooling operations result in up1 having size 3 and up2 having size 2 (since it is an upsampling of low3, which has size 1). What is your suggestion on how to solve this? I understand you might say to just use a larger image size, but could you please provide some indications on how to fix the problem from an architecture point of view?
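The arithmetic behind the mismatch can be checked directly. This is a hypothetical reconstruction of the failing step (the names up1, low3, and up2 follow models/layers.py; the values come from the 96-pixel case described above):

```python
# At the innermost hourglass level with a 96-pixel input:
up1 = 3            # skip-branch feature map has spatial size 3
low3 = up1 // 2    # max pool floors 3 down to 1
up2 = low3 * 2     # 2x upsampling recovers only 2, not 3

# The residual combination of up1 and up2 needs matching sizes,
# hence the shape error when up1 != up2.
print(up1, up2)  # 3 2
```

Because 3 is odd, pooling then upsampling cannot round-trip it, which is why reducing the depth (so the hourglass never reaches an odd size) or padding the input to a friendlier resolution both avoid the error.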
Thanks!