Open: nicjac opened this issue 7 years ago
@nicjac I think initialisation is really important, and the learning rate should be set smaller. Although the original paper was actually trained on the fine-grained dataset with SGD, I think there may be some tricks involved... P.S. If you search for the spatial transformer on Google Scholar, it has been used more in VQA than in other CV tasks... Or try the Torch package? https://github.com/qassemoquab/stnbhwd
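On the initialisation point: the trick from the STN paper is to initialise the final regression layer of the localisation network so that it outputs the identity transform (zero weights, identity bias), which makes the sampler a no-op at the start of training. A minimal NumPy sketch of that idea (the function name and fan-in are hypothetical, not part of any framework):

```python
import numpy as np

def identity_init_loc_head(fan_in):
    """Initialise the final affine-regression layer of a localisation
    network so the STN starts out as an identity transform."""
    W = np.zeros((6, fan_in))          # zero weights: features are ignored at step 0
    b = np.array([1., 0., 0.,          # bias encodes the identity affine matrix
                  0., 1., 0.])         # theta = [[1, 0, 0], [0, 1, 0]]
    return W, b

W, b = identity_init_loc_head(fan_in=128)
features = np.random.randn(128)        # any input features
theta = (W @ features + b).reshape(2, 3)
# theta is exactly the identity transform, regardless of the input
```

With this initialisation, early gradients nudge the transform gently away from identity instead of starting from a random (possibly degenerate) warp.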
@layumi thank you for the response. I am just confused by the MatConvNet implementation. I still haven't managed to make it work with anything but the included example!
The magnitude of the gradients of the BilinearSampler can be huge. Scaling down the gradients back-propagated by the BilinearSampler (e.g. by 0.1) usually helps stabilize training.
This might help: the authors use a ResNet to build the STN, with a 224×224 input size. It works well for custom datasets.
Dear all,
I wasn't sure if this should be in issues here on GitHub or on the discussion forum.
I am trying to add a Spatial Transformer to VGG-19. I modified the example code to accommodate larger image sizes (256×256 in my case). However, after only a couple of iterations the output of BilinearSampler consists entirely of zeros.
Has anyone successfully extended the included example to larger images? Any guess what the cause of my issue might be?
Thanks!
Nicolas
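One common way to get an all-zero BilinearSampler output is for the localisation network's predicted transform to blow up, so that every sampling coordinate falls outside the image; with a zero-padding convention the sampler then returns zeros everywhere. A toy NumPy bilinear sampler (written from scratch here, not the MatConvNet one, and assuming zero padding) shows the effect:

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Toy bilinear sampler with zero padding outside the image."""
    H, W = img.shape
    out = np.zeros(xs.shape, dtype=float)
    flat = out.ravel()
    for i, (x, y) in enumerate(zip(xs.ravel(), ys.ravel())):
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]:
            xi, yi = x0 + dx, y0 + dy
            if 0 <= xi < W and 0 <= yi < H:   # zero padding: out-of-range taps contribute 0
                w = (1 - abs(x - xi)) * (1 - abs(y - yi))
                flat[i] += w * img[yi, xi]
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
inside  = bilinear_sample(img, np.array([1.5]), np.array([1.5]))  # blends 4 pixels
outside = bilinear_sample(img, np.array([40.]), np.array([40.]))  # grid has exploded
# outside == [0.]: every tap lands off the image, so the output collapses to zeros
```

If this is what is happening, logging the predicted affine parameters for a few iterations should show them diverging; identity initialisation of the localisation head and a smaller learning rate on that branch are the usual fixes.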