vlfeat / matconvnet

MatConvNet: CNNs for MATLAB

Spatial Transformer for larger images #880

Open nicjac opened 7 years ago

nicjac commented 7 years ago

Dear all,

I wasn't sure if this should be in issues here on GitHub or on the discussion forum.

I am trying to add a Spatial Transformer to VGG-19. I modified the example code so that it accommodates larger image sizes (256x256 in my case). However, after only a couple of iterations the output of BilinearSampler consists entirely of zeros.

Has anyone successfully extended the included example to larger images? Any guess as to what the cause of my issue might be?

Thanks!

Nicolas

layumi commented 7 years ago

@nicjac I think initialisation is really important, and the learning rate should be set smaller. Although the original paper trained on a fine-grained dataset with SGD, I suspect there are some tricks involved... P.S. If you search for the spatial transformer on Google Scholar, it has been used more in VQA than in other CV tasks... Or try the Torch package? https://github.com/qassemoquab/stnbhwd
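On the initialisation point: the standard trick from the original Spatial Transformer Networks paper is to initialise the last layer of the localization network so that it outputs the identity transform, i.e. zero weights and an identity-affine bias. A small NumPy sketch of that idea (the layer shape and names are hypothetical, not tied to any MatConvNet code):

```python
import numpy as np

feat_dim = 128                      # assumed feature size (illustrative)

# Final fully-connected layer regressing the 6 affine parameters:
# zero weights so the output ignores the features at the start,
# bias set to the flattened 2x3 identity transform.
W = np.zeros((6, feat_dim))
b = np.array([1., 0., 0.,
              0., 1., 0.])

def localization_head(features):
    """Predict theta; at initialisation this is exactly the identity."""
    return (W @ features + b).reshape(2, 3)

theta0 = localization_head(np.random.randn(feat_dim))
# theta0 == [[1, 0, 0], [0, 1, 0]] regardless of the input, so the sampler
# initially passes the image through unchanged and training starts stably.
```

Starting from the identity means the rest of the network initially sees the untransformed image, which avoids the sampler immediately mapping everything out of range.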

nicjac commented 7 years ago

@layumi thank you for the response. I am just confused by the matconvnet implementation. I still haven't managed to make it work with anything but the included example!

ankush-me commented 7 years ago

The magnitude of the gradients of the BilinearSampler can be huge. Scaling down the gradients back-propagated by the BilinearSampler (e.g. by 0.1) usually helps stabilize training.
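One way to realize this suggestion, sketched framework-agnostically in NumPy (the function names and the 0.1 factor are illustrative; in MatConvNet this would mean scaling the derivatives the sampler passes back toward the localization network):

```python
import numpy as np

GRAD_SCALE = 0.1  # damping factor suggested above; tune as needed

def grad_scale_forward(x):
    # Forward pass is the identity: activations are unchanged.
    return x

def grad_scale_backward(grad_out):
    # Backward pass shrinks the gradient flowing toward the localization
    # network, keeping the affine parameters from changing too violently.
    return GRAD_SCALE * grad_out

g = np.ones((2, 2))
damped = grad_scale_backward(g)   # every entry becomes 0.1
```

Because the forward pass is the identity, this changes nothing about inference; it only damps the updates to the transform parameters during training.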

rizwanasif commented 5 years ago

This might help: the authors use a ResNet to build an STN with a 224×224 input size. It works well for custom datasets.

https://github.com/layumi/Pedestrian_Alignment