kevinzakka / recurrent-visual-attention

A PyTorch Implementation of "Recurrent Models of Visual Attention"
MIT License

Issue with RGB data #35

Closed. GKalliatakis closed this issue 3 years ago

GKalliatakis commented 3 years ago

I've trained the RAM implementation on various 3-channel image datasets and plotted the glimpses extracted by the network on a random batch at various epochs. The bounding box does not seem to move around the input image to explore different locations (see videos below). Any idea why the glimpses seem to be stuck in the top-left corner of the images when using RGB data, but move around with grayscale MNIST? Have you encountered this behaviour when training on other data?

[Videos of extracted glimpses: SVHN (epoch 12, epoch 24), CIFAR10, MNIST]

@clvcooke @kevinzakka @malashinroman

GKalliatakis commented 3 years ago

Actually, I found what's wrong:

https://github.com/kevinzakka/recurrent-visual-attention/blob/a38ac8958ebf1c61a10c4d5320f1e31d3d0b73dd/plot_glimpses.py#L41

`img_shape` on this line was not obtained correctly when different datasets were tested.
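
For anyone hitting the same symptom, here is a minimal sketch of one way this kind of bug can arise. It is an assumption, not a confirmed reading of the linked line: it supposes the image size is read from a fixed axis of the saved batch, which gives the height for a grayscale batch shaped `(B, H, W)` but the channel count for an RGB batch shaped `(B, C, H, W)`. The helper names `spatial_size` and `denormalize` below are hypothetical and written for illustration only.

```python
def spatial_size(shape):
    """Return the side length of a square image batch.

    Assumes (hypothetically) a channels-first layout: (B, H, W) for
    grayscale or (B, C, H, W) for RGB. The last two axes are spatial
    in both cases, so indexing from the end works for either layout.
    """
    if len(shape) not in (3, 4):
        raise ValueError(f"expected a 3D or 4D batch shape, got {shape}")
    height, width = shape[-2], shape[-1]
    if height != width:
        raise ValueError(f"expected square images, got {height}x{width}")
    return width


def denormalize(size, coords):
    """Map coordinates from [-1, 1] to pixel positions in [0, size]."""
    return tuple(int(0.5 * (c + 1.0) * size) for c in coords)


gray_shape = (9, 28, 28)     # e.g. an MNIST batch with the channel axis squeezed
rgb_shape = (9, 3, 32, 32)   # e.g. a CIFAR10 or SVHN batch
center = (0.0, 0.0)          # normalized location at the image center

# Reading axis 1 happens to work for grayscale (28 is the height) ...
gray_ok = denormalize(gray_shape[1], center)

# ... but for RGB it yields the channel count (3), so every location
# collapses into a 3x3 pixel region in the top-left corner.
rgb_wrong = denormalize(rgb_shape[1], center)

# Taking the size from the trailing spatial axes handles both layouts.
rgb_right = denormalize(spatial_size(rgb_shape), center)
```

Under these assumptions the glimpse boxes would still be learned correctly; only the plotted positions would be scaled to the wrong image size, which would match boxes that appear stuck in the top-left corner.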