I've trained the RAM implementation on several 3-channel image datasets and plotted the glimpses the network extracts from a random batch at various epochs. The bounding box does not seem to move around the input image to explore different locations (see videos below). Any idea why the glimpses seem to be stuck in the top-left corner of the images when using RGB images but move around with grayscale MNIST? Have you encountered such behaviour when training on other data?
SVHN: epoch 12 epoch 24
CIFAR10
MNIST
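For reference, here is a minimal sketch of how a glimpse location can be turned into the bounding box drawn in the plots. It assumes locations are normalized to [-1, 1] with (0, 0) at the image centre and (-1, -1) at the top-left corner. The function name `glimpse_box` and this convention are assumptions for illustration, not taken from the repository's code.

```python
import numpy as np

def glimpse_box(loc, image_size, patch_size):
    """Map a normalized (x, y) location to a pixel bounding box.

    Assumed convention (hypothetical, check against the actual code):
    (-1, -1) is the top-left corner, (1, 1) the bottom-right corner.
    Returns (x0, y0, x1, y1) for a square patch centred on the location.
    """
    loc = np.asarray(loc, dtype=float)
    # Rescale from [-1, 1] to pixel coordinates in [0, image_size].
    center = 0.5 * (loc + 1.0) * image_size
    # The top-left corner of the patch is half a patch away from the centre.
    x0, y0 = np.round(center - patch_size / 2.0).astype(int)
    return int(x0), int(y0), int(x0 + patch_size), int(y0 + patch_size)

# A location at the image centre versus one saturated at the top-left corner.
center_box = glimpse_box((0.0, 0.0), image_size=32, patch_size=8)
corner_box = glimpse_box((-1.0, -1.0), image_size=32, patch_size=8)
```

Under this assumed convention, boxes that all sit around `(-4, -4, 4, 4)` on 32x32 inputs would suggest the emitted locations themselves are close to (-1, -1), rather than the plotting being offset.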
@clvcooke @kevinzakka @malashinroman