Wow, this is super helpful @malashinroman! I haven't touched this repo in years but the changes you've made all seem like super sensible design choices so thanks for that :)
Do you think you could update the README with these results?
@kevinzakka, sorry for not replying for a while.
> I haven't touched this repo in years
What I've found is that your repository is the most popular PyTorch implementation of RAM, and to my knowledge not much has been done on hard attention mechanisms since the Mnih et al. paper. I believe people will still find it useful (as I've found it useful myself).
I see that you've already pushed changes to the README. Let me know if you need any info from me.
The latest version from the repository gave an error rate of about 2.5% with Monte Carlo test sampling. I think the Monte Carlo trick shouldn't be used, because it can compensate for a poor attention mechanism, and the attention mechanism is the main point of interest. Without MC sampling the code gave me an error rate of about 14%.
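To make it clear what I mean by Monte Carlo test sampling, here is a minimal sketch: each test image is duplicated M times, the stochastic glimpse policy produces M independent trajectories, and the class log-probabilities are averaged over them. The `predict` helper and the `model(x)` interface below are hypothetical, not necessarily the repo's exact API; with M=1 it reduces to a single pass, which is the setting I report.

```python
import torch

def predict(model, x, M=1):
    """Average class log-probabilities over M stochastic glimpse
    trajectories (Monte Carlo test-time sampling).

    Assumes `model(x)` runs one full glimpse sequence and returns
    class log-probabilities of shape (batch, num_classes); this is an
    illustrative interface, not the repository's actual signature.
    """
    batch_size = x.size(0)
    # Replicate each image M times so every copy gets its own
    # random sequence of glimpse locations.
    x = x.repeat(M, 1, 1, 1)                  # (M*B, C, H, W)
    log_probs = model(x)                      # (M*B, num_classes)
    log_probs = log_probs.view(M, batch_size, -1)
    # Averaging over the M trajectories is the "MC trick";
    # with M=1 this is just a single stochastic forward pass.
    return log_probs.mean(dim=0)              # (B, num_classes)
```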
I made comparatively small changes to the code, which brought the error rate down to 0.8-1.2% (depending on random_seed) without Monte Carlo sampling (M=1), using six 8x8 glimpses. This is very similar to (and even slightly better than) the accuracy reported in the Mnih et al. paper. When the whole training set is used, the error rate on the test set can drop below 0.7%.
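For reference, this is roughly the configuration behind those numbers. The names below are illustrative and may not match the repository's actual flags:

```python
# Assumed hyperparameters for the runs described above; names are
# illustrative and may be spelled differently in the repo's config.
config = dict(
    num_glimpses=6,   # six glimpses per image
    patch_size=8,     # 8x8 glimpse patch
    M=1,              # no Monte Carlo sampling at test time
    epochs=200,       # best validation result reached around 200 epochs
    random_seed=1,    # error rate varies between ~0.8% and ~1.2% across seeds
)
```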
Here is a description of the changes I made:
With these changes, the best validation result is achieved after around 200 epochs of training.