kevinzakka / recurrent-visual-attention

A PyTorch Implementation of "Recurrent Models of Visual Attention"
MIT License

improve validation accuracy without MC-sampling from 86% to 98.8-99.2% in 200 epochs #32

Closed malashinroman closed 4 years ago

malashinroman commented 4 years ago

The latest version from the repository gave about a 2.5% error rate with Monte Carlo test sampling. I think the Monte Carlo trick shouldn't be used, because it can compensate for a poor attention mechanism, and the attention mechanism is the main point of interest. Without MC sampling, the code gave me a ~14% error rate.
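For illustration, here is a minimal, self-contained sketch of the difference being discussed: evaluating with a single stochastic pass (M=1) versus Monte Carlo averaging over M passes at test time. This is not the repository's code; the `ToyStochasticClassifier` and `evaluate` names are made up for this example, and the toy model only mimics the randomness that sampled glimpse locations introduce in RAM.

```python
# Minimal sketch (not the repository's code) contrasting single-pass evaluation
# (M = 1) with Monte Carlo test-time averaging over M stochastic passes.
import torch
import torch.nn.functional as F


class ToyStochasticClassifier(torch.nn.Module):
    """Stand-in for a RAM-style model whose output depends on sampled locations."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.fc = torch.nn.Linear(28 * 28, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Inject noise to mimic the randomness of sampled glimpse trajectories:
        # repeated forward passes over the same image can disagree.
        noisy = x.flatten(1) + 0.1 * torch.randn_like(x.flatten(1))
        return self.fc(noisy)  # unnormalized log-probabilities


@torch.no_grad()
def evaluate(model, loader, M: int = 1) -> float:
    """Return accuracy; with M > 1, average class probabilities over M passes."""
    correct, total = 0, 0
    for images, labels in loader:
        probs = torch.zeros(images.size(0), model.fc.out_features)
        for _ in range(M):
            probs += F.softmax(model(images), dim=1)
        preds = (probs / M).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total


if __name__ == "__main__":
    model = ToyStochasticClassifier()
    fake_loader = [(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,)))]
    print("M=1 accuracy :", evaluate(model, fake_loader, M=1))
    print("M=10 accuracy:", evaluate(model, fake_loader, M=10))
```

With M > 1, the averaged probabilities can mask locations that the attention policy chose poorly, which is why evaluating with M=1 gives a clearer picture of the attention mechanism itself.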

I made comparatively small changes to the code that allowed reaching a 1.2-0.8% error rate (depending on the random seed) without Monte Carlo sampling (M=1), using six 8x8 glimpses. This is very similar to (and even slightly better than) the accuracy reported in Mnih's paper. When the whole training set is used, the error rate on the test set can drop below 0.7%.
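For reference, a rough summary of the evaluation setup described above (six glimpses, 8x8 patches, no Monte Carlo averaging); the parameter names below are illustrative and may not match the repository's actual config options.

```python
# Hypothetical summary of the reported setup; the repository's actual
# config option names may differ.
reported_setup = {
    "num_glimpses": 6,  # six glimpses per image
    "patch_size": 8,    # 8x8 glimpse patches
    "M": 1,             # one trajectory at test time, i.e. no Monte Carlo averaging
}
```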

Here is a description of the changes I made:

With these changes, the best validation result is achieved after around 200 epochs of training.

kevinzakka commented 4 years ago

Wow, this is super helpful @malashinroman! I haven't touched this repo in years but the changes you've made all seem like super sensible design choices so thanks for that :)

kevinzakka commented 4 years ago

Do you think you could update the README with these results?

malashinroman commented 4 years ago

@kevinzakka, sorry for not replying for a while.

> I haven't touched this repo in years

What I've found is that your repository is the most popular PyTorch implementation of RAM, and to my knowledge not much has been done on hard attention mechanisms since the Mnih et al. paper. I believe people will still find it useful (as I have myself).

I see that you've already pushed changes to the README. Let me know if you need any info from me.