hengyuan-hu / bottom-up-attention-vqa

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
GNU General Public License v3.0

Which implementation is better? Attention or NewAttention? #16

Closed ZhuFengdaaa closed 6 years ago

ZhuFengdaaa commented 6 years ago

The paper uses concatenation to merge the features from v and q (see Eq. 1), which is what the Attention class in your repository implements.

On the other hand, your NewAttention implementation uses element-wise multiplication to merge v and q.

I don't know the reference paper for the NewAttention method, or whether it outperforms Attention. If you know, please tell me. Thank you very much.
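
For reference, here is a minimal sketch contrasting the two fusion styles being discussed: concatenation followed by a nonlinear layer, versus projecting v and q into a shared space and multiplying element-wise. The class names, dimension arguments, and the plain Linear+ReLU projections below are illustrative assumptions, not the repository's exact code.

```python
import torch
import torch.nn as nn


class ConcatAttention(nn.Module):
    """Concat-style fusion over region features (cf. Eq. 1 / class Attention)."""
    def __init__(self, v_dim, q_dim, num_hid):
        super().__init__()
        self.nonlinear = nn.Sequential(nn.Linear(v_dim + q_dim, num_hid), nn.ReLU())
        self.linear = nn.Linear(num_hid, 1)

    def forward(self, v, q):
        # v: [batch, k, v_dim] image region features, q: [batch, q_dim] question feature
        k = v.size(1)
        q = q.unsqueeze(1).expand(-1, k, -1)              # broadcast question over regions
        joint = self.nonlinear(torch.cat([v, q], dim=2))  # fuse by concatenation
        return torch.softmax(self.linear(joint), dim=1)   # attention weights over k regions


class ProductAttention(nn.Module):
    """Element-wise product fusion (cf. class NewAttention)."""
    def __init__(self, v_dim, q_dim, num_hid):
        super().__init__()
        self.v_proj = nn.Sequential(nn.Linear(v_dim, num_hid), nn.ReLU())
        self.q_proj = nn.Sequential(nn.Linear(q_dim, num_hid), nn.ReLU())
        self.linear = nn.Linear(num_hid, 1)

    def forward(self, v, q):
        k = v.size(1)
        v_proj = self.v_proj(v)                                # [batch, k, num_hid]
        q_proj = self.q_proj(q).unsqueeze(1).expand(-1, k, -1) # [batch, k, num_hid]
        joint = v_proj * q_proj                                # fuse by element-wise product
        return torch.softmax(self.linear(joint), dim=1)        # attention weights over k regions
```

Both variants return attention weights of shape [batch, k, 1] over the k region features; the only difference is how v and q are fused before scoring.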

hengyuan-hu commented 6 years ago

Element-wise multiplication is often better, but not always. For this project, element-wise multiplication is better, as mentioned in the paragraph above the Usage section in the README. Element-wise multiplication is a pretty powerful way to fuse multimodal information. For example, in the original paper they use multiplication to fuse the representations before feeding them into the classifier.
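
For context, a minimal sketch of that kind of multiplicative joint embedding in front of the classifier; the class name, dimensions, and layer sizes here are illustrative assumptions, not the repository's exact model.

```python
import torch
import torch.nn as nn


class JointEmbeddingClassifier(nn.Module):
    """Fuse the attended image feature and the question feature by element-wise
    product, then classify over answer candidates (a sketch, not the exact repo model)."""
    def __init__(self, v_dim, q_dim, num_hid, num_ans):
        super().__init__()
        self.v_net = nn.Sequential(nn.Linear(v_dim, num_hid), nn.ReLU())
        self.q_net = nn.Sequential(nn.Linear(q_dim, num_hid), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(num_hid, 2 * num_hid), nn.ReLU(), nn.Linear(2 * num_hid, num_ans))

    def forward(self, v_att, q):
        # v_att: [batch, v_dim] attention-weighted image feature, q: [batch, q_dim]
        joint = self.v_net(v_att) * self.q_net(q)  # multiplicative fusion of the two modalities
        return self.classifier(joint)              # logits over answer candidates
```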