Closed by ZhuFengdaaa 6 years ago
Elementwise multiplication is often better, but not always. For this project, elementwise multiplication is better, as mentioned in the paragraph above the Usage section in the README. Elementwise multiplication is a pretty powerful way to fuse multimodal information. For example, in the original paper they use multiplication for fusing the two representations before feeding the result into the classifier.
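For concreteness, here is a minimal sketch of that classifier-stage fusion in PyTorch. The module and parameter names (`MultiplicativeFusion`, `v_proj`, `q_proj`, `num_hid`, `num_ans`) are illustrative, not taken from the repository:

```python
import torch
import torch.nn as nn

class MultiplicativeFusion(nn.Module):
    """Hypothetical sketch: fuse image and question features by
    element-wise product before a classifier."""
    def __init__(self, v_dim, q_dim, num_hid, num_ans):
        super().__init__()
        self.v_proj = nn.Linear(v_dim, num_hid)  # project image features
        self.q_proj = nn.Linear(q_dim, num_hid)  # project question features
        self.classifier = nn.Linear(num_hid, num_ans)

    def forward(self, v, q):
        # Element-wise product fuses the two modalities in a shared space.
        joint = self.v_proj(v) * self.q_proj(q)
        return self.classifier(joint)
```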
This paper uses concatenation to merge the features from v and q (see Eq. 1), which is what class Attention implements in your repository.
On the other hand, your NewAttention implementation uses element-wise multiplication to merge v and q; see the sketch below for the contrast.
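To make the difference concrete, here is a hedged sketch of the two fusion strategies; the actual Attention and NewAttention classes in the repository may differ in detail (class names and dimensions here are illustrative):

```python
import torch
import torch.nn as nn

class ConcatAttention(nn.Module):
    """Concatenate v and q before scoring, as in Eq. 1 of the paper."""
    def __init__(self, v_dim, q_dim, num_hid):
        super().__init__()
        self.nonlinear = nn.Sequential(
            nn.Linear(v_dim + q_dim, num_hid), nn.ReLU())
        self.linear = nn.Linear(num_hid, 1)

    def forward(self, v, q):
        # v: [batch, k, v_dim] region features; q: [batch, q_dim] question
        q = q.unsqueeze(1).expand(-1, v.size(1), -1)  # broadcast q over regions
        logits = self.linear(self.nonlinear(torch.cat([v, q], dim=2)))
        return nn.functional.softmax(logits, dim=1)   # attention over regions

class ProductAttention(nn.Module):
    """Project v and q into a shared space, then fuse by element-wise product."""
    def __init__(self, v_dim, q_dim, num_hid):
        super().__init__()
        self.v_proj = nn.Linear(v_dim, num_hid)
        self.q_proj = nn.Linear(q_dim, num_hid)
        self.linear = nn.Linear(num_hid, 1)

    def forward(self, v, q):
        # Element-wise product replaces the concatenation above.
        joint = self.v_proj(v) * self.q_proj(q).unsqueeze(1)  # [batch, k, num_hid]
        logits = self.linear(joint)
        return nn.functional.softmax(logits, dim=1)
```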
I don't know the reference paper for the NewAttention method, or whether it outperforms Attention. If you know, please tell me. Thank you very much.