travel-go / Abstractive-Text-Summarization

Contrastive Attention Mechanism for Abstractive Text Summarization

❓ Usefulness of the opponent function #3

Open astariul opened 4 years ago

astariul commented 4 years ago

As I understand your paper, the opponent attention is trained through softmin.
Softmin is what causes the conventional attention and the opponent attention to be trained in opposite directions.
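To make sure I am reading this correctly, here is a minimal sketch of what I mean by softmin, assuming the usual definition `softmin(x) = softmax(-x)`. The function names and the toy scores are mine, not taken from the paper or this repository.

```python
import math

def softmax(scores):
    """Standard softmax; subtracting the max keeps exp() numerically stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmin(scores):
    """Softmin, assumed here to be softmax applied to the negated scores."""
    return softmax([-s for s in scores])

# Toy attention scores over three source positions (made-up values).
scores = [2.0, 0.5, -1.0]

conventional = softmax(scores)  # largest weight on the highest score
opponent = softmin(scores)      # largest weight on the lowest score
```

With the same scores, the two distributions rank the positions in opposite order, which is the "opposite fashion" I am referring to.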


However, what is the point of the opponent function?

image

If the opponent function were removed, the result would be equivalent to another attention head, trained negatively (because of softmin).
So why add such a function, which as I understand it simply masks the existing conventional attention scores (and therefore loses information)?
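To make the comparison concrete, here is a rough sketch of the two options as I picture them. This is my own reading, not the paper's code: `mask_top` is a hypothetical stand-in for the opponent function (I assume it masks the top-scoring position), the scores are made up, and I leave out where exactly softmin enters on the masked side.

```python
import math

def softmax(scores):
    """Softmax that tolerates -inf entries (they get weight exactly 0)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmin(scores):
    """Softmin, assumed to be softmax over the negated scores."""
    return softmax([-s for s in scores])

def mask_top(scores):
    """Hypothetical opponent function: mask the highest-scoring position."""
    masked = list(scores)
    masked[scores.index(max(scores))] = float("-inf")
    return masked

# Option A: reuse the conventional scores, but mask the top position.
conventional_scores = [2.0, 0.5, -1.0]
masked_weights = softmax(mask_top(conventional_scores))

# Option B: a separate head with its own (hypothetical) scores and softmin.
separate_head_scores = [0.3, 1.2, -0.4]
separate_weights = softmin(separate_head_scores)
```

In option A the masked position gets a weight of exactly zero, so whatever the conventional head learned about it is discarded. In option B every position keeps a nonzero weight and the head is free to learn its own scores, which is why I wonder what the masking adds.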