Closed Helicqin closed 5 years ago
Thanks for your interest in our work. Your guess is correct: it is regular attention, and the equation for learning the weights is as follows:
We have made a few changes (including this one) in the paper, which will be published shortly.
Thanks for your reply!
Sorry to bother you with this. I have read your great paper, but I have some confusion about the topic attention.
In the paper, you said:
I can hardly figure it out. Is it the same as normal query-key-value attention? In my opinion, the final context-level encoder hidden state serves as the query, and the word embeddings of the topic words serve as the values. But how are the weights \beta calculated?
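To make my interpretation concrete, here is a minimal sketch of standard dot-product attention under that reading (the context-level hidden state as query, topic word embeddings as keys and values). This is just my guess at the mechanism, and the function and variable names here are illustrative, not the paper's actual notation:

```python
import numpy as np

def topic_attention(query, topic_embs):
    """Sketch of query-key-value attention over topic words.

    query:      (d,)          final context-level encoder hidden state
    topic_embs: (n_topics, d) topic word embeddings (keys == values here)
    """
    scores = topic_embs @ query                    # (n_topics,) dot-product scores
    scores = scores - scores.max()                 # shift for numerical stability
    beta = np.exp(scores) / np.exp(scores).sum()   # softmax -> attention weights
    context = beta @ topic_embs                    # weighted sum of topic embeddings
    return beta, context

# Toy example: the query aligns with the first topic embedding,
# so that topic should receive the larger weight.
q = np.array([1.0, 0.0])
T = np.array([[1.0, 0.0],
              [0.0, 1.0]])
beta, ctx = topic_attention(q, T)
```

Is this roughly what the topic attention does, or is \beta learned differently?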
Look forward to your reply! Thanks.