Closed Helicqin closed 5 years ago
Thanks for your interest in our work. Your guess is correct: it is regular attention, and the equation for learning the weights is as follows:
We have made a few changes (including this one) in the paper, which will be published shortly.
Thanks for your reply!
Sorry to bother you with this. I have read your great paper, but I have some confusion about the topic attention.
In the paper, you said:
I can hardly figure it out. Is it the same as normal query-key-value attention? In my opinion, the final context-level encoder hidden state serves as the query, and the word embeddings of the topic words serve as the values. But how are the weights \beta calculated?
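To make my interpretation concrete, here is a minimal sketch of standard dot-product attention under that reading (the context-level hidden state as query, topic word embeddings as keys and values). This is just my guess at the mechanism, and the function and variable names here are illustrative, not the paper's actual notation:

```python
import numpy as np

def topic_attention(query, topic_embs):
    """Sketch of query-key-value attention over topic words.

    query:      (d,)          final context-level encoder hidden state
    topic_embs: (n_topics, d) topic word embeddings (keys == values here)
    """
    scores = topic_embs @ query                    # (n_topics,) dot-product scores
    scores = scores - scores.max()                 # shift for numerical stability
    beta = np.exp(scores) / np.exp(scores).sum()   # softmax -> attention weights
    context = beta @ topic_embs                    # weighted sum of topic embeddings
    return beta, context

# Toy example: the query aligns with the first topic embedding,
# so that topic should receive the larger weight.
q = np.array([1.0, 0.0])
T = np.array([[1.0, 0.0],
              [0.0, 1.0]])
beta, ctx = topic_attention(q, T)
```

Is this roughly what the topic attention does, or is \beta learned differently?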
Look forward to your reply! Thanks.