nouhadziri / THRED

The implementation of the paper "Augmenting Neural Response Generation with Context-Aware Topical Attention"
https://arxiv.org/abs/1811.01063
MIT License

Missing message-level attention? #10

Closed aleSuglia closed 5 years ago

aleSuglia commented 5 years ago

Hello,

Thanks for releasing this codebase. I was reading your paper about the THRED model (https://arxiv.org/pdf/1811.01063.pdf), and I noticed that the generation process uses two different attention mechanisms: a message-level attention that builds a representation of each utterance, and a context-level attention that produces the context vector, as in the classical HRED model. It looks to me as if the message-level attention is missing from the actual implementation: https://github.com/nouhadziri/THRED/blob/master/models/thred/thred_model.py#L212
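
To make sure we're talking about the same thing, here is a minimal, framework-agnostic sketch of the two-level attention as I read it in the paper. All names are mine and purely illustrative; this is not code from this repo, just simple dot-product attention applied at both levels:

```python
# Hypothetical sketch of the two-level attention (illustrative only).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dot_attention(query, keys):
    """Simple dot-product attention: weighted sum of `keys` given `query`."""
    scores = keys @ query                       # (num_keys,)
    weights = softmax(scores)                   # (num_keys,)
    return weights @ keys                       # (dim,)

def two_level_attention(token_states_per_utterance, decoder_state):
    """Message-level attention per utterance, then context-level attention.

    token_states_per_utterance: list of arrays, each (num_tokens, dim)
    decoder_state: array (dim,), the current decoder hidden state
    """
    # 1) Message-level attention: summarize each utterance's token states.
    utterance_vectors = np.stack([
        dot_attention(decoder_state, token_states)
        for token_states in token_states_per_utterance
    ])                                          # (num_utterances, dim)

    # 2) Context-level attention: attend over the utterance summaries.
    return dot_attention(decoder_state, utterance_vectors)  # (dim,)

# Toy usage: 3 utterances of different lengths, hidden size 8.
rng = np.random.default_rng(0)
utterances = [rng.normal(size=(n, 8)) for n in (5, 7, 4)]
decoder_state = rng.normal(size=8)
print(two_level_attention(utterances, decoder_state).shape)  # (8,)
```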

Is there any reason for this? Did you notice better performance with just the context-level attention?

Thanks a lot for your answer!

Alessandro

nouhadziri commented 5 years ago

Thanks for raising the issue. We refactored the original code before releasing it; the message-level attention was a bit messy, and we didn't have a chance to adapt it to the released structure. In our experiments, message-level attention yielded little improvement while making the model larger and, consequently, training slower.

Hope this helps, thanks.