neulab / xnmt

eXtensible Neural Machine Translation

Stative Attenders #569

Status: Open · armatthews opened 5 years ago

armatthews commented 5 years ago

This PR enables stative attenders, and contains a sample implementation of "Modeling Coverage for Neural Machine Translation" (Tu et al. 2016).
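For reference, the coverage idea amounts to keeping a per-source-position running sum of past attention weights as the attender's state and using it when scoring. Below is a minimal sketch of that idea, not the PR's actual code: the class and parameter names are illustrative, and it uses a simplified linear coverage penalty instead of the full MLP scorer from Tu et al.

```python
import numpy as np

class CoverageAttender:
    """Illustrative stative attender: keeps a running coverage vector
    (the sum of past attention weights) and uses it when scoring.
    Hypothetical sketch only; the PR's implementation may differ."""

    def __init__(self, num_src_words, coverage_weight=1.0):
        # Coverage starts at zero for every source position.
        self.coverage = np.zeros(num_src_words)
        self.coverage_weight = coverage_weight

    def compute_attention(self, scores):
        # Penalize source positions that are already well covered,
        # then normalize with a softmax to get attention weights.
        adjusted = scores - self.coverage_weight * self.coverage
        exp = np.exp(adjusted - adjusted.max())
        return exp / exp.sum()

    def update(self, final_attention):
        # Accumulate whatever attention vector was actually used
        # downstream (which may differ from compute_attention's output,
        # e.g. when several attenders are ensembled).
        self.coverage += final_attention
```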

armatthews commented 5 years ago

Thanks for the feedback, Matthias! I updated the documentation.

I talked to Graham about the mechanism of compute_attention() and update(). I agree that it would be nice to factor this so the translator class doesn't have to call update(), but I'm not sure that's possible in general.

The real problem is that the attention vector returned by the attender may not be the "final" attention vector used downstream. For example, if one chooses to ensemble multiple attenders, the final attention vector will not be the same as the vector produced by any individual one. This mechanism lets us feed the real attention vector back into the attender even in such cases.
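To make the intended call pattern concrete, here is a rough sketch of the translator side. The function name and the simple averaging scheme are only for illustration and are not the actual xnmt code; the point is that every attender gets update() called with the attention vector that was really used.

```python
import numpy as np

def ensemble_attention_step(attenders, scores_per_attender):
    # Each attender proposes its own attention distribution.
    proposals = [att.compute_attention(scores)
                 for att, scores in zip(attenders, scores_per_attender)]

    # The translator combines them; here a simple average, so the
    # final vector matches none of the individual proposals.
    final_attention = np.mean(proposals, axis=0)

    # Feed the attention vector that was actually used back into
    # every attender so their internal state stays consistent.
    for att in attenders:
        att.update(final_attention)

    return final_attention
```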

I talked to Graham a bit about this, and this was the best solution we came up with. I'm happy to discuss further if you see a better way!

msperber commented 5 years ago

I see, yeah I had suspected something like that. In that case I think this can be merged (once the merge conflicts are resolved)!

-- Matthias (this comment is "not a contribution")

neubig commented 5 years ago

@armatthews If you resolve the conflicts on this I think we can merge.