Open Waino opened 7 years ago
Perhaps a better (i.e. quicker to implement) starting point would be the decoding-time coverage normalization method from section 7 of the Google NMT paper. This would only require changing the HNMT code so that it returns its attention predictions, and then modifying the beam search code in BNAS to use them.
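For reference, here is a rough sketch of how that rescoring could look, as I understand the Google NMT formulation: the hypothesis log-probability is divided by a length penalty, and a coverage penalty based on the summed attention each source position received is added. The function names, the `attention` layout (one row of source-position probabilities per generated target token), the default `alpha`/`beta` values and the `eps` guard against `log(0)` are all assumptions made for illustration; none of this is existing HNMT or BNAS API.

```python
import math


def length_penalty(length, alpha=0.2):
    # lp(Y) = ((5 + |Y|) / (5 + 1)) ** alpha; alpha is a placeholder value here.
    return ((5.0 + length) / 6.0) ** alpha


def coverage_penalty(attention, beta=0.2, eps=1e-10):
    # attention[j][i]: attention probability of target step j on source position i.
    # cp(X; Y) = beta * sum_i log(min(sum_j attention[j][i], 1.0))
    n_source = len(attention[0])
    total = 0.0
    for i in range(n_source):
        mass = sum(step[i] for step in attention)
        # eps is an added guard (not part of the formula) to avoid log(0)
        # for source positions that received no attention at all.
        total += math.log(max(min(mass, 1.0), eps))
    return beta * total


def rescore(log_prob, attention, alpha=0.2, beta=0.2):
    # s(Y, X) = log P(Y | X) / lp(Y) + cp(X; Y)
    lp = length_penalty(len(attention), alpha)
    return log_prob / lp + coverage_penalty(attention, beta)


# Two hypotheses with the same log-probability: one attends to both source
# positions, the other ignores the second source position entirely.
covered = [[1.0, 0.0], [0.0, 1.0]]
uncovered = [[1.0, 0.0], [1.0, 0.0]]
print(rescore(-2.0, covered) > rescore(-2.0, uncovered))
```

In beam search this would presumably be applied when ranking hypotheses, which is why the decoder needs to hand back the attention weights alongside the token probabilities.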
Implement coverage in the attention mechanism, following [1].
[1] Tu, Zhaopeng, et al. "Coverage-based Neural Machine Translation." arXiv preprint arXiv:1601.04811 (2016). http://arxiv.org/pdf/1601.04811
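A minimal sketch of what coverage inside the attention mechanism could look like, assuming the simplest accumulation form (coverage for a source position is the sum of the attention it has received so far, with no fertility normalization) and an additive attention energy that takes the coverage as an extra input. The name `coverage_attention_step`, the pre-projected `query`/`keys` inputs and the parameter vectors `v` and `w_cov` are hypothetical and do not correspond to existing HNMT code.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def coverage_attention_step(query, keys, coverage, v, w_cov):
    # query:    projected decoder state, list of d floats
    # keys:     projected encoder annotations, one list of d floats per source position
    # coverage: attention accumulated per source position over previous steps
    # v, w_cov: parameter vectors of size d (hypothetical, would be learned)
    energies = []
    for key, cov in zip(keys, coverage):
        # Additive attention with the coverage value fed into the tanh layer.
        hidden = [math.tanh(q + k + w * cov) for q, k, w in zip(query, key, w_cov)]
        energies.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    alpha = softmax(energies)
    # Accumulate this step's attention into the coverage.
    new_coverage = [c + a for c, a in zip(coverage, alpha)]
    return alpha, new_coverage


# Two identical source positions; the first has already been attended to.
alpha, cov = coverage_attention_step(
    query=[0.0], keys=[[0.0], [0.0]], coverage=[1.0, 0.0], v=[1.0], w_cov=[-1.0]
)
print(alpha, cov)
```

With a negative `w_cov`, already-covered positions get lower energy, so attention shifts towards source words that have not been translated yet. In a real model that weight would be learned rather than set by hand.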