lkluo opened this issue 6 years ago (status: Open)
In this paper (https://arxiv.org/pdf/1609.08144.pdf), building on the beam search algorithm, they include a coverage penalty to favor translations that fully cover the source sentence according to the attention module. The scoring function s(Y, X) that they employ to rank candidate translations is defined as follows (Equation 14, page 12 of the paper):
s(Y, X) = log(P(Y|X)) / lp(Y) + cp(X; Y)
The first part of s(Y, X) is the length normalization, which I found here in the Transformer, but for the second part, cp(X; Y), the coverage penalty, I could not find which piece of code implements it in the Transformer decoder. Does the Transformer have a better solution for this?
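For reference, here is a minimal sketch of what Equation 14 computes, assuming a single attention matrix of shape (target_len, source_len) is available. The function names (`length_penalty`, `coverage_penalty`, `score`), the default `alpha=0.6` and `beta=0.2`, and the epsilon are illustrative and not taken from any Transformer codebase; the stock Transformer uses multi-head attention across several layers, so there is no single alignment matrix to plug in directly, which may be one reason the coverage penalty is often left out.

```python
import numpy as np

def length_penalty(target_len, alpha=0.6):
    # GNMT-style length normalization: lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha
    return ((5.0 + target_len) ** alpha) / ((5.0 + 1.0) ** alpha)

def coverage_penalty(attention_probs, beta=0.2):
    # GNMT-style coverage penalty:
    #   cp(X; Y) = beta * sum_i log(min(sum_j p_{i,j}, 1.0))
    # attention_probs: (target_len, source_len) array, where p_{i,j} is the
    # attention the j-th target word places on the i-th source word.
    coverage = np.minimum(attention_probs.sum(axis=0), 1.0)  # per-source-token coverage
    return beta * np.log(coverage + 1e-10).sum()  # epsilon guards against log(0)

def score(log_prob, target_len, attention_probs, alpha=0.6, beta=0.2):
    # s(Y, X) = log P(Y|X) / lp(Y) + cp(X; Y)
    return log_prob / length_penalty(target_len, alpha) + coverage_penalty(attention_probs, beta)

# Illustrative usage: a 4-token hypothesis attending over a 5-token source.
attn = np.random.dirichlet(np.ones(5), size=4)  # rows sum to 1, like softmax attention
print(score(log_prob=-3.2, target_len=4, attention_probs=attn))
```

The coverage term is negative (and more negative the less a source token is attended to), so hypotheses that ignore parts of the source are pushed down the beam.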
@crystal0913: I think inadequate translation is a common problem for NMT in the community. Adding a penalty can help, but it is not a complete solution. I am thinking of hybrid translation with statistical MT.
Is there any schedule for adding the beta term for the coverage penalty?
NMT is better than SMT in terms of fluency, but it suffers from inadequate translation of long sentences. I have come across research where coverage is modelled explicitly for NMT. Does the Transformer have a better solution for this (an optimal setting, etc.)?