tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

[Question] Inadequate translation #1133

Open lkluo opened 6 years ago

lkluo commented 6 years ago

NMT is more fluent than SMT, but it suffers from inadequate translation of long sentences. I have come across research that models coverage for NMT. Does the Transformer offer a better solution for this (an optimal setting, etc.)?

crystal0913 commented 6 years ago

In this paper, https://arxiv.org/pdf/1609.08144.pdf, they add a coverage penalty to the beam search algorithm to favor translations that fully cover the source sentence according to the attention module. The scoring function s(Y, X) used to rank candidate translations is defined as follows (Equation 14 on page 12 of the paper):

    s(Y, X) = log(P(Y|X)) / lp(Y) + cp(X; Y)

The first part of s(Y, X) is the length normalization, which I found implemented in the Transformer, but for the second part, the coverage penalty cp(X; Y), I couldn't find the piece of code that implements it in the Transformer decoder. Does the Transformer have a better solution for this?
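
For reference, here is a minimal NumPy sketch of both terms of Equation 14; the function names, the default alpha/beta values, and the log-epsilon clipping are illustrative assumptions, not tensor2tensor code:

    import numpy as np

    def length_penalty(length, alpha=0.6):
        # lp(Y) = ((5 + |Y|) / (5 + 1))^alpha  (Equation 14 in the paper).
        return ((5.0 + length) / 6.0) ** alpha

    def coverage_penalty(attention, beta=0.2):
        # attention: array of shape [target_len, source_len], where
        # attention[j, i] is the probability p_{i,j} that the j-th target
        # word attends to the i-th source word.
        # cp(X; Y) = beta * sum_i log(min(sum_j p_{i,j}, 1.0))
        coverage = np.minimum(attention.sum(axis=0), 1.0)
        # Clipping at 1e-9 guards against log(0) when a source word receives
        # no attention at all; this is a numerical-stability addition, not
        # part of the paper's formula.
        return beta * np.sum(np.log(np.maximum(coverage, 1e-9)))

    def score(log_prob, target_len, attention, alpha=0.6, beta=0.2):
        # s(Y, X) = log(P(Y|X)) / lp(Y) + cp(X; Y)
        return (log_prob / length_penalty(target_len, alpha)
                + coverage_penalty(attention, beta))

The paper tunes alpha and beta on a development set; the defaults above are placeholders.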

lkluo commented 6 years ago

@crystal0913: I think inadequate translation is a common problem with NMT in the community. Adding a penalty can help, but it is not a complete solution. I am thinking of hybrid translation combining NMT with statistical MT.

ccmehk commented 4 years ago

Is there any schedule for adding a beta for the coverage penalty?
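
In the meantime, a possible workaround is to rescore the finished beam hypotheses outside the decoder, reusing the sketch above. The `hypotheses` layout here, a list of (log_prob, token_ids, attention_matrix) triples, is an assumption about what the caller extracts from the model, not a tensor2tensor API:

    # Hypothetical post-hoc rescoring: pick the hypothesis with the best
    # combined score s(Y, X) instead of the raw beam-search score.
    def rescore(hypotheses, alpha=0.6, beta=0.2):
        return max(hypotheses,
                   key=lambda h: score(h[0], len(h[1]), h[2], alpha, beta))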