This is a repository with the data and code for the ACL 2019 paper "When a Good Translation is Wrong in Context: ..." and the EMNLP 2019 paper "Context-Aware Monolingual Repair for Neural Machine Translation"
Hi @lena-voita,

I've been reading your paper "Context-Aware Monolingual Repair for Neural Machine Translation" and checking some of your code for clarification, but I still have some doubts about how the round-trip translations are generated.
As stated in the paper,
> Russian monolingual data is first translated into English, using the Russian→English model and beam search with beam size of 4.
Comparing this with your code, it is not clear to me whether you mean beam size or n-best size, since the code iterates over all the beam hypotheses. Do you keep the 4 best hypotheses?
> we use the English→Russian model to sample translations with temperature of 0.5. For each sentence, we precompute 20 sampled translations and randomly choose one of them when forming a training minibatch for DocRepair.
If I'm not mistaken, my understanding from your code is that for each of the 4 hypotheses from the previous step you precompute 20 sampled translations, which results in 80 possible translations for each original Russian sentence. Is this correct? During training, I guess that in each data iteration you select 4 random translations (one for each n-best hypothesis). Is that right? In addition, you mention random sampling, which I guess is over the whole vocabulary, isn't it?
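To make my question concrete, here is a minimal sketch of the procedure as I currently understand it (the function and variable names are my own placeholders, not from your code):

```python
import random

N_BEST = 4      # hypotheses kept from the Ru->En beam search
N_SAMPLES = 20  # sampled En->Ru translations per hypothesis

def precompute_samples(en_hypotheses, sample_translation):
    """For each of the 4 English hypotheses, precompute 20 sampled
    Russian translations (temperature 0.5), i.e. 80 per source sentence."""
    return [
        [sample_translation(hyp, temperature=0.5) for _ in range(N_SAMPLES)]
        for hyp in en_hypotheses[:N_BEST]
    ]

def pick_for_minibatch(per_hyp_samples, rng=random):
    """When forming a DocRepair minibatch, pick one random sampled
    translation per hypothesis (so 4 per source sentence?)."""
    return [rng.choice(samples) for samples in per_hyp_samples]
```

Is this roughly what happens, or is only one of the 80 translations used per source sentence in a given minibatch?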
Finally,
> Also, in training, we replace each token in the input with a random one with the probability of 10%
In this case, replacement candidates are chosen from the whole vocabulary set, aren't they?
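For reference, this is the replacement scheme I have in mind: a minimal sketch assuming candidates are drawn uniformly from the full vocabulary (the function name and `vocab` argument are my own placeholders):

```python
import random

def corrupt_tokens(tokens, vocab, p=0.10, rng=random):
    """Replace each input token with a random vocabulary item
    with probability 10% (my reading of the paper)."""
    return [rng.choice(vocab) if rng.random() < p else tok for tok in tokens]
```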
The iteration over the beam hypotheses that I am referring to is here:
https://github.com/lena-voita/good-translation-wrong-in-context/blob/bb59382f6bc6c01e0cb8e58e370a8dff8198107b/lib/task/seq2seq/models/DocRepair.py#L451
Thanks