This is a repository with the data and code for the ACL 2019 paper "When a Good Translation is Wrong in Context: ..." and the EMNLP 2019 paper "Context-Aware Monolingual Repair for Neural Machine Translation"
Hi @lena-voita,

I've been reading your paper "Context-Aware Monolingual Repair for Neural Machine Translation" and checking some of your code for clarification, but I still have some doubts about how the round-trip translations are generated.
As stated in the paper,
> Russian monolingual data is first translated into English, using the Russian→English model and beam search with beam size of 4.
Comparing this with your code, it is not clear to me whether you mean beam size or n-best size, since the code iterates over all the beam hypotheses. Do you keep the 4 best hypotheses?
> we use the English→Russian model to sample translations with temperature of 0.5. For each sentence, we precompute 20 sampled translations and randomly choose one of them when forming a training minibatch for DocRepair.
If I'm not mistaken, my understanding from your code is that for each of the 4 hypotheses from the previous step you precompute 20 sampled translations, which results in 80 possible translations for each original Russian sentence. Is this correct? During training, I guess that in each data iteration you select 4 random translations (one for each n-best hypothesis). Is that right? In addition, you mention random sampling, which I guess is over the whole vocabulary, isn't it?
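To make my question concrete, here is a minimal sketch of the procedure as I currently understand it (the function and variable names are my own placeholders, not from your code):

```python
import random

N_BEST = 4      # hypotheses kept from the Ru->En beam search
N_SAMPLES = 20  # sampled En->Ru translations per hypothesis

def precompute_samples(en_hypotheses, sample_translation):
    """For each of the 4 English hypotheses, precompute 20 sampled
    Russian translations (temperature 0.5), i.e. 80 per source sentence."""
    return [
        [sample_translation(hyp, temperature=0.5) for _ in range(N_SAMPLES)]
        for hyp in en_hypotheses[:N_BEST]
    ]

def pick_for_minibatch(per_hyp_samples, rng=random):
    """When forming a DocRepair minibatch, pick one random sampled
    translation per hypothesis (so 4 per source sentence?)."""
    return [rng.choice(samples) for samples in per_hyp_samples]
```

Is this roughly what happens, or is only one of the 80 translations used per source sentence in a given minibatch?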
Finally,
> Also, in training, we replace each token in the input with a random one with the probability of 10%
In this case, replacement candidates are chosen from the whole vocabulary set, aren't they?
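For reference, this is the replacement scheme I have in mind: a minimal sketch assuming candidates are drawn uniformly from the full vocabulary (the function name and `vocab` argument are my own placeholders):

```python
import random

def corrupt_tokens(tokens, vocab, p=0.10, rng=random):
    """Replace each input token with a random vocabulary item
    with probability 10% (my reading of the paper)."""
    return [rng.choice(vocab) if rng.random() < p else tok for tok in tokens]
```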
The iteration over the beam hypotheses that I am referring to is here:
https://github.com/lena-voita/good-translation-wrong-in-context/blob/bb59382f6bc6c01e0cb8e58e370a8dff8198107b/lib/task/seq2seq/models/DocRepair.py#L451
Thanks