Open Epsilon-Lee opened 4 years ago
@Epsilon-Lee We conducted this ablation study using this code. Using DAE in a fully shared model leads to an identity mapping. You can try the older unsupervised NMT framework, which supports leaving some modules unshared.
Thanks a lot for your response. I will definitely give it a try and then come back to report the figures!
Hi, dear authors:
I wonder how you conducted the DAE pre-training experiments shown in Table 3. (I only list the en-fr results below.)
As you described in your paper,
So my questions are:
I have run experiments using DAE to pre-train the seq2seq model, but when I continue training with BT only [1], I get the following BLEU scores, which are far lower than your reported ones, so I wonder whether I have misunderstood some details.
Please help me out here, thanks a lot.
Footnote. [1] One difference between my training setting and yours is that during fine-tuning I use only the BT loss instead of both DAE and BT.
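
For anyone following this thread, here is a minimal sketch of the kind of input corruption a DAE objective usually relies on, so that the model has to reconstruct the sentence rather than copy it. It is an illustration under my own assumptions, not the authors' code: the function name `add_noise`, its parameters `p_drop` and `shuffle_k`, and their default values are all hypothetical.

```python
import random


def add_noise(tokens, p_drop=0.1, shuffle_k=3, rng=None):
    """Return a corrupted copy of `tokens` to use as DAE input.

    Hypothetical helper for illustration only; names and defaults
    are assumptions, not taken from any particular codebase.
    """
    rng = rng or random.Random()

    # Word dropout: drop each token independently with probability p_drop.
    kept = [t for t in tokens if rng.random() >= p_drop]
    if not kept and tokens:
        # Avoid an empty sentence: keep one randomly chosen token.
        kept = [rng.choice(tokens)]

    # Local shuffle: perturb each position by uniform noise in
    # [0, shuffle_k] and sort by the perturbed position, so tokens
    # only move a short distance from where they started.
    keys = [i + rng.uniform(0, shuffle_k) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]
```

The DAE loss would then be computed by feeding `add_noise(x)` to the encoder and asking the decoder to reconstruct the clean `x`. With `p_drop=0.0` and `shuffle_k=0` the input is returned unchanged, which is the degenerate copy task.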