microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

Questions for SupNMT #170

Open MSWon opened 3 years ago

MSWon commented 3 years ago

Hi, dear authors

I have two questions about MASS-SupNMT:

  1. Should the MASS model be trained jointly on mass_dataset and memt_dataset at the same time, or is it supposed to be trained on them separately (e.g., training on mass_dataset for several steps first, and then on memt_dataset)?
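To make the question concrete, here is a toy sketch of the two schedules I have in mind. The function names and batch lists are hypothetical illustrations, not the repo's actual training loop:

```python
def joint_training(mass_batches, memt_batches, steps):
    """Joint schedule: interleave batches from both datasets at every step.

    Hypothetical sketch -- alternates MASS and MT batches one-for-one.
    """
    log = []
    for step in range(steps):
        source = mass_batches if step % 2 == 0 else memt_batches
        log.append(source[step % len(source)])
    return log


def sequential_training(mass_batches, memt_batches, mass_steps, memt_steps):
    """Sequential schedule: all MASS steps first, then all MT steps."""
    log = [mass_batches[i % len(mass_batches)] for i in range(mass_steps)]
    log += [memt_batches[i % len(memt_batches)] for i in range(memt_steps)]
    return log
```

Which of these two schedules does the released training code correspond to?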

  2. In the figure, the input to the decoder appears to be masked as well as the input to the encoder, but in noisy_language_pair_dataset.py only the input to the encoder seems to be masked. Does the code mask both the encoder and decoder inputs, or only the encoder input?
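For clarity, this is the encoder-only masking I believe I see in the code. The helper and the `MASK` token name are hypothetical, purely to illustrate my reading:

```python
MASK = "<mask>"  # hypothetical mask token symbol


def mask_span(tokens, start, length):
    """Replace a contiguous span of tokens with mask tokens (MASS-style span masking)."""
    return [MASK if start <= i < start + length else t
            for i, t in enumerate(tokens)]


src = ["I", "love", "open", "source", "code"]

# Encoder-only masking: the source span is masked for the encoder,
# while the decoder input stays unmasked.
enc_input = mask_span(src, 1, 3)   # ["I", "<mask>", "<mask>", "<mask>", "code"]
dec_input = src                    # unchanged if only the encoder input is masked
```

Is the decoder input also supposed to be passed through something like `mask_span`, as the figure suggests?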