microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

Questions for SupNMT #170

Open MSWon opened 3 years ago

MSWon commented 3 years ago

Hi, dear authors

I have two questions about MASS-SupNMT:

  1. Should the MASS model be trained jointly on mass_dataset and memt_dataset at the same time, or is it supposed to be trained on them separately (e.g., training on mass_dataset for several steps first, and then on memt_dataset)?
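To make the question concrete, here is a toy sketch of the two schedules I have in mind. The function names and batch lists are hypothetical illustrations, not the repo's actual training loop:

```python
def joint_training(mass_batches, memt_batches, steps):
    """Joint schedule: interleave batches from both datasets at every step.

    Hypothetical sketch -- alternates MASS and MT batches one-for-one.
    """
    log = []
    for step in range(steps):
        source = mass_batches if step % 2 == 0 else memt_batches
        log.append(source[step % len(source)])
    return log


def sequential_training(mass_batches, memt_batches, mass_steps, memt_steps):
    """Sequential schedule: all MASS steps first, then all MT steps."""
    log = [mass_batches[i % len(mass_batches)] for i in range(mass_steps)]
    log += [memt_batches[i % len(memt_batches)] for i in range(memt_steps)]
    return log
```

Which of these two schedules does the released training code correspond to?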

  2. In the figure, the input to the decoder appears to be masked as well as the input to the encoder, but in noisy_language_pair_dataset.py only the input to the encoder seems to be masked. Does the code mask both the encoder and decoder inputs, or only the encoder input?
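For clarity, this is the encoder-only masking I believe I see in the code. The helper and the `MASK` token name are hypothetical, purely to illustrate my reading:

```python
MASK = "<mask>"  # hypothetical mask token symbol


def mask_span(tokens, start, length):
    """Replace a contiguous span of tokens with mask tokens (MASS-style span masking)."""
    return [MASK if start <= i < start + length else t
            for i, t in enumerate(tokens)]


src = ["I", "love", "open", "source", "code"]

# Encoder-only masking: the source span is masked for the encoder,
# while the decoder input stays unmasked.
enc_input = mask_span(src, 1, 3)   # ["I", "<mask>", "<mask>", "<mask>", "code"]
dec_input = src                    # unchanged if only the encoder input is masked
```

Is the decoder input also supposed to be passed through something like `mask_span`, as the figure suggests?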