Hi there, I think I found a typo, or else I'm confused. In MASS-supNMT, in xmasked_seq2seq.py, word_mask_keep_rand defaults to '0.1, 0.1, 0.8'.
This word_mask_keep_rand value is then stored on args as "pred_probs" on line 119, which in turn is passed to MaskedLanguagePairDataset.
In masked_language_pair_dataset.py, random_word() uses pred_probs as shown here: https://github.com/microsoft/MASS/blob/208ead5f92999168a6bbc6481d0d4b5f90700414/MASS-supNMT/mass/masked_language_pair_dataset.py#L183-L184
From the code we see that pred_probs is actually read in the order mask, random, keep, not mask, keep, rand as the name word_mask_keep_rand suggests. This means the default 0.1, 0.1, 0.8 is not mask 0.1, keep 0.1, rand 0.8, but mask 0.1, rand 0.1, keep 0.8, which is quite different from what the variable name says.
In short, word_mask_keep_rand should have been named word_mask_rand_keep.
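
To make the ordering concrete, here is a minimal standalone sketch of how I read the linked lines. It is not the repository's code: the toy vocabulary, the MASK placeholder and the outcome_frequencies helper are made up for illustration, and random.choices stands in for whatever sampler the repository uses. The only thing it is meant to show is that the probabilities are matched to the candidates by position, so the third value is the "keep" probability.

```python
import random
from collections import Counter

MASK = "<mask>"                 # hypothetical mask token
VOCAB = ["a", "b", "c", "d"]    # hypothetical toy vocabulary


def random_word(w, pred_probs, rng):
    # Candidates are listed as (mask, random word, original word), so
    # pred_probs[0] = P(mask), pred_probs[1] = P(rand), pred_probs[2] = P(keep).
    cands = [MASK, rng.choice(VOCAB), w]
    idx = rng.choices(range(3), weights=pred_probs, k=1)[0]
    return idx, cands[idx]


def outcome_frequencies(pred_probs, n=20000, seed=0):
    # Count how often each branch is taken over n draws.
    rng = random.Random(seed)
    counts = Counter(random_word("orig", pred_probs, rng)[0] for _ in range(n))
    return {name: counts[i] / n for i, name in enumerate(("mask", "rand", "keep"))}


freqs = outcome_frequencies([0.1, 0.1, 0.8])
print(freqs)
```

With the default 0.1, 0.1, 0.8, the "keep" frequency comes out near 0.8 and the "rand" frequency near 0.1, which is the opposite of what the name word_mask_keep_rand would lead you to expect.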