microsoft / ProphetNet

A research project for natural language generation, containing the official implementations by MSRA NLC team.
MIT License

How to use the pretraining task of ProphetNet #43

Open · StevenTang1998 opened 3 years ago

StevenTang1998 commented 3 years ago

I want to use the pretraining task of ProphetNet, which recovers the masked span of the input sentence. I followed the instructions in Figure 1 of the paper.

For example, the input is But I [MASK][MASK] my life for some lovin' and some gold, and I only recover the first [MASK] (the sentence is from the pretraining corpus BookCorpus). I use the following code based on HuggingFace:

from transformers import ProphetNetTokenizer, ProphetNetForConditionalGeneration
tokenizer = ProphetNetTokenizer.from_pretrained('microsoft/prophetnet-large-uncased')
model = ProphetNetForConditionalGeneration.from_pretrained('microsoft/prophetnet-large-uncased')

# the sentence is from the pretraining corpus BookCorpus
input_ids = tokenizer("But I traded all my life for some lovin' and some gold", return_tensors="pt")['input_ids']
# id of the target word "traded" (tokens: "but", "i", "traded", "all", ...);
# .item() copies the value before the span is overwritten below
target_id = input_ids[0][2].item()
# replace the span "traded all" with the pad token as the mask placeholder
input_ids[0][2:4] = tokenizer.pad_token_id

decoder_input_ids = tokenizer('[MASK][MASK] I', return_tensors="pt")['input_ids']
# the way of MASS: decoder_input_ids = tokenizer('[MASK][MASK][MASK]', return_tensors="pt")['input_ids']

output = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
# logits over the vocabulary at decoder position 2
logits = output.logits[0][2]
# the rank of the target word in the vocabulary (how many words score higher)
print((logits[target_id] < logits).sum())
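
To see what the model actually prefers at that position, I also decode its top-scoring tokens (a small diagnostic I added on top of the probe above, not part of the pretraining recipe itself):

# decode the five highest-scoring tokens at the probed decoder position
top5 = logits.topk(5)
print(tokenizer.convert_ids_to_tokens(top5.indices.tolist()))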

However, the rank of traded is 15182 among the 30522 words in the vocabulary. I also tried different masked words and masked spans, but the results were all unexpected.
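
For instance, one of those variants masks the span my life (token positions 4-5) under the same scheme; this is a sketch of that probe, with the decoder prefix adjusted so the token before the masked span ("all") takes the place of "I":

# same probe, but masking the span "my life" instead of "traded all"
input_ids = tokenizer("But I traded all my life for some lovin' and some gold", return_tensors="pt")['input_ids']
target_id = input_ids[0][4].item()  # id of "my"
input_ids[0][4:6] = tokenizer.pad_token_id
# the token preceding the masked span is now "all"
decoder_input_ids = tokenizer('[MASK][MASK] all', return_tensors="pt")['input_ids']
output = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
logits = output.logits[0][2]
print((logits[target_id] < logits).sum())  # rank of "my"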

So, I want to ask: is there an error in my way of recovering the masked tokens?