JiekeLi opened this issue 4 years ago
I was also wondering the same. Masking should only be applied to input tokens, because the loss function is `CrossEntropyLoss(ignore_index=0)`: it ignores index 0, so the loss is only computed for masked items or randomly replaced items.
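For context on why label 0 works as the "ignore" target, here is a minimal sketch (toy tensors, not this repo's code) showing that `CrossEntropyLoss(ignore_index=0)` drops every position whose label is 0 from the loss average:

```python
import torch
import torch.nn as nn

# Toy setup: 3 positions, vocabulary of 5 items (index 0 reserved as padding/ignore).
logits = torch.randn(3, 5)
labels = torch.tensor([2, 0, 4])  # position 1 has label 0 -> excluded from the loss

loss = nn.CrossEntropyLoss(ignore_index=0)(logits, labels)

# Equivalent: average the loss over only the non-zero-labeled positions.
manual = nn.CrossEntropyLoss()(logits[[0, 2]], labels[[0, 2]])
print(torch.allclose(loss, manual))  # True
```

So assigning label 0 to a position is exactly how the code tells the loss to skip it.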
But:

```python
if prob < 0.8:
    tokens.append(self.mask_token)
    labels.append(s)
elif prob < 0.9:
    tokens.append(self.rng.randint(1, self.num_items))
    labels.append(s)
else:
    tokens.append(s)
    labels.append(0)  # ? I changed it.
```
And why divide once more by 0.8 and 0.9 after `mask_prob`? Random item insertion with probability 0.15 * 0.1? I couldn't find this part in the paper.
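For what it's worth, the 0.8/0.9 thresholds look like BERT's standard masking recipe: a token is first selected with probability `mask_prob` (e.g. 0.15), and *conditional on being selected* it is replaced by the mask token 80% of the time, by a random item 10% of the time, and kept unchanged 10% of the time; hence the overall random-replacement rate of 0.15 * 0.1. A self-contained sketch (function name and parameters are illustrative, not this repo's API):

```python
import random

def mask_sequence(seq, num_items, mask_token, mask_prob=0.15, seed=0):
    """BERT-style masking sketch.

    Each token is selected with probability mask_prob. A selected token is:
      - replaced by mask_token with prob 0.8   (0.15 * 0.8 = 12% of all tokens)
      - replaced by a random item with prob 0.1 (0.15 * 0.1 = 1.5% of all tokens)
      - kept unchanged with prob 0.1            (0.15 * 0.1 = 1.5% of all tokens)
    Unselected tokens get label 0, which CrossEntropyLoss(ignore_index=0) skips.
    """
    rng = random.Random(seed)
    tokens, labels = [], []
    for s in seq:
        if rng.random() < mask_prob:
            prob = rng.random()
            if prob < 0.8:
                tokens.append(mask_token)
            elif prob < 0.9:
                tokens.append(rng.randint(1, num_items))
            else:
                tokens.append(s)
            labels.append(s)  # selected positions keep their true item as label
        else:
            tokens.append(s)
            labels.append(0)  # ignored by the loss
    return tokens, labels
```

The kept-unchanged 10% still gets a real label, so the model must predict the item even when it sees the true token, which discourages it from only attending to mask positions.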
I also want to ask: why do you assign label '0' to the non-masked tokens when generating an example here?
Thanks!