graykode / nlp-tutorial

Natural Language Processing Tutorial for Deep Learning Researchers
https://www.reddit.com/r/MachineLearning/comments/amfinl/project_nlptutoral_repository_who_is_studying/
MIT License
14.03k stars, 3.9k forks

Some problems about Bert #40

Open tfighting opened 4 years ago

tfighting commented 4 years ago

line 70: index = randint(0, vocab_size - 1) # random index in vocabulary
I think the replacement index should never land on '[CLS]', '[SEP]' or '[MASK]'!

bruce1408 commented 3 years ago

Yes, that's right. The code should be changed like this:

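r = random()  # draw once; calling random() again in the elif would shrink that branch to about 2%
if r < 0.8:  # 80%: replace with [MASK]
    input_ids[pos] = word_dict['[MASK]']
elif r > 0.9:  # 10%: replace with a random ordinary token
    index = randint(0, vocab_size - 1)
    while index < 4:  # {'[PAD]': 0, '[CLS]': 1, '[SEP]': 2, '[MASK]': 3} are special tokens, so redraw
        index = randint(0, vocab_size - 1)
    input_ids[pos] = index
# remaining 10%: keep the original token

lukysummer commented 2 years ago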

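How about just: index = randint(4, vocab_size - 1)
randint includes both endpoints, so this skips the four special-token ids (0 to 3) without needing a retry loop.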
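
For anyone landing here later, the suggestions in this thread can be put together roughly as follows. This is only a sketch, not the repository's actual code: the toy vocabulary, the mask_token helper and the NUM_SPECIAL constant are made up for illustration, and it assumes the special tokens occupy ids 0 to 3 as in the dictionary quoted above.

```python
from random import Random

# Hypothetical toy vocabulary; ids 0-3 are the special tokens from the thread.
word_dict = {'[PAD]': 0, '[CLS]': 1, '[SEP]': 2, '[MASK]': 3,
             'hello': 4, 'how': 5, 'are': 6, 'you': 7}
vocab_size = len(word_dict)
NUM_SPECIAL = 4  # assumption: ids below this are special and never used as replacements


def mask_token(input_ids, pos, rng):
    """Apply the 80/10/10 masking rule to input_ids[pos] in place."""
    r = rng.random()  # draw once so the three branches split 80/10/10
    if r < 0.8:  # 80%: replace with [MASK]
        input_ids[pos] = word_dict['[MASK]']
    elif r < 0.9:  # 10%: replace with a random ordinary token
        # randint is inclusive on both ends, so this yields ids 4..vocab_size-1
        input_ids[pos] = rng.randint(NUM_SPECIAL, vocab_size - 1)
    # remaining 10%: leave the original token unchanged
    return input_ids


rng = Random(0)  # seeded only to make the demo repeatable
ids = [word_dict[t] for t in ['[CLS]', 'hello', 'how', 'are', 'you', '[SEP]']]
mask_token(ids, 2, rng)
print(ids)
```

Drawing the random number once keeps the "random token" branch at 10%, and starting randint at 4 means no retry loop is needed.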