Closed. ARDUJS closed this issue 4 years ago.
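In https://github.com/huggingface/transformers/blob/master/examples/run_language_modeling.py, at line 225, the comment says:
10% of the time, we replace masked input tokens with random word
but the code uses 0.5. Is that correct?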
Hi @ARDUJS can you update your issue title to something more descriptive? Thanks!
This should be correct. 80% of the selected tokens are replaced with the mask token, which leaves 20%. Of that remaining 20%, half are replaced with a random word and half keep the original token. So the random word and the original token each have an overall probability of 0.2 × 0.5 = 10%.
The original BERT implementation uses the same logic, see here.
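To check the arithmetic, here is a small simulation of the two-step draw. It is only a sketch in plain Python, not the script's actual tensor code; `simulate_branching` and its parameters are hypothetical names made up for illustration.

```python
import random


def simulate_branching(num_tokens=200_000, seed=0):
    """Estimate how often a selected token is masked, randomized, or kept."""
    rng = random.Random(seed)
    counts = {"mask": 0, "random": 0, "keep": 0}
    for _ in range(num_tokens):
        # First draw: 80% of the selected tokens become the mask token.
        if rng.random() < 0.8:
            counts["mask"] += 1
        # Second draw only applies to the remaining 20%,
        # so 0.5 here gives 0.2 * 0.5 = 0.1 overall.
        elif rng.random() < 0.5:
            counts["random"] += 1
        else:
            counts["keep"] += 1
    return {name: count / num_tokens for name, count in counts.items()}


fractions = simulate_branching()
print(fractions)
```

The printed fractions should come out close to 0.8, 0.1 and 0.1. Writing 0.1 in the second draw instead of 0.5 would give a random word only about 2% of the time.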