Would be a great improvement :+1:
By the way, here's the commit that introduced WWM in BERT:
https://github.com/google-research/bert/commit/0fce551b55caabcfba52c61e18f34b541aef186a
BERT uses a WordPiece tokenizer, whereas RoBERTa uses a byte-level BPE tokenizer. I think the implementations may differ slightly, if not substantially, because the two tokenizers mark word boundaries differently.
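For reference, here's roughly how the word-boundary detection would differ between the two (a minimal sketch, not an existing transformers API; the checkpoint names are just the standard ones and the helper function is hypothetical):

```python
from transformers import AutoTokenizer

def word_start_flags(tokens, tokenizer_type):
    """Return True for tokens that start a new word (illustrative helper)."""
    if tokenizer_type == "wordpiece":
        # BERT's WordPiece marks *continuation* pieces with a "##" prefix,
        # so a token starts a word when it does NOT carry that prefix.
        return [not t.startswith("##") for t in tokens]
    elif tokenizer_type == "byte-bpe":
        # RoBERTa's byte-level BPE marks *word-initial* pieces with "Ġ"
        # (the encoded leading space); the very first token also starts a word.
        return [i == 0 or t.startswith("Ġ") for i, t in enumerate(tokens)]
    raise ValueError(f"unknown tokenizer type: {tokenizer_type}")

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
roberta_tok = AutoTokenizer.from_pretrained("roberta-base")

text = "Tokenization differs"
print(word_start_flags(bert_tok.tokenize(text), "wordpiece"))    # [True, False, True]
print(word_start_flags(roberta_tok.tokenize(text), "byte-bpe"))  # [True, False, True]
```

So the grouping logic itself is the same; only the marker you key off (`##` continuation vs. `Ġ` word start) changes.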
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
🚀 Feature request
Currently, training models such as RoBERTa from scratch (e.g., with the language modeling examples) does not support whole word masking; only pre-trained whole-word-masking models are available. Is it possible to include whole word masking in the input preparation?
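Something along these lines could serve as a starting point for a RoBERTa-style tokenizer: group sub-tokens into words via the `Ġ` word-start marker, then mask every piece of a sampled word together. This is only a rough sketch, not an existing transformers API; the function name and masking probability are illustrative.

```python
import random

def whole_word_mask(tokens, mask_token="<mask>", mask_prob=0.15):
    """Illustrative whole word masking over byte-level BPE tokens."""
    # Build word groups: each group is the list of token indices of one word.
    groups = []
    for i, tok in enumerate(tokens):
        if i == 0 or tok.startswith("Ġ"):
            groups.append([i])
        else:
            groups[-1].append(i)

    masked = list(tokens)
    labels = [None] * len(tokens)  # None = not masked, else the original token
    for group in groups:
        if random.random() < mask_prob:
            # Mask every sub-token of the sampled word at once.
            for i in group:
                labels[i] = masked[i]
                masked[i] = mask_token
    return masked, labels

tokens = ["Token", "ization", "Ġdiffers", "Ġacross", "Ġtoken", "izers"]
print(whole_word_mask(tokens))
```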
Motivation
Whole word masking leads to performance boosts, so adding this feature would be useful for anyone who wants to train these models from scratch.