Fine-tune BERT with input augmented with <s> and </s> tags (or, better, a special alphanumeric sequence that BERT will not split into multiple word pieces once we add it to the vocabulary via two reserved entries), and use the same tags at prediction time. This may help with sentence-by-sentence tasks such as POS tagging and dependency parsing, and with predicting sentence-final tokens rather than arbitrary continuations when it is known that the sentence ends there.
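A minimal sketch of how this could look, assuming the Hugging Face transformers library (the note does not specify tooling). The choice of [unused0]/[unused1] as the two reserved vocabulary entries, the model name, and the label count are illustrative assumptions:

```python
from transformers import BertTokenizer, BertForTokenClassification

# Reserved vocab entries repurposed as sentence-boundary tags (assumption:
# the [unusedN] slots shipped with the pretrained BERT vocabularies).
BOS, EOS = "[unused0]", "[unused1]"

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Registering the reserved entries as special tokens keeps WordPiece from
# splitting them; since they already exist in the vocab, no new rows are
# added to the embedding matrix.
tokenizer.add_special_tokens({"additional_special_tokens": [BOS, EOS]})

model = BertForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=17  # e.g. the 17 Universal POS tags
)
model.resize_token_embeddings(len(tokenizer))  # no-op if nothing was added

def encode_sentence(words):
    """Wrap one sentence in explicit boundary tags before encoding."""
    text = f"{BOS} {' '.join(words)} {EOS}"
    return tokenizer(text, return_tensors="pt")

batch = encode_sentence(["The", "dog", "barks", "."])
outputs = model(**batch)  # fine-tune as usual; apply the same tags at test time
```

Because the boundary markers occupy existing reserved entries, the pretrained checkpoint loads unchanged and their embeddings are simply learned during fine-tuning.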