Fine-tune BERT with input augmented with <s> and </s> tags (or, better, a special alphanumeric sequence that BERT will not split into multiple word pieces once we add it to the vocabulary via two reserved entries), and use the same tags at prediction time. This may help with sentence-by-sentence tasks such as POS tagging and dependency parsing, and with predicting sentence-final tokens rather than arbitrary continuations when it is known that the sentence ends there.
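A minimal sketch of how this could look, assuming the Hugging Face transformers library (the note does not specify tooling). The choice of [unused0]/[unused1] as the two reserved vocabulary entries, the model name, and the label count are illustrative assumptions:

```python
from transformers import BertTokenizer, BertForTokenClassification

# Reserved vocab entries repurposed as sentence-boundary tags (assumption:
# the [unusedN] slots shipped with the pretrained BERT vocabularies).
BOS, EOS = "[unused0]", "[unused1]"

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Registering the reserved entries as special tokens keeps WordPiece from
# splitting them; since they already exist in the vocab, no new rows are
# added to the embedding matrix.
tokenizer.add_special_tokens({"additional_special_tokens": [BOS, EOS]})

model = BertForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=17  # e.g. the 17 Universal POS tags
)
model.resize_token_embeddings(len(tokenizer))  # no-op if nothing was added

def encode_sentence(words):
    """Wrap one sentence in explicit boundary tags before encoding."""
    text = f"{BOS} {' '.join(words)} {EOS}"
    return tokenizer(text, return_tensors="pt")

batch = encode_sentence(["The", "dog", "barks", "."])
outputs = model(**batch)  # fine-tune as usual; apply the same tags at test time
```

Because the boundary markers occupy existing reserved entries, the pretrained checkpoint loads unchanged and their embeddings are simply learned during fine-tuning.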