jbrry / Irish-BERT

Repository to store helper scripts for creating an Irish BERT model.

NCI: inconsistent <s> and <p> tags #31

Closed: jowagner closed this issue 3 years ago

jowagner commented 3 years ago

Issue #4 reports:

jowagner commented 3 years ago

Our extractor treats every opening or closing tag as a sentence boundary. This makes it as robust to these inconsistencies as possible without using the text content itself as a boundary indicator.
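A minimal sketch of the idea, assuming the NCI text is plain text interleaved with `<s>`/`</s>` and `<p>`/`</p>` tags (optionally carrying attributes). The regex and the `split_sentences` name are hypothetical illustrations, not the repository's actual extractor:

```python
import re

# Hypothetical pattern (assumption): any opening or closing <s> or <p> tag,
# with optional attributes, e.g. "<p id='1'>" or "</s>". Tags such as <span>
# are not matched because \b requires the tag name to end right after s/p.
BOUNDARY_TAG = re.compile(r"</?(?:s|p)\b[^>]*>")


def split_sentences(text):
    """Split text into sentences, treating every <s>/<p> tag as a boundary.

    Unbalanced or missing tags do not matter: each tag simply ends the
    current segment, so a stray <s> without a matching </s> still splits.
    """
    sentences = []
    for segment in BOUNDARY_TAG.split(text):
        # Normalise whitespace and drop the empty segments that appear
        # between adjacent tags such as "</s><s>".
        cleaned = " ".join(segment.split())
        if cleaned:
            sentences.append(cleaned)
    return sentences


if __name__ == "__main__":
    # Inconsistent markup: text outside any <s>, and an unclosed <s>.
    sample = "<p><s>Dia dhuit.</s> Conas atá tú?<s>Tá mé go maith.</p>"
    for sentence in split_sentences(sample):
        print(sentence)
```

The sample yields three segments, `Dia dhuit.`, `Conas atá tú?` and `Tá mé go maith.`, even though the second is not wrapped in `<s>` and the third `<s>` is never closed. Punctuation is deliberately ignored: `"<s>One. Two.</s>"` stays a single segment, since only the tags act as boundaries.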