jbrry / Irish-BERT

Repository to store helper scripts for creating an Irish BERT model.

Add the Irish Crúbadán Web Corpus #118

Open jowagner opened 1 year ago

jowagner commented 1 year ago

According to the Government of Ireland's "Digital Plan for the Irish Language: Speech and Language Technologies 2023-2027", the Irish Crúbadán Web Corpus has "100+ million words of web-crawled written data" and is "updated daily".