jbrry / Irish-BERT

Repository to store helper scripts for creating an Irish BERT model.

Include Scannell's corpus #110

Open jowagner opened 2 years ago

jowagner commented 2 years ago

According to https://twitter.com/EuroDigitalLang/status/1532712368047894533, Kevin Scannell's corpus of web-crawled Irish text contains 300 million words. Could we work with him to build models on both our data and his?