jbrry / Irish-BERT

Repository to store helper scripts for creating an Irish BERT model.

paper: compare ga_bert against xlm-r #68

Closed jowagner closed 2 years ago

jowagner commented 3 years ago

https://peltarion.com/blog/data-science/a-deep-dive-into-multilingual-nlp-models suggests "that training monolingual models for small languages is unnecessary" as "XLM-R achieved ~80% accuracy whereas the Swedish BERT models reached ~79% accuracy".

Check whether off-the-shelf xlm-roberta performs better on our downstream tasks than the Irish-specific ga_bert. (xlm-roberta is more or less just RoBERTa trained on the larger XLM training data covering 100 languages, or possibly more in practice, since the automatic language filter will have classified some data in other languages as belonging to one of the 100.)

There are two xlm-roberta models to consider: base and large.
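The comparison boils down to running the same downstream-task evaluation for every candidate encoder and reporting the winner per task. A minimal Python sketch of that bookkeeping, assuming a `{(model, task): score}` dict produced by real fine-tuning runs; the model/task names and score values below are illustrative placeholders, not actual results:

```python
# Sketch of the comparison set-up: collect one score per (model, task)
# pair, then pick the best model for each task.  evaluate() runs are
# assumed to have happened elsewhere (e.g. fine-tuning each checkpoint).

def best_per_task(scores):
    """Map {(model, task): score} to {task: (best_model, best_score)}."""
    best = {}
    for (model, task), score in scores.items():
        if task not in best or score > best[task][1]:
            best[task] = (model, score)
    return best

if __name__ == "__main__":
    # Arbitrary placeholder numbers, NOT results from this issue.
    scores = {
        ("ga_bert", "dependency-parsing"): 0.75,
        ("xlm-roberta-base", "dependency-parsing"): 0.80,
        ("ga_bert", "ner"): 0.82,
        ("xlm-roberta-base", "ner"): 0.78,
    }
    for task, (model, score) in sorted(best_per_task(scores).items()):
        print(f"{task}: {model} ({score:.2f})")
```

The same table would gain extra columns once xlm-roberta-large figures are in.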

jowagner commented 2 years ago

Figures ready for xlm-roberta-base.