In addition to wikibert-ga and multilingual BERT, we need to discuss:
BERTreach: According to the model card, the model is trained on 47 million tokens (2.1 million sentences). The training data includes the PARSEME verbal multiword expressions corpus for Irish, presumably its unannotated raw portion. The authors also use Corpus Crawler and a small corpus, presumably from https://wortschatz.uni-leipzig.de/en/download/Irish (the link in the model card does not work).
LaBSE: a multilingual model trained to encode the meaning of sentences, covering 109 languages including Irish.
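To make concrete what "encoding the meaning of sentences" gives us in practice, here is a minimal sketch of comparing an Irish sentence with an English one via cosine similarity of their embeddings. It assumes the sentence-transformers package is installed and that the checkpoint is available under the name "sentence-transformers/LaBSE"; the helper names `cosine_similarity`, `embed_with_labse` and `compare_irish_english` are ours and purely illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embed_with_labse(sentences):
    """Return one embedding per sentence.

    Assumption: sentence-transformers is installed and the checkpoint
    name below resolves; the import is local so that the similarity
    helper above works without the dependency.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/LaBSE")
    return model.encode(sentences)


def compare_irish_english():
    """Hypothetical usage: score an Irish sentence against its English translation."""
    irish, english = embed_with_labse(
        ["Tá an aimsir go maith inniu.", "The weather is good today."]
    )
    return float(cosine_similarity(irish, english))
```

Calling `compare_irish_english()` downloads the model on first use. A translation pair should score higher than an unrelated pair, but we have not measured this for Irish.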
See also: