There seem to be 465,670 examples for ga (Irish) in Google's c4 dataset. There are also 322,404 gd (Scottish Gaelic) sentences. This is a collection of multilingual text from 71 CommonCrawl dumps.
There is some advice on downloading the dataset here. It's also integrated into HuggingFace's datasets.
We already have CommonCrawl data, but it would be worth looking at this resource too.
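
For a quick look without downloading everything, something like the following might work. It is only a sketch: it assumes the multilingual data is exposed on the HuggingFace Hub under the `allenai/c4` id with language codes as config names, and that each example has a `text` field. The helper names `peek` and `stream_c4` are made up for illustration.

```python
from itertools import islice


def peek(examples, n=3, field="text"):
    """Return the `field` value of the first n examples from any iterable of dicts."""
    return [example[field] for example in islice(examples, n)]


def stream_c4(lang):
    """Open one language of the dataset in streaming mode (nothing is fully downloaded)."""
    # Imported here so that peek() can be used without the library installed.
    from datasets import load_dataset

    # Assumption: dataset id "allenai/c4", with the language code as the config name.
    return load_dataset("allenai/c4", lang, split="train", streaming=True)
```

Calling `peek(stream_c4("ga"))` should then return the text of the first three Irish examples, and `peek(stream_c4("gd"))` the same for Scottish Gaelic, provided the assumptions above hold.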