jbrry / Irish-BERT

Repository to store helper scripts for creating an Irish BERT model.
Other
9 stars 0 forks

Check prevalence of MT output in web-crawled corpora #90

Open jowagner opened 2 years ago

jowagner commented 2 years ago

Estimate how much of the ParaCrawl and OSCAR data is MT output, e.g.

Add a footnote to the paper warning about the amount of MT-generated Irish text. Need to figure out what fraction of the filtered data is MT.
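
One possible way to get a number for the footnote, sketched here as an assumption rather than anything decided in this issue: draw a random sample of lines from the filtered data, label each line by hand as MT or not, and report the sample proportion with a Wilson score interval. The function names `sample_lines` and `wilson_interval`, the sample size, and the example counts are all hypothetical.

```python
import math
import random


def sample_lines(lines, k, seed=0):
    """Draw a reproducible simple random sample of up to k lines for manual labelling."""
    rng = random.Random(seed)
    lines = list(lines)
    return rng.sample(lines, min(k, len(lines)))


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical corpus; in practice this would be the filtered text, one sentence per line.
    corpus = [f"sentence {i}" for i in range(10_000)]
    to_label = sample_lines(corpus, 200)

    # Hypothetical annotation result: 30 of the 200 sampled lines judged to be MT output.
    mt_count = 30
    low, high = wilson_interval(mt_count, len(to_label))
    print(f"MT fraction: {mt_count / len(to_label):.3f} (95% CI {low:.3f} to {high:.3f})")
```

The interval only reflects sampling error. It assumes the manual MT judgements are reliable, which is itself hard to guarantee for fluent MT output.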