Open eoghanmurray opened 5 years ago
Another one is "comhalta", which I presume is so high because the corpus contained a large amount of legislative/legal text.
I've since acquired Liostaí Bhreacadh (a book, https://www.breacadh.ie/), which covers the top 500 words and divides them up by spoken vs. written language.
Maybe a link to that would be appropriate on the front page?
"Cál" is so high up probably because the New Corpus for Ireland has incorrectly lemmatized some occurrences of "cáil" as a form of "cál", whereas most of the time "cáil" is actually either its own lemma (a noun meaning "reputation", "famousness") or a non-standard compound of "cá bhfuil" ("cáil tú?" = "cá bhfuil tú?" = "where are you?").
The high score of "comhalta" is probably explainable as you say.
Sorry, just wanted to register a further issue, although I know this is an old repository.
I'm wondering why "cál" is so high up the list, as 'kale/cabbage' doesn't seem to merit such a high position.
Anyhow, it's probably time I dived into creating a similar word frequency list myself from the source texts, as then I'll be able to investigate myself!
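For anyone wanting to try the same, here is a minimal sketch of a frequency count over raw text. It is not how this repository's list was built: `word_frequencies` and the sample sentence are made up for illustration, and it assumes the text uses precomposed accented characters. It counts surface forms only, so "cáil" and "cál" stay separate, which is exactly where a lemmatizer could merge them incorrectly.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count surface word forms in text (no lemmatization is attempted)."""
    # Lowercase, then extract runs of letters; the pattern excludes digits
    # and underscores but keeps accented letters such as "á".
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)


# Hypothetical sample text, for illustration only.
sample = "Cá bhfuil tú? Cáil tú? Tá cál agus cáil anseo."
freq = word_frequencies(sample)

# List forms from most to least frequent.
for word, count in freq.most_common():
    print(word, count)
```

Comparing counts like these against a lemmatized list would show whether forms such as "cáil" are being folded into "cál".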