michmech / irish-word-frequency

About 6,500 Irish lemmas ordered by corpus frequency, with noise removed.
Open Data Commons Open Database License v1.0

Irish Word Frequency List

This is a list of approximately 6,500 Irish lemmas (= "words") ordered by frequency of use. It was extracted from the Irish half of the New Corpus for Ireland and then cleaned up by cross-checking against a large-coverage lexicon (Kevin Scannell's Líonra Séimeantach na Gaeilge): lemmas that do not occur in the lexicon were removed. The resulting list is "clean" in the sense that it contains no punctuation, personal names, English words or other noise.

License

Available under the Open Database License.

Format

The list is a plain-text, tab-delimited file encoded in UTF-8 with Windows-style (CRLF) line breaks.
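
As a rough illustration of reading a file in this format, here is a minimal Python sketch. It is not part of the repository: the function names `read_rows` and `load_frequency_list` are hypothetical, and the sample data (including its two-column layout) is invented for the demonstration, since the column layout is not described here. The sketch only relies on what is stated above: tab-delimited fields, UTF-8 encoding, and CRLF line breaks.

```python
import csv
import io


def read_rows(stream):
    """Yield each non-empty line of a tab-delimited stream as a list of fields."""
    # QUOTE_NONE: treat quote characters as ordinary text, not as field quoting.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # skip blank lines
            yield row


def load_frequency_list(path):
    """Read the whole file at `path` (a hypothetical local copy of the list)."""
    # newline="" leaves the \r\n line breaks for the csv module to handle.
    with open(path, encoding="utf-8", newline="") as f:
        return list(read_rows(f))


# Demonstration on an in-memory sample. The values and the two-column
# layout are invented for illustration and are not taken from the list.
sample = "agus\tx\r\nbí\ty\r\n"
rows = list(read_rows(io.StringIO(sample, newline="")))
print(rows)
```

Opening the file with `newline=""` is the usage the `csv` module documentation recommends, so that the Windows-style line breaks do not end up as stray `\r` characters in the last field of each row.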