nltk / nltk_data

NLTK Data

Add ARCOSG (Annotated Reference Corpus of Scottish Gaelic) #146

Open razorfish17 opened 4 years ago

razorfish17 commented 4 years ago

@stevenbird Initial release of the corpus available at: https://doi.org/10.7488/ds/1411

Suggested NLTK name: ARCOSG

I have updated and corrected the corpus for inclusion in NLTK. (The one at the link above is older and shouldn't be used.)

Corpus reader code verified:

    arcosg = LazyCorpusLoader(
        'arcosg',
        CategorizedTaggedCorpusReader,
        r'.*\.txt',
        cat_file='cats.prn',
        tagset='parole',
        encoding='utf-8',
    )
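For anyone who wants to try the reader settings before the corpus lands in nltk_data, here is a minimal sketch that points `CategorizedTaggedCorpusReader` at a throwaway directory with the same arguments as the loader above. The file names (`doc1.txt`, `doc2.txt`), the category names, the tokens and the tags are all invented placeholders, not ARCOSG data, and the helper names `build_demo_corpus` and `load_demo_corpus` are hypothetical. It assumes NLTK is installed and relies on the reader's default `word/TAG` format with `/` as the separator.

```python
import tempfile
from pathlib import Path

from nltk.corpus.reader import CategorizedTaggedCorpusReader


def build_demo_corpus(root: Path) -> None:
    """Write a tiny stand-in corpus (placeholder tokens and tags, not ARCOSG)."""
    (root / "doc1.txt").write_text("alpha/TAGA beta/TAGB\n", encoding="utf-8")
    (root / "doc2.txt").write_text("gamma/TAGC\n", encoding="utf-8")
    # One line per file: the file id, then its categories, space-separated.
    (root / "cats.prn").write_text(
        "doc1.txt fiction\ndoc2.txt news\n", encoding="utf-8"
    )


def load_demo_corpus(root: Path) -> CategorizedTaggedCorpusReader:
    """Open the directory with the same arguments as the proposed loader."""
    return CategorizedTaggedCorpusReader(
        str(root),
        r".*\.txt",           # matches the text files but not cats.prn
        cat_file="cats.prn",
        tagset="parole",
        encoding="utf-8",
    )


with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    build_demo_corpus(demo_root)
    reader = load_demo_corpus(demo_root)
    # Materialise the results before the temporary directory is removed.
    demo_categories = list(reader.categories())
    demo_news_files = list(reader.fileids(categories="news"))
    demo_fiction_words = list(reader.tagged_words(categories="fiction"))

print(demo_categories)
print(demo_news_files)
print(demo_fiction_words)
```

Once the data is installed, the `LazyCorpusLoader` entry would presumably give the same interface under `nltk.corpus.arcosg`; that name follows the suggestion in this issue and is not confirmed.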

Categories file and mapping to the Universal Tagset created and verified.
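To illustrate what the tag mapping does, here is a rough standard-library sketch, assuming the map file uses the tab-separated `FINE<TAB>COARSE` layout of the existing universal tagset map files. The fine-grained tags below are invented placeholders rather than real ARCOSG tags, the function names `load_tag_map` and `to_universal` are hypothetical, and falling back to `X` for unmapped tags is a choice made for this sketch, not necessarily what NLTK does.

```python
# The twelve coarse tags of the Universal Tagset.
UNIVERSAL_TAGS = {
    "ADJ", "ADP", "ADV", "CONJ", "DET", "NOUN",
    "NUM", "PRON", "PRT", "VERB", ".", "X",
}


def load_tag_map(lines):
    """Parse 'FINE<TAB>COARSE' lines into a dict, rejecting unknown coarse tags."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fine, coarse = line.split("\t")
        if coarse not in UNIVERSAL_TAGS:
            raise ValueError(f"not a universal tag: {coarse!r}")
        mapping[fine] = coarse
    return mapping


def to_universal(tagged_words, mapping, default="X"):
    """Replace each fine-grained tag with its coarse tag (assumed fallback: X)."""
    return [(word, mapping.get(tag, default)) for word, tag in tagged_words]


# Placeholder map and tokens, invented for illustration only.
demo_map = load_tag_map(["TAGN\tNOUN", "TAGV\tVERB", ""])
demo_tagged = [("w1", "TAGN"), ("w2", "TAGV"), ("w3", "UNSEEN")]
demo_universal = to_universal(demo_tagged, demo_map)
print(demo_universal)
```

A check like the `ValueError` above is one simple way to verify a map file: every fine-grained tag in the corpus should appear on the left, and every right-hand value should be one of the twelve universal tags.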

Licensed under Creative Commons. See: https://doi.org/10.7488/ds/1411

razorfish17 commented 3 years ago

@stevenbird - I think this is still open. The corpus is available at this link