wartaal / HanTa

The Hanover Tagger - A simple approach to lemmatization and POS-tagging of German morphology based on heuristics and hidden markov models
GNU Lesser General Public License v3.0

Where can I see the entire set of possible POS tags? #4

Closed: IngoStorm closed this issue 1 year ago

IngoStorm commented 2 years ago

I cannot find a way to list all the tags your tagger knows. There are definitely some that are not listed in your tutorial. Where can I find the entire list, ideally with an explanation and examples for each tag?

Thanks!

RicKrauel commented 2 years ago

I think the tags are based on the Tiger annotation scheme, available here:

https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/mitarbeiter-innen/hagen/STTS_Tagset_Tiger

wartaal commented 2 years ago

Sorry, I replied by mail instead of using GitHub. Indeed, HanTa is trained mainly on the Tiger corpus and thus uses the Tiger annotation scheme: https://www.ims.uni-stuttgart.de/documents/ressourcen/korpora/tiger-corpus/annotation/tiger_scheme-morph.pdf (esp. pp. 26/27)

A general description is available e.g. here: https://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/germantagsets/#id-cfcbf0a7-0 or here: https://homepage.ruhr-uni-bochum.de/stephen.berman/Korpuslinguistik/Tagsets-STTS.html

These are the POS tags that are used. Most tags used to annotate morphemes should be quite clear given the POS tags. I am working on documentation that covers those as well.

wartaal commented 1 year ago

In the latest version I have added two methods:

The first one gives a list of all POS-tags, the second a list of all tags used for morphemes. For each tag some random examples are generated. Have a look at the Demo-Notebook in the German section for an example.
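To make the idea of a tag list with examples concrete, here is a minimal sketch of how such a listing could be built from any collection of tagged tokens. It is not HanTa's implementation and does not call HanTa: the function name `tag_inventory`, its parameters, and the hand-tagged sample sentence are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def tag_inventory(tagged_tokens, examples_per_tag=3, seed=0):
    """Map each tag to up to `examples_per_tag` randomly chosen example words.

    `tagged_tokens` is an iterable of (word, tag) pairs. This helper is
    hypothetical and only illustrates the idea of listing tags with examples.
    """
    # Collect the distinct words observed for each tag.
    words_by_tag = defaultdict(set)
    for word, tag in tagged_tokens:
        words_by_tag[tag].add(word)

    # A seeded generator keeps the random examples reproducible.
    rng = random.Random(seed)
    inventory = {}
    for tag in sorted(words_by_tag):
        words = sorted(words_by_tag[tag])
        k = min(examples_per_tag, len(words))
        inventory[tag] = rng.sample(words, k)
    return inventory


# Hypothetical sample: a sentence tagged by hand with STTS-style POS tags.
sample = [
    ("Die", "ART"), ("Katze", "NN"), ("schläft", "VVFIN"),
    ("auf", "APPR"), ("dem", "ART"), ("Sofa", "NN"), (".", "$."),
]

inventory = tag_inventory(sample)
for tag, examples in inventory.items():
    print(tag, examples)
```

Running the same function over a large tagged corpus would give a fuller inventory; the explanations for each tag would still have to come from the annotation scheme linked above.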

Besides that, I am still working on comprehensive documentation.