I think the tags are based on the Tiger annotation scheme, available here:
Sorry, I replied by mail instead of using GitHub. Indeed, HanTa is trained mainly on the Tiger corpus and thus uses the Tiger annotation scheme: https://www.ims.uni-stuttgart.de/documents/ressourcen/korpora/tiger-corpus/annotation/tiger_scheme-morph.pdf (esp. pp. 26-27)
A general description is available e.g. here: https://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/germantagsets/#id-cfcbf0a7-0 or here: https://homepage.ruhr-uni-bochum.de/stephen.berman/Korpuslinguistik/Tagsets-STTS.html
These are the POS tags that are used. Most tags used to annotate morphemes should be quite clear given the POS tags. I am working on documentation that covers those as well.
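To give a rough idea of what the linked tag set looks like, here is a small lookup table with a handful of STTS tags and short English glosses. This is only a sketch: the selection of tags is incomplete, and the names `STTS_EXCERPT` and `describe_tag` are made up for illustration and are not part of HanTa.

```python
# A small, incomplete excerpt of STTS part-of-speech tags with short glosses.
# The dictionary and helper below are illustrative only, not HanTa's API.
STTS_EXCERPT = {
    "NN": "common noun",
    "NE": "proper noun",
    "ADJA": "attributive adjective",
    "ADJD": "adverbial or predicative adjective",
    "ADV": "adverb",
    "ART": "definite or indefinite article",
    "APPR": "preposition",
    "VVFIN": "finite full verb",
    "VVINF": "infinitive of a full verb",
    "VAFIN": "finite auxiliary verb",
    "KON": "coordinating conjunction",
    "PPER": "personal pronoun",
}


def describe_tag(tag):
    """Return the gloss for a tag, or a fallback if it is not in the excerpt."""
    return STTS_EXCERPT.get(tag, "unknown tag (not in this excerpt)")


# Hand-assigned tags for a toy sentence (not actual tagger output).
tagged = [("Die", "ART"), ("Katze", "NN"), ("schläft", "VVFIN")]
for word, tag in tagged:
    print(f"{word}\t{tag}\t{describe_tag(tag)}")
```

A table like this can help when reading tagger output, but the linked scheme remains the authoritative reference for the full list.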
In the latest version I have added two methods:
The first one gives a list of all POS tags, the second a list of all tags used for morphemes. For each tag, some random examples are generated. Have a look at the demo notebook in the German section for an example.
Besides that, I am still working on comprehensive documentation.
I cannot find a way to list all the tags your tagger knows. There are definitely some that are not listed in your tutorial. Where can I find the entire list, ideally with an explanation and examples for each tag?
Thanks!