Closed BastianBaumeister closed 2 years ago
Dear Bastian,
thanks a lot for reporting this issue! This is a hard case, at least if we don't want to write a special rule for this single case. For the next version of HanTa I will definitely try to solve this problem.
Best
Christian
Dear Bastian,
I think the problem was mainly caused by incorrect annotations in the training data. In the new release, I hope the number of issues with this type of adjective is reduced. At least the forms you mentioned are now lemmatized correctly.
Sorry if this isn't the proper channel for reporting non-programming-related issues.
First of all, I want to say that I'm quite impressed with HanTa. In the last few days I tried many different lemmatization packages in R and Python, but most of them were rather lackluster for German text. HanTa, on the other hand, recognizes almost anything I throw at it - great!
There just seems to be one word that the algorithm doesn't "get along" with: "teuer", specifically its many inflections. Some examples:
word: teurere, lemma: teu
word: teure, lemma: teu
word: überteuerten, lemma: überteuet

Other inflections work fine:
word: teuerste, lemma: teuer
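
For anyone who wants to reproduce this, here is a minimal sketch of a check over the forms listed above. The helper `find_mismatches` and the two dictionaries are my own illustration, not part of HanTa. The expected lemma "überteuert" is my assumption; the other expected lemmas follow from "teuer". The HanTa wiring at the bottom assumes that `analyze` returns a `(lemma, tag)` pair and that the bundled German model is named `morphmodel_ger.pgz`, as in the package README. If HanTa is not installed, the script just replays the outputs quoted in this report.

```python
from typing import Callable, Dict, List, Tuple

# Lemmas quoted in this report (word -> lemma returned at the time).
REPORTED: Dict[str, str] = {
    "teurere": "teu",
    "teure": "teu",
    "überteuerten": "überteuet",
    "teuerste": "teuer",
}

# Lemmas one would expect. "überteuert" is an assumption on my part;
# the report does not say which lemma is wanted for that form.
EXPECTED: Dict[str, str] = {
    "teurere": "teuer",
    "teure": "teuer",
    "überteuerten": "überteuert",
    "teuerste": "teuer",
}


def find_mismatches(
    lemmatize: Callable[[str], str], expected: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (word, got, wanted) for every word whose lemma differs."""
    mismatches = []
    for word, wanted in expected.items():
        got = lemmatize(word)
        if got != wanted:
            mismatches.append((word, got, wanted))
    return mismatches


if __name__ == "__main__":
    try:
        from HanTa import HanoverTagger as ht
    except ImportError:
        # HanTa not available: replay the outputs quoted in the report.
        lemmatize = REPORTED.__getitem__
    else:
        tagger = ht.HanoverTagger("morphmodel_ger.pgz")

        def lemmatize(word: str) -> str:
            # Assumption: analyze() returns (lemma, tag) at the default tag level.
            return tagger.analyze(word)[0]

    for word, got, wanted in find_mismatches(lemmatize, EXPECTED):
        print(f"{word}: got {got!r}, expected {wanted!r}")
```

Passing the lemmatizer in as a plain callable keeps the check independent of the tagger, so the same word list can be run against other packages or a newer HanTa release.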
Besides this one issue, I'm really impressed with your work, and I hope the package will be maintained in the future.