Tagging errors in PatternParser output may lead to incorrect lemmatization of frequent German adjectives. As a result, every tool that relies on the parser's output (POS tagging, sentiment analysis, noun phrase extraction, etc.) can return unexpected results.
Example (using IPython):
In [1]: from textblob_de import TextBlobDE
In [2]: TextBlobDE(u"Peter hat einen schönen Hund.").sentiment
Out[2]: Sentiment(polarity=0.0, subjectivity=0.0)
Out[EXPECTED]: Sentiment(polarity=1.0, subjectivity=0.0)
In [3]: TextBlobDE(u"Peter hat einen schönen Hund.").noun_phrases
Out[3]: WordList([])
Out[EXPECTED]: WordList([u'schönen Hund'])
In [4]: TextBlobDE(u"Peter hat einen schönen Hund.").tags
Out[4]: [('Peter', 'NNP'), ('hat', 'VB'), ('einen', 'DT'), (u'schönen', 'PRP$'), ('Hund', 'NN')]
Out[EXPECTED]: [..., (u'schönen', 'JJ'), ...]
Root cause:
In [5]: from pattern.de import parse, pprint
In [6]: pprint(parse(u"Peter hat einen schönen Hund.", lemmata=True))
WORD      TAG        CHUNK   ROLE   ID   PNP   LEMMA
Peter     NNP        NP      -      -    -     peter
hat       VB         VP      -      -    -     haben
einen     DT         NP      -      -    -     ein
schönen   > PRP$ <   NP ^    -      -    -     > schön[en] <
Hund      NN         NP ^    -      -    -     hund
.         .          -       -      -    -     .
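Until the tagger is fixed upstream, tagged output can be screened for this specific error. The following is only a minimal sketch and is not part of textblob-de or pattern: `flag_suspect_adjectives` is a hypothetical helper, and the rule it applies (a token tagged `PRP$` directly between a `DT` and an `NN*` token is probably an adjective) is an assumption drawn from the single example above, not a general fix. It works on a plain list of `(word, tag)` tuples in the shape returned by `.tags`, so it runs without either library installed.

```python
def flag_suspect_adjectives(tags):
    """Return (index, word) pairs for tokens tagged PRP$ that sit between a
    determiner (DT) and a noun (NN*), where an adjective (JJ) is more plausible.

    Hypothetical helper: the DT / PRP$ / NN* rule is an assumed heuristic.
    """
    suspects = []
    # Skip the first and last token: both neighbours are needed for the check.
    for i in range(1, len(tags) - 1):
        word, tag = tags[i]
        prev_tag = tags[i - 1][1]
        next_tag = tags[i + 1][1]
        if tag == "PRP$" and prev_tag == "DT" and next_tag.startswith("NN"):
            suspects.append((i, word))
    return suspects


# Tag list copied from Out[4] above.
tags = [("Peter", "NNP"), ("hat", "VB"), ("einen", "DT"),
        ("schönen", "PRP$"), ("Hund", "NN")]

print(flag_suspect_adjectives(tags))  # [(3, 'schönen')]
```

A check like this only reports candidates for manual review. It does not correct the lemma or the sentiment score, which still depend on the parser's output.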
Please send suggestions for improvement directly to the pattern project (see e.g. https://github.com/clips/pattern/issues/63). The version of pattern.text.de bundled with textblob-de will be updated regularly.
I am also working on integrating additional lemmatizers into textblob_de, but PatternParserLemmatizer will remain the default choice, as it is implemented in Python.