markuskiller / textblob-de

German language support for TextBlob.
https://textblob-de.readthedocs.org
MIT License

blob.sentiment doesn't return polarity for "Erfolg" (or any other noun I tried) #22

Status: Open. amieth opened this issue 2 years ago.

amieth commented 2 years ago

Hi there,

thanks a lot for this great package!

While analyzing some texts with TextBlob-DE, I stumbled upon the following difference between TextBlob and TextBlob-DE in the polarity of nouns, as returned by blob.sentiment:

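    from textblob import TextBlob
    from textblob_de import TextBlobDE

    # German: "Erfolg" (success) is listed as a positive noun in the German lexicon
    blob_de = TextBlobDE('Das ist ein Erfolg')
    print(blob_de.sentiment)

    # English counterpart via plain TextBlob, for comparison
    blob_en = TextBlob('this is a success')
    print(blob_en.sentiment)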

This code prints the following lines:

    Sentiment(polarity=0.0, subjectivity=0.0)
    Sentiment(polarity=0.3, subjectivity=0.0)

I wonder why TextBlob-DE does not seem to pick up the polarity of "Erfolg", even though the German sentiment lexicon contains this entry:

    <word confidence="1.0" form="Erfolg" intensity="1.0"
          polarity="1.0" pos="NN" subjectivity="0.0"/>

TextBlob, on the other hand, clearly does include the polarity of "success" in the sentiment calculation, based on this entry:

    <word form="success" wordnet_id="n-14474894" pos="NN" sense="a state of prosperity or fame"
          polarity="0.3" subjectivity="0.0" intensity="1.0" confidence="0.9" />
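
Both entries have the same shape: a surface form, a part-of-speech tag and numeric scores. Below is a minimal sketch of reading such entries into a lookup table with Python's standard `xml.etree.ElementTree`, assuming a hypothetical `<sentiment>` root element around them. The `{form: {pos: scores}}` layout and the `parse_lexicon` helper are illustrations, not necessarily how TextBlob or TextBlob-DE store the lexicon internally. Note that the keys keep the exact capitalization of the `form` attribute.

```python
import xml.etree.ElementTree as ET

# The two <word> entries quoted above, wrapped in a root element.
# The <sentiment> root tag is an assumption for this sketch.
LEXICON_XML = """<sentiment>
  <word confidence="1.0" form="Erfolg" intensity="1.0"
        polarity="1.0" pos="NN" subjectivity="0.0"/>
  <word form="success" wordnet_id="n-14474894" pos="NN"
        sense="a state of prosperity or fame"
        polarity="0.3" subjectivity="0.0" intensity="1.0" confidence="0.9"/>
</sentiment>"""


def parse_lexicon(xml_text):
    """Hypothetical helper: map form -> {pos: (polarity, subjectivity, intensity)}."""
    lexicon = {}
    for w in ET.fromstring(xml_text).iter("word"):
        form = w.attrib.get("form")  # kept exactly as written, e.g. "Erfolg"
        pos = w.attrib.get("pos")
        scores = tuple(float(w.attrib.get(k, 0.0))
                       for k in ("polarity", "subjectivity", "intensity"))
        lexicon.setdefault(form, {})[pos] = scores
    return lexicon


lexicon = parse_lexicon(LEXICON_XML)
print(lexicon["Erfolg"]["NN"])   # (1.0, 0.0, 1.0)
print(lexicon["success"]["NN"])  # (0.3, 0.0, 1.0)
```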

I would very much appreciate it if you could give me any advice on how to resolve this issue.

Kind regards, Andreas

amieth commented 2 years ago

I fixed this problem in my local copy by changing the following statement in Sentiment.load():

    w.attrib.get("form"),

to:

    w.attrib.get("form").lower(),
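
To illustrate why this one-word change helps, here is a minimal sketch, assuming (as described in this comment) that words from the input text are lowercased before they are looked up. `load_polarities` is a hypothetical helper, not the actual `Sentiment.load()`; it only contrasts case-sensitive keys with lowercased keys for the "Erfolg" entry.

```python
import xml.etree.ElementTree as ET

# The "Erfolg" entry quoted in the issue; the <sentiment> root is assumed.
LEXICON_XML = """<sentiment>
  <word confidence="1.0" form="Erfolg" intensity="1.0"
        polarity="1.0" pos="NN" subjectivity="0.0"/>
</sentiment>"""


def load_polarities(xml_text, lowercase_forms):
    """Hypothetical helper: map form -> polarity, optionally lowercasing the form."""
    polarities = {}
    for w in ET.fromstring(xml_text).iter("word"):
        form = w.attrib.get("form")
        if lowercase_forms:
            form = form.lower()  # mirrors w.attrib.get("form").lower()
        polarities[form] = float(w.attrib.get("polarity"))
    return polarities


# The input word, lowercased before lookup.
token = "Erfolg".lower()

before = load_polarities(LEXICON_XML, lowercase_forms=False)
after = load_polarities(LEXICON_XML, lowercase_forms=True)
print(before.get(token))  # None, because "erfolg" != "Erfolg"
print(after.get(token))   # 1.0
```

Keying by form alone is a simplification: once forms are lowercased, two entries that differ only in capitalization would overwrite each other in this dict, so a real loader would also keep the part-of-speech tag in the key.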

Now the lowercased words in the input text can be matched with the lowercased words in the loaded lexicon. Here are the results of a "tiny test":

    Text                        Comment                 Polarity
    Das ist ein Erfolg          positive noun            1.0
    Das war ein kein Erfolg     negated positive noun   -0.5
    Der Test verlief positiv    positive adverb          0.5
    Sie fährt ein grünes Auto   neutral sentence         0.0
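
The negated row (1.0 for "Erfolg" turning into -0.5) appears consistent with a rule in the Pattern-derived sentiment code that TextBlob builds on, where a negated word's polarity is multiplied by -0.5, so "not good" reads as slightly bad rather than as the full opposite. Below is a toy sketch of that rule, not TextBlob-DE's actual code: the one-entry lexicon comes from the "Erfolg" entry above, while the negation list and the `polarity` helper are hypothetical, and intensifiers and other modifiers are left out.

```python
# One-entry lexicon taken from the "Erfolg" entry (lowercased keys, as after the fix).
POLARITY = {"erfolg": 1.0}
# Hypothetical negation list, for illustration only.
NEGATIONS = {"kein", "keine", "nicht"}


def polarity(text):
    """Average polarity of known words; a negated word contributes -0.5 * p."""
    scores = []
    negated = False
    for token in text.lower().split():
        if token in NEGATIONS:
            negated = True  # applies to the next known word only
        elif token in POLARITY:
            p = POLARITY[token]
            scores.append(p * -0.5 if negated else p)
            negated = False
    return sum(scores) / len(scores) if scores else 0.0


print(polarity("Das ist ein Erfolg"))         # 1.0
print(polarity("Das war ein kein Erfolg"))    # -0.5
print(polarity("Sie fährt ein grünes Auto"))  # 0.0
```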