metalcorebear / NRCLex

An affect generator based on TextBlob and the NRC affect lexicon. Note that lexicon license is for research purposes only.
MIT License
65 stars, 39 forks

Capitalization Affecting Outcome #11

Open Yangyi-Zhang opened 2 years ago

Yangyi-Zhang commented 2 years ago

Hi,

Thank you for the project. I get different outputs when I change the capitalization of the input. For example:

    text = "I love to visit historical places"
    emotion = NRCLex(text)
    print(emotion.top_emotions)

This yields [('positive', 0.6666666666666666)], with raw emotion scores {'joy': 1, 'positive': 2}.

However, if I change text to "I Love to visit historical places", top_emotions becomes [('positive', 1.0)].

How do we explain this? I understand this might not be an issue, but I am curious as a beginner in NLP.

metalcorebear commented 2 years ago

Thanks for bringing this up. I think we need to add a line of code that converts every word in the examined corpus to lowercase before it is looked up. That should resolve this issue, since the affect word list is all lowercase.
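To illustrate why capitalization changes the result, here is a minimal sketch that does not use NRCLex itself. It assumes a made-up two-word lexicon (TOY_LEXICON) whose entries were chosen only to reproduce the numbers reported above; the real NRC lexicon and the library's tokenization differ. The helper names raw_scores and frequencies are hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon with all-lowercase keys, standing in for the
# NRC word list. The entries are invented to mirror the reported scores.
TOY_LEXICON = {
    "love": ["joy", "positive"],
    "visit": ["positive"],
}


def raw_scores(text, lowercase=False):
    """Count affect labels for whitespace-separated tokens found in the lexicon."""
    if lowercase:
        # The proposed fix: normalize case before the lookup.
        text = text.lower()
    counts = Counter()
    for token in text.split():
        # Exact-match lookup: "Love" is not the same key as "love".
        counts.update(TOY_LEXICON.get(token, []))
    return dict(counts)


def frequencies(scores):
    """Divide each count by the total number of label hits."""
    total = sum(scores.values())
    return {label: n / total for label, n in scores.items()} if total else {}


lower = raw_scores("I love to visit historical places")
upper = raw_scores("I Love to visit historical places")
fixed = raw_scores("I Love to visit historical places", lowercase=True)

print(lower, frequencies(lower))
print(upper, frequencies(upper))
print(fixed, frequencies(fixed))
```

In this sketch, "Love" misses the lowercase key, so its 'joy' and 'positive' hits disappear and the single remaining 'positive' hit makes up the whole total, which is how 2/3 turns into 1.0. Until a fix lands in the library, passing text.lower() to NRCLex would presumably have the same effect, though that is untested here.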