List of good and bad words

maytepenella commented 6 years ago

Find out which words are most strongly correlated with positive and negative sentiment. Maybe we can use the correlation matrix to pick the best positive and negative words (positive correlations mark words that appear more in positive sentences, negative correlations mark words that appear more in negative sentences).
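A minimal sketch of this idea, assuming scikit-learn and NumPy are available. The toy tweets, the 0/1 labels and the helper name word_sentiment_correlation are made up for illustration and are not taken from train.py:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data: 1 = positive tweet, 0 = negative tweet.
tweets = [
    "great flight lovely crew",
    "great service",
    "terrible delay",
    "terrible service lost bag",
]
labels = np.array([1, 1, 0, 0])

# binary=True records word presence (0/1) instead of raw counts.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(tweets).toarray()


def word_sentiment_correlation(X, labels, vocabulary):
    """Pearson correlation between each word-presence column and the label."""
    correlations = {}
    for word, column in vocabulary.items():
        correlations[word] = float(np.corrcoef(X[:, column], labels)[0, 1])
    return correlations


correlations = word_sentiment_correlation(X, labels, vectorizer.vocabulary_)

# Most positive words first, most negative words last.
for word, corr in sorted(correlations.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:10s} {corr:+.2f}")
```

Words used only in positive tweets end up near +1, words used only in negative tweets near -1, and words shared evenly by both classes (here "service") near 0. A word that appears in every tweet would have zero variance and give an undefined correlation, so real data may need a guard for that case.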

maytepenella commented 6 years ago

Added function get_vocabulary_per_sentiment to train.py

This function takes the negative tweets and creates a count_vectorizer of bow_size2 words,
then takes the positive tweets and creates a count_vectorizer of bow_size2 words.
Prints the correlations with sentiment.
Updates the dictionary with positive terms not found in negative tweets and creates a new CountVectorizer with this new vocabulary.
The final vocabulary length is lower than 2 * bow_size2, as there are always words that are present in both negative and positive tweets.
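The steps above could look roughly like this. It is only a sketch under assumptions: the signature, the toy tweets and the use of max_features to cap each vocabulary at bow_size2 words are guesses, the real code in train.py may differ, and the correlation printout is left out:

```python
from sklearn.feature_extraction.text import CountVectorizer


def get_vocabulary_per_sentiment(negative_tweets, positive_tweets, bow_size2):
    """Build one CountVectorizer whose vocabulary merges both classes.

    Assumed signature, for illustration only.
    """
    # Keep the bow_size2 most frequent words of each class.
    negative_cv = CountVectorizer(max_features=bow_size2).fit(negative_tweets)
    positive_cv = CountVectorizer(max_features=bow_size2).fit(positive_tweets)

    # Start from the negative words, then add positive words not seen yet.
    vocabulary = sorted(negative_cv.vocabulary_)
    seen = set(vocabulary)
    vocabulary += sorted(w for w in positive_cv.vocabulary_ if w not in seen)

    # A vectorizer with a fixed vocabulary counts only these words.
    return CountVectorizer(vocabulary=vocabulary)


# Hypothetical toy data.
negative = ["bad flight", "bad flight delay", "bad service"]
positive = ["good flight", "good flight crew", "good food"]

combined_cv = get_vocabulary_per_sentiment(negative, positive, bow_size2=2)
features = combined_cv.transform(["good flight"]).toarray()

print(combined_cv.vocabulary)  # merged word list
print(features)                # counts for "good flight"
```

In this toy example "flight" is among the top words of both classes, so the merged vocabulary has 3 words instead of 2 * bow_size2 = 4.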

TO DO: discard words with low correlation?

maytepenella commented 6 years ago

TO DO: Upload version with neutral vocabulary

maytepenella commented 6 years ago

Modified version that considers all the sentiments present in df.
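A rough sketch of how this can generalise to any number of sentiment classes, assuming pandas and scikit-learn. The column names "text" and "sentiment", the function name and the toy rows are hypothetical and do not come from the repository:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer


def vocabulary_for_all_sentiments(df, bow_size, text_col="text", sentiment_col="sentiment"):
    """Merge the top bow_size words of every sentiment found in df."""
    vocabulary = []
    seen = set()
    # One pass per sentiment value, whatever values the DataFrame contains.
    for _, group in df.groupby(sentiment_col, sort=True):
        cv = CountVectorizer(max_features=bow_size).fit(group[text_col])
        for word in sorted(cv.vocabulary_):
            if word not in seen:
                seen.add(word)
                vocabulary.append(word)
    return vocabulary


# Hypothetical toy DataFrame with three sentiments.
df = pd.DataFrame({
    "text": [
        "bad flight", "bad flight delay", "bad service",
        "flight booked", "flight booked today", "flight info",
        "good flight", "good flight crew", "good food",
    ],
    "sentiment": ["negative"] * 3 + ["neutral"] * 3 + ["positive"] * 3,
})

vocabulary = vocabulary_for_all_sentiments(df, bow_size=2)
print(vocabulary)
```

The merged list can then be passed to CountVectorizer(vocabulary=vocabulary). Its length is at most the number of sentiments times bow_size, and usually lower because classes share words.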

Code has been redone to be more understandable.

For now we will not discard words based on correlation

maytepenella commented 6 years ago

Changes needed to fit main.py:

Consider language as input
Check that it works without warnings/errors

maytepenella commented 6 years ago

Fixed in the last commit

jsantalo / happybirds

List of good and bad words #5