jsantalo / happybirds

GNU General Public License v3.0

Use one hot encoding for airline column #12

Closed: maytepenella closed this issue 6 years ago

maytepenella commented 6 years ago

The correlation matrix shows that when certain airlines are mentioned in the tweet text, there is a correlation with sentiment: positive for some airlines, negative for others. For example:

So I tried one-hot encoding the 'airline' column and feeding the resulting features to the classifier as additional input.
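To make the idea concrete, here is a minimal, library-free sketch of one-hot encoding a categorical column and appending it to existing text features. It is not the code in trans.py: the function names `one_hot_encode` and `append_features`, the sample airlines, and the tiny count matrix are all hypothetical and only illustrate the technique.

```python
def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 indicator column per distinct value."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this row's category
        rows.append(row)
    return categories, rows


def append_features(text_features, extra_features):
    """Concatenate two feature matrices row by row."""
    if len(text_features) != len(extra_features):
        raise ValueError("feature matrices must have the same number of rows")
    return [list(a) + list(b) for a, b in zip(text_features, extra_features)]


# Hypothetical data: one airline per tweet and a tiny 3-term BoW count matrix.
airlines = ["United", "Delta", "United", "Virgin America"]
bow_counts = [[1, 0, 2], [0, 1, 0], [0, 0, 1], [3, 0, 0]]

categories, airline_one_hot = one_hot_encode(airlines)
features = append_features(bow_counts, airline_one_hot)
```

Each row of `features` holds the BoW counts followed by the airline indicator columns, so the classifier sees both. In practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` would probably be used instead of a hand-written encoder.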

The results were worse: 74.32% with the one-hot encoded 'airline' column versus 75.11% without it. The other parameters were a BoW of 1000 and bigrams.

maytepenella commented 6 years ago

I uploaded a new trans.py with this feature in case it is needed. If it is not used, it can be commented out.