The correlation matrix shows that when certain airlines are mentioned in the tweet text, the mention is correlated with sentiment: positively for some airlines and negatively for others. For example:
915 [usairways] : -0.152264
461 [jetblue] : 0.147692
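A correlation like the ones above can be computed per token as sketched below. This is a minimal, dependency-free sketch, not the original code. It assumes sentiment is coded as negative=-1, neutral=0, positive=1, and that tokens are extracted with the same regular expression as scikit-learn's CountVectorizer default, so "@usairways" yields the token "usairways". The helper names and the tweets are hypothetical toy examples, not the dataset.

```python
import math
import re

# Assumed sentiment coding (not from the original notes): negative=-1, neutral=0, positive=1.
SENTIMENT_CODE = {"negative": -1, "neutral": 0, "positive": 1}

# Mimics CountVectorizer's default token_pattern, so "@usairways" -> "usairways".
TOKEN_RE = re.compile(r"\b\w\w+\b")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0  # correlation is undefined for a constant column; report 0
    return cov / (std_x * std_y)


def token_sentiment_correlation(tweets, labels, token):
    """Correlate a binary 'token occurs in tweet' feature with coded sentiment."""
    present = [1 if token in TOKEN_RE.findall(t.lower()) else 0 for t in tweets]
    codes = [SENTIMENT_CODE[label] for label in labels]
    return pearson(present, codes)


# Hypothetical, made-up tweets for illustration only.
tweets = [
    "@usairways lost my bag again",
    "@jetblue great flight, thanks",
    "@usairways delayed two hours",
    "@jetblue ok flight",
]
labels = ["negative", "positive", "negative", "neutral"]

for airline in ("usairways", "jetblue"):
    print(airline, round(token_sentiment_correlation(tweets, labels, airline), 6))
```

On this toy data the correlation comes out negative for "usairways" and positive for "jetblue", the same pattern of signs as in the real results above. A correlation matrix over all BoW columns is just this computation repeated for every token.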
I therefore one-hot encoded the 'airline' column and added the resulting features as input to the classifier.
This made the results worse: 74.32% with the one-hot encoded 'airline' features versus 75.11% without them. The other settings were a bag-of-words (BoW) of 1000 features, with bigrams.
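The feature setup being compared can be sketched as follows. This is a minimal, dependency-free sketch, not the original code. It assumes the BoW keeps the 1000 most frequent unigrams and bigrams, roughly what scikit-learn's `CountVectorizer(max_features=1000, ngram_range=(1, 2))` does, and that the one-hot 'airline' columns are simply appended to each BoW row. All function names and the toy tweets are hypothetical.

```python
import re
from collections import Counter

# Mimics CountVectorizer's default token_pattern.
TOKEN_RE = re.compile(r"\b\w\w+\b")


def ngrams(text, n_max=2):
    """Unigrams and bigrams (n_max=2) from lower-cased text."""
    tokens = TOKEN_RE.findall(text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def build_vocabulary(texts, max_features=1000):
    """Keep the max_features most frequent n-grams across the corpus."""
    counts = Counter(g for t in texts for g in ngrams(t))
    return {g: i for i, (g, _) in enumerate(counts.most_common(max_features))}


def bow_vector(text, vocab):
    """Count vector over the fixed vocabulary; unknown n-grams are ignored."""
    vec = [0] * len(vocab)
    for g in ngrams(text):
        idx = vocab.get(g)
        if idx is not None:
            vec[idx] += 1
    return vec


def one_hot(value, categories):
    """One 0/1 column per airline category."""
    return [1 if value == c else 0 for c in categories]


def build_features(texts, airlines, max_features=1000, use_airline=True):
    """BoW rows, optionally extended with one-hot 'airline' columns.

    Simplification: vocabulary and categories are fitted on the texts passed in;
    in a real experiment they should be fitted on the training split only.
    """
    vocab = build_vocabulary(texts, max_features)
    categories = sorted(set(airlines))
    rows = []
    for text, airline in zip(texts, airlines):
        row = bow_vector(text, vocab)
        if use_airline:
            row += one_hot(airline, categories)
        rows.append(row)
    return rows, vocab, categories


# Hypothetical, made-up examples for illustration only.
texts = ["@usairways lost my bag", "@jetblue great flight", "@united flight delayed"]
airlines = ["US Airways", "JetBlue", "United"]

with_airline, vocab, categories = build_features(texts, airlines, use_airline=True)
without_airline, _, _ = build_features(texts, airlines, use_airline=False)
print(len(vocab), categories)
print(len(with_airline[0]) - len(without_airline[0]))  # one-hot columns added
```

Training the same classifier once on `with_airline` and once on `without_airline` corresponds to the two configurations compared above. One plausible reason the extra columns add little is that the airline name usually already appears in the tweet text (as a mention) and is therefore already captured by the BoW features.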