Use one-hot encoding for the airline column

The correlation matrix shows that mentions of some airlines in the tweet text correlate with sentiment, some positively and some negatively. For example:

915 [usairways] : -0.152264
461 [jetblue] : 0.147692
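
As a rough illustration of how such a value can arise, here is a minimal sketch that correlates a binary "airline is mentioned" feature with a numeric sentiment score. The toy tweets, the -1/0/1 sentiment coding, and the helper names `pearson` and `mentions` are all assumptions made for the example; they are not taken from the project.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mentions(token, texts):
    """Binary feature: 1 if the token appears in the tweet text, else 0."""
    return [1 if token in text.lower() else 0 for text in texts]


# Toy data, invented for illustration only.
tweets = [
    "@usairways lost my bag again",
    "@jetblue great crew today",
    "@usairways delayed three hours",
    "@jetblue thanks for the upgrade",
    "@united flight was fine",
]
# Assumed coding: negative = -1, neutral = 0, positive = 1.
sentiment = [-1, 1, -1, 1, 0]

for token in ("usairways", "jetblue"):
    r = pearson(mentions(token, tweets), sentiment)
    print(f"[{token}] : {r:.6f}")
```

On this toy data the sign of the coefficient follows the pattern above (negative for usairways, positive for jetblue); the magnitudes are not comparable to the real ones.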

Thus, I tried one-hot encoding the 'airline' column and feeding the resulting features to the classifier as additional input.
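
A minimal, dependency-free sketch of the idea, assuming each tweet already has a BoW count vector and an airline label. The sample values and the helper names `fit_categories` and `one_hot` are hypothetical; the actual pipeline may well use a library encoder such as scikit-learn's `OneHotEncoder` instead.

```python
def fit_categories(values):
    """Collect the distinct category values in a stable (sorted) order."""
    return sorted(set(values))


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`.

    A value not seen when the categories were collected yields all zeros.
    """
    return [1 if value == category else 0 for category in categories]


# Hypothetical 'airline' column and placeholder BoW counts (3 terms per tweet).
airlines = ["usairways", "jetblue", "united", "jetblue"]
bow_rows = [
    [1, 0, 2],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
]

categories = fit_categories(airlines)

# Append the one-hot airline vector to each tweet's BoW vector.
features = [bow + one_hot(airline, categories)
            for bow, airline in zip(bow_rows, airlines)]

for row in features:
    print(row)
```

Each row gains one extra column per distinct airline, so the classifier input grows by `len(categories)` features.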

The results were worse: 74.32% with the one-hot encoded 'airline' column versus 75.11% without it. Other parameters: a BoW of 1000 and bigrams.
