Open Kellenbj opened 3 years ago
Excellent suggestion. In fact, I am already using pandas, which uses numpy. My plan is to ultimately use the k-means algorithm not necessarily to provide a run-over-run improvement to an ML core, but instead to produce a set of one-off clusters which represent the most positive and most negative locations of a given search topic. The points you make at the end are good ones, and I'll definitely make sure I take them into consideration when fleshing out the program.
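A minimal sketch of that plan, assuming each tweet has already been reduced to a latitude, a longitude, and a sentiment score in [-1, 1]. The `kmeans` helper, the sample rows, and the column layout are all hypothetical and not taken from the repository. Plain Euclidean distance on latitude/longitude is only a rough approximation of geographic distance.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on an (n, d) float array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct input points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

# Hypothetical data: columns are latitude, longitude, sentiment.
tweets = np.array([
    [40.7, -74.0, 0.8],
    [40.8, -73.9, 0.6],
    [34.0, -118.2, 0.9],
    [34.1, -118.3, 0.7],
    [41.9, -87.6, -0.7],
    [41.8, -87.7, -0.9],
])

# Cluster only the locations of positive tweets; negatives work the same way.
positive = tweets[tweets[:, 2] > 0][:, :2]
pos_centroids, pos_labels = kmeans(positive, k=2)
```

Running the same call on `tweets[tweets[:, 2] < 0][:, :2]` would give the one-off clusters for the most negative locations.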
Consider implementing the mathematical operations that you have in the README to extend what this code can do. NumPy can take N-dimensional arrays of data. Something like:
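`import numpy as np
# One row per record; 2-D so that later rows can be stacked underneath
data = np.array([[username, sentiment, tweetnum, location]])
# Get the next batch of data the same way, as a new 2-D array 'data2'
data2 = np.concatenate((data, data2), axis=0)  # stack all the data row-wise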
` There are a number of interesting machine learning problems you could tackle from this, but for starters, averaging each user's Twitter sentiment could be interesting. Also consider tweet frequency: a user who tweets many times a day versus one who tweets once a month. Is there a correlation between frequency and sentiment? What kinds of issues are people raising on Twitter most frequently?
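Since pandas is already in use, the averaging and frequency questions could be sketched roughly as follows. The `tweets` frame, its column names, and the sample values are hypothetical, and the tweet count per user stands in for frequency, which assumes all tweets were collected over the same time window.

```python
import pandas as pd

# Hypothetical sample data: one row per tweet, sentiment in [-1, 1].
tweets = pd.DataFrame({
    "username": ["ann", "ann", "ann", "ann", "bob", "bob", "cy"],
    "sentiment": [0.5, 0.7, 0.6, 0.6, -0.2, 0.0, -0.8],
})

# Per-user tweet count (a stand-in for frequency) and mean sentiment.
per_user = tweets.groupby("username").agg(
    tweet_count=("sentiment", "count"),
    mean_sentiment=("sentiment", "mean"),
)

# Pearson correlation between how often a user tweets and their mean sentiment.
freq_sentiment_corr = per_user["tweet_count"].corr(per_user["mean_sentiment"])
```

With real data, a correlation near zero would suggest frequency and sentiment are unrelated, but a handful of users is far too few to conclude anything either way.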