mkovach2 / EC601_proj2


Mathematical Implementation of Results #2

Open Kellenbj opened 3 years ago

Kellenbj commented 3 years ago

Consider implementing the mathematical operations described in the README to make this code more useful. NumPy can take N-dimensional arrays of data. Something like:

`data = np.array([Username, sentiment, tweetnum, location])`

Then get the new data, as you already do, into a second array `data2`, and stack everything:

`data2 = np.concatenate((data, data2), axis=0)`

There are a number of interesting machine learning problems you could tackle from this, but for starters, averaging each user's Twitter sentiment could be interesting. Also consider tweet frequency: a user who tweets many times a day versus a user who tweets once a month. Is there a correlation between frequency and sentiment? What kinds of issues are people raising on Twitter most frequently?
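As a rough illustration of the averaging and correlation ideas above, here is a minimal pandas sketch. The sample rows, the column names (`username`, `sentiment`), and the use of a raw tweet count as a stand-in for tweet frequency are all assumptions made for the example, not anything taken from the repository.

```python
import pandas as pd

# Hypothetical sample data: one row per tweet. Real data would come
# from the program's own collection step.
tweets = pd.DataFrame({
    "username": ["alice", "alice", "alice", "bob", "bob", "carol"],
    "sentiment": [0.5, 0.7, 0.6, -0.2, -0.4, 0.1],
})

# Average sentiment and number of tweets for each user.
per_user = tweets.groupby("username").agg(
    mean_sentiment=("sentiment", "mean"),
    tweet_count=("sentiment", "count"),
)

# Pearson correlation between how often a user tweets and how positive
# they are on average. Tweet count is only a proxy for frequency here;
# dividing by the observation window would give a true rate.
correlation = per_user["tweet_count"].corr(per_user["mean_sentiment"])

print(per_user)
print(correlation)
```

With only a handful of users the correlation value is not meaningful; the point is just the shape of the computation.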

mkovach2 commented 3 years ago

Excellent suggestion. In fact, I am already using pandas, which is built on NumPy. My plan is ultimately to use the k-means algorithm, not necessarily to provide a run-over-run improvement to an ML core, but to produce a set of one-off clusters that represent the most positive and most negative locations for a given search topic. The points you make at the end are good ones, and I'll make sure to take them into consideration when fleshing out the program.
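For reference, the clustering idea could look roughly like the sketch below. It assumes one feature per location (a mean sentiment score), uses made-up location names and values, and implements a plain NumPy version of Lloyd's k-means algorithm in a hypothetical `kmeans` helper; the actual program may well use different features or a library implementation instead.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its assigned points;
        # leave it where it is if the cluster came up empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical mean sentiment per location for one search topic.
locations = ["Boston", "Austin", "Denver", "Miami", "Seattle", "Chicago"]
mean_sentiment = np.array([[0.8], [0.7], [0.75], [-0.6], [-0.7], [-0.65]])

centroids, labels = kmeans(mean_sentiment, k=2)

# The cluster with the highest centroid holds the most positive locations.
positive_cluster = int(centroids[:, 0].argmax())
positive_locations = [
    loc for loc, lab in zip(locations, labels) if lab == positive_cluster
]
negative_locations = [
    loc for loc, lab in zip(locations, labels) if lab != positive_cluster
]

print(positive_locations)
print(negative_locations)
```

Because the clusters are computed once per search topic, there is no model to retrain between runs, which matches the one-off use described above.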