vchi90 / orange_skwad

They're not oranges... they're blood oranges.

rev_B by cerealKillers #8

Open rachel-ng opened 5 years ago

rachel-ng commented 5 years ago

The aim of your project is clear, but I'm rather curious about your methodology.
How exactly are you planning on getting all possible next words?
You say that you'll be using word counts to get the size (and, I'm assuming, the placement) of each word on the wordmap, and you're most likely going to analyze the tweets first, but how are you planning on keeping track of which words come after each other?
Also, how many of his tweets are you planning on using? The full archive seems to have about 37k tweets, which seems like it'd be quite a lot of data to store and analyze.

jiayang commented 5 years ago

We will go through all the words in his tweets and generate a Markov chain that keeps track of the probability of each word following another.

Yes, we plan on using all the tweets because we want our data to be as accurate as possible.
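
For reference, here is a minimal sketch of what a word-level Markov chain like that could look like. It assumes a first-order chain (each word depends only on the word before it) and plain whitespace tokenization; the names `build_chain`, `next_word_probabilities`, and `sample_tweets` are made up for illustration and are not from the project.

```python
from collections import Counter, defaultdict


def build_chain(tweets):
    """Count, for every word, which words directly follow it.

    Assumption: lowercase the text and split on whitespace; a real
    tokenizer would also need to handle punctuation, links, and mentions.
    """
    followers = defaultdict(Counter)
    for tweet in tweets:
        words = tweet.lower().split()
        # Pair each word with its successor within the same tweet.
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def next_word_probabilities(followers, word):
    """Turn the follower counts for `word` into probabilities."""
    counts = followers.get(word.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # word never seen, or never followed by anything
    return {w: c / total for w, c in counts.items()}


# Hypothetical sample data, not real tweets.
sample_tweets = ["we will win big", "we will build it", "we are winning"]
chain = build_chain(sample_tweets)
probs = next_word_probabilities(chain, "we")
```

With the sample data above, "we" is followed by "will" twice and "are" once, so `probs` gives "will" two thirds and "are" one third. The same follower counts could also drive word sizes on the wordmap. Since only one counter per distinct word is stored, a table like this should probably stay manageable in memory even for the full archive of about 37k tweets.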