Open gombru opened 6 years ago
Hi, I know I'm arriving a bit late to this post, but in case anyone is interested in collecting new tweets and cleaning them, I can share this Python code. The cleaning code is in a Jupyter notebook because I wanted to show and test each step visually. Hope you find it useful :)
I wanted to use the Twitter preprocessing script in https://nlp.stanford.edu/projects/glove/preprocess-twitter.rb and found a few bugs there:
I think the script has not been tested and is probably not the one that was used to train the model, as discussed here: https://groups.google.com/forum/#!searchin/globalvectors/preprocessing|sort:date/globalvectors/_X7hQBBuoLY/2ysMo1sWCQAJ
It's my first time using Ruby, but I've fixed those two bugs:
` def tokenize input
end
puts tokenize($_) `
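For readers unfamiliar with this kind of preprocessing, here is a minimal sketch of the general idea: tweet-specific patterns are replaced with placeholder tokens before the text is matched against an embedding vocabulary. It is not the fixed script from this post. The helper name `normalize_tweet`, the regular expressions, and the lowercase placeholder tokens are all illustrative assumptions.

```ruby
# Hypothetical helper for illustration only (not the script discussed above).
# Replaces a few tweet-specific patterns with placeholder tokens,
# collapses whitespace, and lowercases the result.
def normalize_tweet(text)
  text
    .gsub(%r{https?://\S+|www\.\S+}, "<url>")    # links (done first, so digits in URLs are not touched later)
    .gsub(/@\w+/, "<user>")                       # user mentions
    .gsub(/#(\w+)/) { "<hashtag> #{$1}" }         # keep the hashtag text after the tag
    .gsub(/[-+]?\d+(?:[.,:]\d+)*/, "<number>")    # integers, decimals, times
    .gsub(/\s+/, " ")                             # collapse runs of whitespace
    .strip
    .downcase
end

puts normalize_tweet("Check https://t.co/abc123 @Bob #NLP 42 times")
# prints: check <url> <user> <hashtag> nlp <number> times
```

The order of the substitutions matters in this sketch: URLs are replaced before numbers so that digits inside a link are not turned into `<number>` tokens. Whether the placeholders should be upper or lower case depends on the vocabulary of the embeddings you plan to use, so check that before relying on them.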