Open MokeEire opened 3 years ago
Hi, thanks for the interest.
You can try this link: https://archive.org/details/twitterstream
To collect the archive data for other years, one needs to call wget for every available link. I do not have all of the links at hand right now, so I will keep this issue open, then compile and upload them later.
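A minimal sketch of the "one download per link" loop, in Python rather than wget. It assumes you already have the archive links as a list of URLs; the helper names `filename_for` and `download_all` are made up for illustration, and nothing here knows the real link pattern on archive.org.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def filename_for(url: str) -> str:
    """Use the last path component of the URL as the local file name."""
    return os.path.basename(urlparse(url).path)


def download_all(urls, dest_dir, fetch=urlretrieve):
    """Download each URL into dest_dir, skipping files that already exist.

    `fetch` is any callable taking (url, target_path); it defaults to
    urllib's urlretrieve and can be swapped out for testing.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        target = os.path.join(dest_dir, filename_for(url))
        if not os.path.exists(target):  # lets an interrupted run be resumed
            fetch(url, target)
        saved.append(target)
    return saved
```

Skipping files that already exist means the loop can be restarted after a failure without fetching everything again, which matters for an archive of this size.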
In any case, collecting the whole archive will be quite costly (it's 7 TB) and time-consuming. It might be easier to get the data from Twitter using the tweet IDs, unless you are interested in the tweets of suspended users.
I would love to be able to get the data using IDs (presumably using the IDs in the tweets.csv file?). Currently trying to find a way to do just that with R.
Unfortunately I do not know R, but what you theoretically need to do is load the IDs into a list and then feed that list to the API endpoint statuses/lookup. Any Twitter API wrapper should have a function that calls or wraps "statuses/lookup".
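For what it is worth, here is a rough Python sketch of that idea (not R, sorry). It assumes the IDs sit in a CSV column named `id`, which is a guess about tweets.csv, and it takes the actual API call as a parameter `lookup` because the function name differs between wrappers. The helper names `load_ids`, `batches` and `hydrate` are made up for illustration. The batching is there because statuses/lookup accepts at most 100 IDs per request.

```python
import csv
from typing import Callable, Iterator, List

BATCH_SIZE = 100  # statuses/lookup accepts at most 100 IDs per request


def load_ids(csv_path: str, column: str = "id") -> List[str]:
    """Read tweet IDs from a CSV file. The column name is an assumption."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row[column] for row in csv.DictReader(handle) if row.get(column)]


def batches(ids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def hydrate(ids: List[str], lookup: Callable[[List[str]], list]) -> list:
    """Call `lookup` once per batch and collect everything it returns.

    `lookup` stands in for whatever function your API wrapper provides
    around statuses/lookup.
    """
    tweets = []
    for batch in batches(ids):
        tweets.extend(lookup(batch))
    return tweets
```

Usage would look like `hydrate(load_ids("tweets.csv"), lookup=my_wrapper_call)`, where `my_wrapper_call` is a placeholder for the wrapper function you end up using. Tweets that were deleted or belong to suspended users will simply be missing from the result.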
The readme links to two places for the archive data:
The second links to a tar file which I could download, but the first links to a blank screen (unsure if we are supposed to replace parts of the URL, such as the year). In either case, how would one collect the archive data for other years?