mispy-archive / twitter_ebooks

Better twitterbots for all your friends~
MIT License

Question about jsonify command #116

Closed: tape- closed this issue 8 years ago

tape- commented 8 years ago

I downloaded my tweets.csv today and used ebooks jsonify to convert it to JSON for easier archiving in the future. I noticed that the resulting .json file has only the "text" and "id" fields for each tweet, whereas the tweets downloaded with ebooks archive contain significantly more metadata.

I take it from this that all ebooks archive really needs is the "id" field, so that it can avoid downloading duplicate tweets and adding them to the corpus file. Is that correct?
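For illustration, the reduced record shape described above (only "id" and "text" per tweet) can be sketched as follows. This is not the gem's actual jsonify implementation: the helper name minimal_tweets_json is hypothetical, and the "tweet_id" and "text" column names are assumptions about the layout of tweets.csv.

```ruby
require 'csv'
require 'json'

# Hypothetical helper, NOT the gem's jsonify code.
# Assumes the CSV has "tweet_id" and "text" columns; every other
# column (timestamp, source, ...) is simply dropped.
def minimal_tweets_json(csv_text)
  rows = CSV.parse(csv_text, headers: true)
  tweets = rows.map do |row|
    { 'id' => row['tweet_id'].to_i, 'text' => row['text'] }
  end
  JSON.pretty_generate(tweets)
end

# Made-up sample data, newest tweet first.
sample_csv = <<~CSV
  tweet_id,timestamp,text
  1002,2016-01-02 10:00:00 +0000,second tweet
  1001,2016-01-01 09:00:00 +0000,first tweet
CSV

puts minimal_tweets_json(sample_csv)
```

Each resulting record carries just enough to identify the tweet and feed its text to the corpus.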

negatendo commented 8 years ago

Yes. archive uses the last appearing id in the JSON file in the query it makes to Twitter, so that only newer tweets are fetched.

https://github.com/mispy/twitter_ebooks/blob/0733453a6f1d2c3c32bdf9529c216ccdae5a71cf/lib/twitter_ebooks/archive.rb#L91

Also see the since_id parameter here: https://dev.twitter.com/rest/reference/get/search/tweets
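A rough sketch of why the "id" field alone is enough for this: since_id asks Twitter for tweets with ids greater than the given value, so one known id bounds the query, and ids also let fetched tweets be deduplicated before merging. The helpers newest_id and merge_tweets are hypothetical and not taken from archive.rb. Taking the largest id is an assumption made here to avoid depending on the order of entries in the file.

```ruby
require 'json'

# Hypothetical helper: choose the id to send as since_id.
# Assumption: the largest id is the newest tweet already archived.
def newest_id(tweets)
  tweets.map { |t| t[:id] }.max
end

# Hypothetical helper: add fetched tweets, skipping ids already present,
# and keep the result ordered newest first.
def merge_tweets(existing, fetched)
  known = existing.map { |t| t[:id] }
  fresh = fetched.reject { |t| known.include?(t[:id]) }
  (fresh + existing).sort_by { |t| -t[:id] }
end

# A minimal archive with only "id" and "text", as jsonify produces.
archive_json = '[{"id": 1002, "text": "second tweet"}, {"id": 1001, "text": "first tweet"}]'
tweets = JSON.parse(archive_json, symbolize_names: true)

since_id = newest_id(tweets)

# Stand-in for an API response; a real call would pass since_id to Twitter.
fetched = [{ id: 1003, text: 'third tweet' }, { id: 1002, text: 'second tweet' }]
merged = merge_tweets(tweets, fetched)

puts since_id
puts merged.map { |t| t[:id] }.inspect
```

The duplicate tweet 1002 in the stand-in response is dropped during the merge, so only 1003 is added to the archive.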