I downloaded my tweets.csv today and used ebooks jsonify to convert it to JSON for easier archiving in the future. I noticed that the resulting .json file has only the "text" and "id" fields for each tweet, whereas the tweets downloaded with ebooks archive contain significantly more metadata.
I take it from this that all ebooks archive really needs is the "id" field, so that it can avoid downloading duplicate tweets and adding them to the corpus file. Is that correct?
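
To make the question concrete, here is a minimal sketch of the behavior I am assuming. It is written in Python purely for illustration and is not the actual ebooks implementation; `merge_by_id` and the extra `lang` field are hypothetical names of my own.

```python
import json


def merge_by_id(existing, fetched):
    """Return existing tweets plus any fetched tweets whose "id" is new.

    Hypothetical helper: it only illustrates id-based de-duplication and
    makes no claim about how ebooks archive works internally.
    """
    # Compare ids as strings so "2" and 2 are treated as the same tweet.
    seen = {str(tweet["id"]) for tweet in existing}
    merged = list(existing)
    for tweet in fetched:
        tweet_id = str(tweet["id"])
        if tweet_id not in seen:
            seen.add(tweet_id)
            merged.append(tweet)
    return merged


# Entries shaped like the jsonify output: only "id" and "text".
existing = [{"id": "1", "text": "first"}, {"id": "2", "text": "second"}]
# Entries shaped like a fuller download; "lang" is just an example field.
fetched = [
    {"id": "2", "text": "second", "lang": "en"},
    {"id": "3", "text": "third", "lang": "en"},
]

merged = merge_by_id(existing, fetched)
print(json.dumps([tweet["id"] for tweet in merged]))
```

If that assumption holds, the sparse entries from jsonify and the richer entries from archive can share one file, since only "id" is consulted when deciding whether a tweet is already present.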