gwu-libraries / TweetSets

Service for creating Twitter datasets for research and archiving.
MIT License

cat gzipped JSON files copied from SFM for full datasets #152

Closed: lwrubel closed this issue 2 years ago.

lwrubel commented 3 years ago

As mentioned in #140, copy the JSON files directly into the full datasets directory instead of loading them and then extracting them. To reduce the number of files, consider concatenating them by day.
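The gzip format (RFC 1952) allows one file to contain several compressed members back to back. Concatenating `.json.gz` files byte for byte, which is what `cat a.json.gz b.json.gz > day.json.gz` does, therefore produces a valid gzip file that decompresses to the combined JSON lines, and nothing has to be decompressed first. The minimal Python sketch below assumes that every SFM file ends its last JSON record with a newline, so records from adjacent files are not run together. The function name `concat_gzip_files` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable


def concat_gzip_files(sources: Iterable[Path], dest: Path) -> None:
    """Concatenate gzip files byte for byte, like `cat`.

    The result is a multi-member gzip file. Python's gzip module and
    `zcat` decompress all members in order. Assumes (hypothetically)
    that each source ends its last JSON record with a newline.
    """
    with open(dest, "wb") as out:
        for src in sources:
            with open(src, "rb") as f:
                # Copy the compressed bytes as-is; no decompression needed.
                shutil.copyfileobj(f, out)
```

Because nothing is recompressed, this costs only disk I/O. The tradeoff is that the merged file holds several small members instead of one larger stream, which may compress slightly less well.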

SFM data filenames start with `harvestid-datestamp`.
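Judging only from the example filenames in this issue, the harvest id is 32 lowercase hex characters and the datestamp is 17 digits: `YYYYMMDDhhmmss` followed by 3 digits of milliseconds. The sketch below parses that prefix under those assumptions. The pattern and the names `parse_sfm_filename` and `SfmFileInfo` are hypothetical and do not come from SFM or TweetSets.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Assumed pattern, inferred from the examples: 32-hex-digit harvest id,
# a hyphen, then a 17-digit stamp (YYYYMMDDhhmmss + milliseconds).
SFM_NAME = re.compile(r"^(?P<harvest_id>[0-9a-f]{32})-(?P<stamp>\d{17})-")


class SfmFileInfo(NamedTuple):
    harvest_id: str
    timestamp: datetime


def parse_sfm_filename(name: str) -> SfmFileInfo:
    """Split an SFM data filename into its harvest id and datestamp."""
    m = SFM_NAME.match(name)
    if m is None:
        raise ValueError(f"not an SFM data filename: {name!r}")
    stamp = m.group("stamp")
    # The first 14 digits are date and time; the last 3 are milliseconds.
    ts = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")
    ts = ts.replace(microsecond=int(stamp[14:]) * 1000)
    return SfmFileInfo(m.group("harvest_id"), ts)
```

For `b6278097654949afb358090ad3f9e65b-20210810223604415-00000-aj5pi7yk.json.gz`, this returns harvest id `b6278097654949afb358090ad3f9e65b` and timestamp `datetime(2021, 8, 10, 22, 36, 4, 415000)`. The issue does not say which time zone the stamp uses.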

Example SFM filenames:

b6278097654949afb358090ad3f9e65b-20210810223604415-00000-aj5pi7yk.json.gz
b6278097654949afb358090ad3f9e65b-20210810230605213-00000-6twlks3m.json.gz
b6278097654949afb358090ad3f9e65b-20210810233604899-00000-4dgz7mbn.json.gz
b6278097654949afb358090ad3f9e65b-20210811000604745-00000-82tuoyaf.json.gz
b6278097654949afb358090ad3f9e65b-20210811003603403-00000-5fw6gqry.json.gz
b6278097654949afb358090ad3f9e65b-20210811010605812-00000-p34nh2ul.json.gz

and

41ab2853b30b4435ba848da1597cb5b4-20161109100401050-00000-25267-803eb0940c39-8001.json.gz
41ab2853b30b4435ba848da1597cb5b4-20161109103401018-00000-25318-803eb0940c39-8001.json.gz
41ab2853b30b4435ba848da1597cb5b4-20161109110400654-00000-25369-803eb0940c39-8001.json.gz
41ab2853b30b4435ba848da1597cb5b4-20161109113401019-00000-25420-803eb0940c39-8001.json.gz
41ab2853b30b4435ba848da1597cb5b4-20161109120401372-00000-25472-803eb0940c39-8001.json.gz
41ab2853b30b4435ba848da1597cb5b4-20161109123401194-00000-25525-803eb0940c39-8001.json.gz
41ab2853b30b4435ba848da1597cb5b4-20161109130006128-00000-25576-803eb0940c39-8001.json.gz
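Both filename formats share the same `harvestid-datestamp` prefix, so a single pattern is enough to group files by harvest and day before they are concatenated. This is a sketch that assumes the first 8 digits of the stamp are the day (`YYYYMMDD`). The name `group_by_day` is hypothetical, and any daily output name built from the key, such as `{harvest_id}-{day}.json.gz`, would be a new convention rather than something SFM defines.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed pattern, inferred from the examples: 32-hex-digit harvest id,
# then a 17-digit stamp whose first 8 digits are the day (YYYYMMDD).
DAY_KEY = re.compile(r"^([0-9a-f]{32})-(\d{8})\d{9}-")


def group_by_day(names: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Group SFM filenames by (harvest id, day), sorted within each group."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for name in names:
        m = DAY_KEY.match(name)
        if m is None:
            continue  # skip files that don't follow the naming convention
        groups[(m.group(1), m.group(2))].append(name)
    # Within one harvest, names differ first in the zero-padded stamp,
    # so sorting by name also sorts the files chronologically.
    return {key: sorted(files) for key, files in groups.items()}
```

Applied to the first six example filenames, this yields two groups of three files each, keyed `("b6278097654949afb358090ad3f9e65b", "20210810")` and `("b6278097654949afb358090ad3f9e65b", "20210811")`. All seven files in the second example fall into a single group for `20161109`.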
dolsysmith commented 2 years ago

Closed by PR #154