gwu-libraries / TweetSets

Service for creating Twitter datasets for research and archiving.
MIT License

Remove directory from displayed filenames on dataset extracts page #156

Open lwrubel opened 2 years ago

lwrubel commented 2 years ago

Remove the directory prefix (tweet-json/, tweet-csv/, etc.) from the displayed filename.

For example, show tweet-20210730.jsonl.gz instead of tweet-json/tweet-20210730.jsonl.gz
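
A minimal sketch of the display change, assuming the extracts page receives each file as a forward-slash relative path like the one above. The function name `display_name` is hypothetical and not taken from the TweetSets code:

```python
import posixpath


def display_name(extract_path: str) -> str:
    """Return only the final path component, for display purposes.

    posixpath is used (rather than os.path) so that "/" is always
    treated as the separator, regardless of the host platform.
    """
    return posixpath.basename(extract_path)


print(display_name("tweet-json/tweet-20210730.jsonl.gz"))
```

Only the label shown to the user changes here; the full path would still be needed to build the download link.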

Also look into adding a prefix such as nodes, edges, or csv to each filename so that the user can identify the file type once it is downloaded, for example: mentions-nodes-00000-937227a7-46ed-4bcb-95ae-bcfba347357d-c000.csv.gz

Renaming appears to need to happen outside of Spark, after the files have been written (e.g., using Python's shutil or os modules).
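
One possible shape for that post-processing step, as a rough sketch only. It assumes the files Spark writes start with `part-` (so that `part-00000-...` becomes `mentions-nodes-00000-...`, matching the example filename above) and that each extract type lives in its own directory. The function name `add_prefix` and its parameters are hypothetical:

```python
import os
from pathlib import Path


def add_prefix(directory, prefix, dry_run=False):
    """Rename part-* files in `directory` so they start with `prefix`.

    Assumption: output files are named "part-<number>-<uuid>-...".
    Other files (e.g. _SUCCESS markers) are left untouched.
    Returns the list of new filenames.
    """
    renamed = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or not path.name.startswith("part-"):
            continue
        # Replace the leading "part" with the prefix, keeping the rest.
        target = path.with_name(prefix + path.name[len("part"):])
        if not dry_run:
            os.rename(path, target)
        renamed.append(target.name)
    return renamed
```

Calling `add_prefix(extract_dir, "mentions-nodes", dry_run=True)` would list the new names without touching the files, which may be useful for checking the result before renaming anything.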