We should automate producing the "All GEO PEPs" downloadable tarball, and keep an archive of past releases. TODO:
[ ] Set up a schedule (maybe in geopephub?) that produces this tar archive. Maybe it should be quarterly?
[ ] Push the tar archive to S3 somewhere.
[ ] Create a dedicated page on PEPhub that explains the dataset and links to instructions for downloading it.
[ ] The PEPhub page should update automatically when a new quarterly release happens.
[x] Get rid of the 'pephub' and 'pephub_geo.tar' clutter in /project/shefflab/processed -- these should be managed in a consistent way so they just get pushed to the right place, maybe stored in the deployment folder or something.
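
A rough starting point for the first two items, as a minimal sketch only: it assumes the PEPs have already been exported to a local directory and that a quarterly cadence is what we settle on. The function names, the `pephub_geo_<year>_Q<n>.tar` naming scheme, and the directory arguments are all hypothetical, not anything geopephub does today.

```python
import tarfile
from datetime import date
from pathlib import Path


def quarterly_archive_name(day: date, prefix: str = "pephub_geo") -> str:
    """Return a release name such as 'pephub_geo_2024_Q1.tar' (hypothetical scheme)."""
    quarter = (day.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return f"{prefix}_{day.year}_Q{quarter}.tar"


def build_archive(source_dir: Path, out_dir: Path, day: date) -> Path:
    """Pack source_dir into an uncompressed tar named after the quarter of `day`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archive_path = out_dir / quarterly_archive_name(day)
    with tarfile.open(archive_path, "w") as tar:
        # Store paths relative to the source directory name, not the absolute path.
        tar.add(source_dir, arcname=source_dir.name)
    return archive_path
```

The resulting file could then be pushed to S3 (for example with boto3's `upload_file`) from whatever scheduler we pick; the bucket and key layout are still to be decided.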
We produced a tarball of all the GEO PEPs in January 2024. This is useful, but it quickly becomes outdated as new data is added to GEO.
We're automatically pulling that new data into PEPhub via https://github.com/pepkit/geopephub