nextstrain / nextstrain.org

The Nextstrain website
https://nextstrain.org
GNU Affero General Public License v3.0

List datasets + files from core + staging sources #700

Closed jameshadfield closed 3 months ago

jameshadfield commented 12 months ago

This is a pretty major PR, and most of the detail is in the commit messages and the comments in the code. A high-level overview of the functionality introduced:

As is the case with feature pushes of this scope, there is no clear end point. There are a huge number of potential improvements we can make, so for this PR please indicate whether you consider something blocking or an improvement we can tackle in subsequent PRs 🙏

From my point of view, the following needs to be done before merge (in the same vein as how I approached the prototype, I'm trying to share work at an early stage):

I'll make some notes via in-line comments about feature pushes I don't think are blocking here, but which come up as a result of this work, so that discussions can be threaded.

Testing

It should 🤞 all work via review apps. To test locally, you can avoid the S3 API calls by creating a (git-ignored) ./devData folder and adding a manifest+inventory for each bucket with the following filenames:

./devData/core.manifest.json          ./devData/core.inventory.csv.gz
./devData/staging.manifest.json       ./devData/staging.inventory.csv.gz

(You can pick any day's inventory; which one isn't important for dev purposes.) Then run the server with the LOCAL_INVENTORY=true environment variable set.
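
Before starting the server, it can help to confirm that all four files are in place. The following is a minimal sketch, not part of this PR: the helper name `missing_dev_data` is hypothetical, and it only checks that the filenames listed above exist, not that their contents are valid.

```python
from pathlib import Path

# Filenames taken from the list above: one manifest + inventory per bucket.
EXPECTED_FILES = [
    "core.manifest.json",
    "core.inventory.csv.gz",
    "staging.manifest.json",
    "staging.inventory.csv.gz",
]


def missing_dev_data(dev_dir="./devData"):
    """Return the expected filenames that are not present in dev_dir."""
    root = Path(dev_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_dev_data()
    if missing:
        print("Missing from ./devData:", ", ".join(missing))
    else:
        print("All four files found.")
```

If the script reports nothing missing, the server should be able to pick up the local files when LOCAL_INVENTORY=true is set.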

P.S. "manifest JSON" here refers to the S3 inventory manifest and is completely unrelated to our existing usage of manifest JSONs. However, this work should eventually allow us to remove those manifests. So that's nice.
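
For anyone unfamiliar with the S3 inventory format, here is a rough sketch of how a manifest and its inventory relate. It assumes the standard CSV inventory layout, where the manifest's `fileSchema` field is a comma-separated list of column names and the gzipped CSV itself has no header row. The function name `read_inventory` is hypothetical and this is not the code in this PR (the server is not written in Python); it is only meant for poking at the local files.

```python
import csv
import gzip
import json
from pathlib import Path


def read_inventory(manifest_path, inventory_path):
    """Yield one dict per inventory row, keyed by the manifest's column names."""
    manifest = json.loads(Path(manifest_path).read_text())
    # Assumption: fileSchema looks like "Bucket, Key, Size, ..." and gives
    # the column order of the header-less CSV.
    columns = [name.strip() for name in manifest["fileSchema"].split(",")]
    with gzip.open(inventory_path, "rt", newline="") as handle:
        for row in csv.reader(handle):
            yield dict(zip(columns, row))


if __name__ == "__main__":
    rows = read_inventory(
        "./devData/core.manifest.json", "./devData/core.inventory.csv.gz"
    )
    # Print the first few rows to get a feel for the data.
    for count, row in enumerate(rows):
        if count >= 5:
            break
        print(row)
```

Which columns appear depends on how the bucket's inventory was configured, so the printed keys may differ between the core and staging files.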

tsibley commented 12 months ago

Mostly a note to self. Things to make sure to review here based on the Blab Nextstrain meeting just now:

jameshadfield commented 3 months ago

Replaced by #803 and #719