nextstrain / cli

The Nextstrain command-line interface (CLI), a program called nextstrain, aims to provide a consistent way to run and visualize pathogen builds and access Nextstrain components like Augur and Auspice across computing environments such as Docker, Conda, and AWS Batch.
https://docs.nextstrain.org/projects/cli/
MIT License

AWS Batch setup instructions do not allow access to intermediate artifacts in s3://nextstrain-data/ bucket #170

Open sacundim opened 2 years ago

sacundim commented 2 years ago

The current instructions for setting up AWS Batch tell people to create three IAM policies, but none of the three grants s3:ListBucket and s3:GetObject access to the s3://nextstrain-data/ bucket, which the ncov Open build uses for intermediate GenBank artifacts. As a result, anyone who runs a build on Batch modeled after that one will hit errors like the ones I did in this ticket:

For an example IAM policy that grants access to that bucket, see:

tsibley commented 2 years ago

Agreed, we should adjust the example policy in those instructions to grant access to nextstrain-data and add an explanation of why/when it's useful, noting that it's technically optional. Not all Batch setups will need it, but we will be extending other core pathogen builds to use a similar input data file pattern, so it's good to include it sooner rather than later.

Background context here is that the example policy in these instructions long predates the ncov build and its data files on s3://nextstrain-data. The policy also doesn't assume any particular build is being run, but since the ncov build and its input data are so widely used, it'd still be good to add the grant and a mention of it now.
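
For illustration only, here is a minimal sketch of the kind of read-only grant discussed in this thread, written as a small Python script that prints an IAM policy document. It is not the policy from the setup instructions or from any linked example. The bucket name and the two actions (s3:ListBucket, s3:GetObject) come from the issue text; the function name and the Sid labels are hypothetical, and which role or user the policy gets attached to depends on your own Batch setup.

```python
import json

# Bucket name taken from the issue text.
BUCKET = "nextstrain-data"


def read_only_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document granting list and read access to one bucket.

    Hypothetical helper for illustration; the Sid values are arbitrary labels.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket applies to the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # s3:GetObject applies to the objects inside the bucket.
                "Sid": "GetObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


policy = read_only_bucket_policy(BUCKET)
print(json.dumps(policy, indent=2))
```

The printed JSON could be pasted into a new IAM policy and attached alongside the existing ones. As noted above, the grant is optional and only matters for builds that read input files from that bucket.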