nextstrain / mpox

Nextstrain build for mpox virus
https://nextstrain.org/mpox
MIT License
39 stars 16 forks source link

Rename instances of "monkeypox" to "mpox" #220

Closed trvrb closed 8 months ago

trvrb commented 8 months ago

This commit goes through the repo and basically does a find-and-replace of "monkeypox" to "mpox, except for a handful of locations like the #monkeypox-updates Slack notification.

This shouldn't change appearance of build outputs except for:

This has the side effect that intermediate files will begin to propagate to https://data.nextstrain.org/files/workflows/mpox/sequences.fasta.xz, etc... rather than /files/workflows/monkeypox. This may affect users who've been relying on the current URLs. Nothing should break for these users as the old files will continue to exist but they will no longer receive updated data. I think this is an okay trade-off in terms of downstream impacts.

corneliusroemer commented 8 months ago

If people request from data.nextstrain.org we could be able to do URL forwarding, couldn't we? Wouldn't that be better than serving stale data?

Otherwise, for those requesting directly from AWS not via data.nextstrain.org, we have a choice between serving stale data or deleting the old file so that workflows error - each have advantages and disadvantages. Might be worth doing a code search on Github to see who uses the data and ping them.

joverlee521 commented 8 months ago

except for a handful of locations like the #monkeypox-updates Slack notification.

Addressing this separately in #221

corneliusroemer commented 8 months ago

These are all the repos that Github could find using https://data.nextstrain.org/files/workflows/monkeypox that could benefit from URL redirects (17, with a handful of non-Nextstrain repos): https://github.com/search?q=data.nextstrain.org%2Ffiles%2Fworkflows%2Fmonkeypox&type=code

These are the repos that use aws s3 paths that can't be redirected (10 mostly ours): https://github.com/search?q=s3%3A%2F%2Fnextstrain-data%2Ffiles%2Fworkflows%2Fmonkeypox&type=code

joverlee521 commented 8 months ago

If people request from data.nextstrain.org we could be able to do URL forwarding, couldn't we?

Yes, definitely possible. Not sure if we need to do it on the Cloudfront side but it's possible to configure S3 website redirections. Also looks like it's possible to redirect requests for an S3 object

Edit: The S3 object redirect is via x-amz-website-redirect-location, which I think is only for setting the website redirection and not for direct S3 object access.

trvrb commented 8 months ago

The fetch and ingest workflow completed successfully. However, the artifacts ended up at s3://nextstrain-datafiles/workflows/mpox/branch/rename-to-mpox/. However, it looks good besides this. I'm assuming the best course of action is to merge this PR when it passes review and then to run fetch and ingest again from main so that the files are placed in the proper s3 location.

trvrb commented 8 months ago

Thanks @victorlin for the fix with commit b14d6fd147627ec8e34f4c84aeb987dcbb5a2838. I'm going to go ahead and merge this now.

trvrb commented 8 months ago

Fetch and ingest is running from master here: https://github.com/nextstrain/mpox/actions/runs/6724598765 and should update S3 files to specify s3://nextstrain-datafiles/workflows/mpox/. I'll monitor and update here when complete.

joverlee521 commented 8 months ago

We discussed redirection rules for monkeypox during today's dev chat and I added @tsibley 's suggested redirection rules to the nextstrain-data S3 bucket.

So now access through data.nextstrain.org should redirect to mpox

$ curl -I https://data.nextstrain.org/files/workflows/monkeypox/metadata.tsv.gz
HTTP/2 301 
content-length: 0
location: https://data.nextstrain.org/files/workflows/mpox/metadata.tsv.gz
date: Wed, 01 Nov 2023 23:55:24 GMT
server: AmazonS3
x-cache: Miss from cloudfront
via: 1.1 ac3f0425be668a2439884bb8cbd3ccd8.cloudfront.net (CloudFront)
x-amz-cf-pop: SFO53-C1
x-amz-cf-id: 14QBYLPsKkmgKazVcOsSE7kPxM8NRaCzXSOe-rcV6odttMGDfwU0EA==

Redirection for the s3:// paths are not possible. However, based on Cornelius's search, the s3:// paths are only used for our internal uploads and slack notifications.

trvrb commented 8 months ago

Perfect! Thank you @joverlee521.

I can confirm that s3://nextstrain-data/files/workflows/mpox/metadata.tsv.gz and hence https://data.nextstrain.org/files/workflows/mpox/metadata.tsv.gz now exists as expected after running ingest above.

I deleted the monkeypox/ directory in files/workflows/ to reduce confusion. And as expected from your post above, https://data.nextstrain.org/files/workflows/monkeypox/metadata.tsv.gz works to retrieve the mpox/ equivalent.

I believe the next step here is to confirm that a phylogenetics rebuild works as expected. I'll kick this off now.

trvrb commented 8 months ago

And actually... this testing already happened with the automatic updates. Earlier today actions including https://github.com/nextstrain/mpox/actions/runs/6735424381 ran and produced updated Auspice JSONs like mpox_all-clades.json. I believe we're set in this case. The corresponding issue https://github.com/nextstrain/mpox/issues/168 is now closed.