nextstrain / nextstrain.org

The Nextstrain website
https://nextstrain.org
GNU Affero General Public License v3.0

Remove s3 listings #694

Closed: jameshadfield closed this 1 year ago

jameshadfield commented 1 year ago

See the commit messages for the rationale.

This would close #672.

Note that the seasonal-flu strain search JSON was being updated from elsewhere, but this hasn't been updated since 2023-03-24 so perhaps that's been removed recently?

P.S. The removal of these resources was indicated in an internal Google Doc from March 2022 titled "How nextstrain.org lists (collects) datasets and narratives".

joverlee521 commented 1 year ago

> Note that the seasonal-flu strain search JSON was being updated from elsewhere, but this hasn't been updated since 2023-03-24 so perhaps that's been removed recently?

I was updating this manually with the weekly flu builds, but I stopped updating it since the collect-search-results script was returning empty results.


Should we also remove the link to the strain search page from the /influenza page?

[Screenshot taken 2023-05-30]

jameshadfield commented 1 year ago

> Should we also remove the link to the strain search page from the /influenza page?

Good find! Updated.

tsibley commented 1 year ago

Any objections to removing the files themselves from S3?

s3://nextstrain-data/datasets_influenza.json
s3://nextstrain-staging/datasets_staging.json
s3://nextstrain-data/search_sars-cov-2.json
s3://nextstrain-data/search_seasonal-flu.json

All of these still exist.

tsibley commented 1 year ago

Canary should be promoted first, I guess. We should also check that the pages which weren't removed (only given a banner) don't completely blow up when their JSON files return a 404.

tsibley commented 1 year ago

When I mock the files being removed and the search pages get a 404, they produce this UI:

[Screenshot of the search page UI when its JSON file returns a 404]

Seems fine. (I'd probably have 404'd the search pages entirely, but that's separate…)
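The check described above (mock the JSON files as removed, then confirm the page degrades gracefully instead of crashing) can be sketched as follows. This is a hypothetical Python model, not nextstrain.org's code: the real search pages are JavaScript and aren't shown in this thread. `load_search_results`, `SearchResults`, and the injected `fetch` callable are illustrative names, and the body keys (`datasets`, `strainMap`, `dateUpdated`) are assumed from the placeholder payload discussed in this PR.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for an HTTP GET: returns (status code, parsed JSON body).
FetchFn = Callable[[str], Tuple[int, dict]]


@dataclass
class SearchResults:
    datasets: List[dict] = field(default_factory=list)
    strain_map: Dict[str, list] = field(default_factory=dict)
    date_updated: Optional[str] = None
    # Set instead of raising, so the page can render a message rather than blow up.
    error: Optional[str] = None


def load_search_results(fetch: FetchFn, url: str) -> SearchResults:
    """Fetch a search JSON, treating a missing file as 'no results' instead of a crash."""
    status, body = fetch(url)
    if status == 404:
        return SearchResults(error=f"{url} not found (HTTP 404)")
    if status != 200:
        return SearchResults(error=f"Unexpected HTTP {status} for {url}")
    return SearchResults(
        datasets=body.get("datasets", []),
        strain_map=body.get("strainMap", {}),
        date_updated=body.get("dateUpdated"),
    )


def fetch_removed(url: str) -> Tuple[int, dict]:
    """Mock fetch: pretend every JSON file has been deleted from S3."""
    return 404, {}


result = load_search_results(fetch_removed, "https://data.nextstrain.org/search_sars-cov-2.json")
print(result.error)
# https://data.nextstrain.org/search_sars-cov-2.json not found (HTTP 404)
```

Injecting `fetch` keeps the 404 case testable without touching S3, which mirrors mocking the files' removal before actually deleting them.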

tsibley commented 1 year ago

Alternatively, we could do this to avoid both the error and the useless request, without otherwise changing the page code (which seems to be desired?):

diff --git a/static-site/search_pages.json b/static-site/search_pages.json
index ff1b7e1..1992d9f 100644
--- a/static-site/search_pages.json
+++ b/static-site/search_pages.json
@@ -2,11 +2,11 @@
   {
     "urlName": "sars-cov-2",
     "displayName": "SARS-CoV-2",
-    "jsonUrl": "https://data.nextstrain.org/search_sars-cov-2.json"
+    "jsonUrl": "data:application/json,{\"datasets\":[],\"strainMap\":{},\"dateUpdated\":null,\"exclusions\":{}}"
   },
   {
     "urlName": "seasonal-flu",
     "displayName": "seasonal influenza (H1N1, H3N2, B/Vic & B/Yam)",
-    "jsonUrl": "https://data.nextstrain.org/search_seasonal-flu.json"
+    "jsonUrl": "data:application/json,{\"datasets\":[],\"strainMap\":{},\"dateUpdated\":null,\"exclusions\":{}}"
   }
 ]
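The reason the proposed `jsonUrl` values avoid both the 404 and the request is that a `data:` URL carries its payload inline, so fetching it yields the empty JSON object directly. The Python sketch below (a stand-in; the site itself is JavaScript) builds the same URL from the placeholder object in the diff and decodes it back. `make_data_url` and `read_data_url` are hypothetical helpers, and the decoder handles only the plain, non-base64 form.

```python
import json
from urllib.parse import unquote

# Placeholder payload with the same keys as the search_pages.json diff:
# no datasets, no strains, no update date, no exclusions.
EMPTY_SEARCH_RESULTS = {
    "datasets": [],
    "strainMap": {},
    "dateUpdated": None,
    "exclusions": {},
}


def make_data_url(payload: dict) -> str:
    """Build a compact data: URL like the proposed "jsonUrl" values."""
    body = json.dumps(payload, separators=(",", ":"))
    return "data:application/json," + body


def read_data_url(url: str) -> dict:
    """Decode a plain (non-base64) data: URL back into a JSON object."""
    header, _, body = url.partition(",")
    if not header.startswith("data:") or ";base64" in header:
        raise ValueError("expected a plain (non-base64) data: URL")
    # unquote() also accepts a percent-encoded body, e.g. %7B for "{".
    return json.loads(unquote(body))


url = make_data_url(EMPTY_SEARCH_RESULTS)
print(url)
# data:application/json,{"datasets":[],"strainMap":{},"dateUpdated":null,"exclusions":{}}
```

Note that in `search_pages.json` the inner quotes appear as `\"` only because the URL is itself stored inside a JSON string; the URL value is the unescaped form printed above.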

Is there a reason we're not removing the UI for these pages entirely?

Also, I note that /search doesn't have the "no longer maintained" banner on it.

[Screenshot of /search without the "no longer maintained" banner]

tsibley commented 1 year ago

Promoted canary and deleted the files:

$ parallel -v aws s3 rm 
parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.
s3://nextstrain-data/datasets_influenza.json
s3://nextstrain-staging/datasets_staging.json
s3://nextstrain-data/search_sars-cov-2.json
s3://nextstrain-data/search_seasonal-flu.json
aws s3 rm s3://nextstrain-data/search_sars-cov-2.json
delete: s3://nextstrain-data/search_sars-cov-2.json
aws s3 rm s3://nextstrain-data/search_seasonal-flu.json
delete: s3://nextstrain-data/search_seasonal-flu.json
aws s3 rm s3://nextstrain-data/datasets_influenza.json
delete: s3://nextstrain-data/datasets_influenza.json
aws s3 rm s3://nextstrain-staging/datasets_staging.json
delete: s3://nextstrain-staging/datasets_staging.json
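The `parallel` warnings above appear because no `:::` argument list was given, so GNU parallel read the four paths from the terminal until Ctrl-D. Passing them on the command line instead (`parallel -v aws s3 rm ::: PATH1 PATH2 PATH3 PATH4`) avoids the prompt. The same fan-out can be sketched in Python with a dry run first: one `aws s3 rm` command per path, printed (as `-v` does) rather than executed unless `dry_run=False`. `rm_commands` and `run_all` are hypothetical helpers; actually deleting requires the AWS CLI and credentials with delete permission on those buckets.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List

PATHS = [
    "s3://nextstrain-data/datasets_influenza.json",
    "s3://nextstrain-staging/datasets_staging.json",
    "s3://nextstrain-data/search_sars-cov-2.json",
    "s3://nextstrain-data/search_seasonal-flu.json",
]


def rm_commands(paths: List[str]) -> List[List[str]]:
    """One `aws s3 rm` invocation per path, as parallel builds them."""
    return [["aws", "s3", "rm", path] for path in paths]


def run_all(commands: List[List[str]], dry_run: bool = True, max_workers: int = 4) -> list:
    """Print the commands (dry run) or run them concurrently, failing on any error."""
    if dry_run:
        for cmd in commands:
            print(" ".join(cmd))  # echo each job without running it
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Jobs run concurrently, but results come back in input order.
        return list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))


run_all(rm_commands(PATHS))  # dry run: prints the four commands
```

Reviewing the dry-run output before a destructive step plays the same role as promoting canary and mocking the 404s before deleting the files.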