nextstrain / dengue

Nextstrain build for dengue virus
https://nextstrain.org/dengue

phylogenetic: Use inline root sequence #42

Closed: joverlee521 closed this 3 months ago

joverlee521 commented 3 months ago

Description of proposed changes

~Explicitly state that the root-sequence.json file is an expected output of the core phylogenetic workflow.~

~This also ensures that the Nextstrain automation rule deploy_all will include the root-sequence.json in the upload.~

Based on feedback from @jameshadfield in https://github.com/nextstrain/zika/pull/56#issuecomment-2058060422

Looking at the existing dataset files on S3, the 5-6 KiB root-sequence.json files are negligible next to the 600-800 KiB main Auspice JSONs. Nextstrain datasets are limited by Chrome's 500 MB memory cap,¹ so we'd be fine adding the root sequence inline.

This ensures that our uploads always include the root sequence, so it can't get out of sync with the main Auspice JSON.
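As a rough illustration, inlining amounts to copying the sidecar's contents into the main Auspice JSON under a top-level `root_sequence` key. The minimal Python sketch below assumes the sidecar is a mapping like `{"nuc": "..."}`. The helper names are hypothetical, and in the workflow itself the export step (presumably `augur export v2`) does this rather than custom code. The sketch also reproduces the size argument above from the quoted file sizes.

```python
import json


def inline_root_sequence(auspice: dict, root_sequence: dict) -> dict:
    """Return a copy of an Auspice JSON with the sidecar root sequence
    stored under the top-level "root_sequence" key (hypothetical helper)."""
    merged = dict(auspice)  # shallow copy; the input dict is left untouched
    merged["root_sequence"] = root_sequence
    return merged


def inline_overhead(sidecar_kib: float, main_kib: float) -> float:
    """Fraction by which inlining grows the main JSON, ignoring formatting."""
    return sidecar_kib / main_kib


if __name__ == "__main__":
    dataset = {"version": "v2", "meta": {}, "tree": {}}
    sidecar = {"nuc": "ACGT"}  # toy sequence, not real dengue data
    print(json.dumps(inline_root_sequence(dataset, sidecar)["root_sequence"]))
    # Largest and smallest ratios from the sizes quoted in the description.
    print(f"{inline_overhead(6, 600):.1%}")  # 1.0%
    print(f"{inline_overhead(5, 800):.1%}")  # 0.6%
```

Even the worst case adds only about 1% to the main JSON, far below anything that would approach the browser memory cap.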

¹ https://github.com/nextstrain/auspice/issues/1622

Related issue(s)

Follow-up to https://github.com/nextstrain/dengue/pull/37
Similar to https://github.com/nextstrain/zika/pull/56


joverlee521 commented 3 months ago

Merging since the CI run's outputs include root_sequence in the Auspice JSONs and the datasets look good in auspice.us.
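A check like this could be scripted. The sketch below is hypothetical and not part of the repo's CI: it flags any Auspice JSON whose top-level `root_sequence` is missing or has no nucleotide entry, assuming the nucleotide sequence is stored under `"nuc"` as in the sidecar format.

```python
import json
from pathlib import Path


def lacks_inline_root(dataset: dict) -> bool:
    """True if the dataset has no usable inline nucleotide root sequence."""
    root = dataset.get("root_sequence")
    # Assumes the nucleotide sequence lives under the "nuc" key.
    return not isinstance(root, dict) or not root.get("nuc")


def datasets_missing_root(paths: list[Path]) -> list[Path]:
    """Return the main Auspice JSONs that still lack an inline root sequence."""
    missing = []
    for path in paths:
        with path.open() as fh:
            if lacks_inline_root(json.load(fh)):
                missing.append(path)
    return missing
```

Pass it only the main dataset JSONs; other sidecar files have no `root_sequence` by design and would be reported as false positives.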

joverlee521 commented 3 months ago

Manually deleted cache and triggered a re-run of the ingest-to-phylo workflow.

Once complete

joverlee521 commented 3 months ago

Removed the following files from s3://nextstrain-data/

Left the E gene root-sequence.json files since they are not being updated by this PR.
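The selection rule here (delete the now-redundant sidecars but keep the E gene ones) can be written as a simple key filter. This is a minimal sketch under a stated assumption: the key names, and the convention that E gene sidecars end in `_E_root-sequence.json`, are hypothetical and not taken from the actual bucket listing.

```python
SIDECAR_SUFFIX = "_root-sequence.json"
# Hypothetical naming convention for E gene datasets; adjust to the real keys.
E_GENE_SUFFIX = "_E" + SIDECAR_SUFFIX


def sidecars_to_remove(keys: list[str]) -> list[str]:
    """Pick root-sequence sidecars to delete, keeping the E gene ones."""
    return [
        key
        for key in keys
        if key.endswith(SIDECAR_SUFFIX) and not key.endswith(E_GENE_SUFFIX)
    ]


if __name__ == "__main__":
    # Hypothetical keys for illustration only.
    keys = [
        "dengue_denv1_genome.json",
        "dengue_denv1_genome_root-sequence.json",
        "dengue_denv1_E_root-sequence.json",
    ]
    print(sidecars_to_remove(keys))  # ['dengue_denv1_genome_root-sequence.json']
```

Listing the bucket and performing the deletions is deliberately left out; the filter only decides which keys qualify.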