reichlab / variant-nowcast-hub

A repository to store COVID-19 variant nowcasts collected as a modeling hub.
MIT License

Update weekly clade list process to save the corresponding Nextstrain metadata_version #41

Open bsweger opened 4 days ago

bsweger commented 4 days ago

Background

When generating and saving the list of clades to model, we should also save the information needed to retrieve the reference tree that was in use at that time, so that the clade list is reproducible.

Definition of done

bsweger commented 4 days ago

I believe the output we want here is Nextstrain metadata_version information, which looks like this:

{
   "schema_version":"v1",
   "nextclade_version":"nextclade 3.8.2",
   "nextclade_dataset_name":"SARS-CoV-2",
   "nextclade_dataset_version":"2024-07-17--12-57-03Z",
   "nextclade_tsv_sha256sum":"5dd39f2291c890d5adbcc18c513016bfd0dab8d7bffc581b7e4c0bdf4306bb89",
   "metadata_tsv_sha256sum":"55f6768787193f309fb08f47270143fca7f126bc6fc38db910ad7d75ffeeba01"
}

The nextclade_dataset_version is what we need to request the corresponding reference tree.

When creating the clade list, it's likely sufficient to capture the contents of metadata_version.json as it exists at that moment.
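
To make the idea concrete, here is a minimal sketch of how the clade list and the metadata_version contents might be bundled into one record. The function name `build_clade_record`, the output layout (`clades` / `meta`), and the example clade names are hypothetical and only for illustration; how metadata_version.json is fetched is left out, since the function just takes its text.

```python
import json
from datetime import datetime, timezone


def build_clade_record(clades, metadata_version_text, created_at=None):
    """Bundle a clade list with the Nextstrain metadata_version info.

    `clades` is an iterable of clade names; `metadata_version_text` is the
    raw text of metadata_version.json. The output layout is an assumption,
    not an agreed-upon hub format.
    """
    meta = json.loads(metadata_version_text)

    # nextclade_dataset_version is the field needed to request the
    # corresponding reference tree, so fail loudly if it is missing.
    if "nextclade_dataset_version" not in meta:
        raise ValueError("metadata_version is missing nextclade_dataset_version")

    created_at = created_at or datetime.now(timezone.utc)
    return {
        "clades": sorted(clades),
        "meta": {
            "created_at": created_at.isoformat(),
            "ncov": meta,  # full metadata_version.json contents, unmodified
        },
    }


if __name__ == "__main__":
    # Abbreviated copy of the example above; clade names are illustrative.
    sample = json.dumps({
        "schema_version": "v1",
        "nextclade_version": "nextclade 3.8.2",
        "nextclade_dataset_name": "SARS-CoV-2",
        "nextclade_dataset_version": "2024-07-17--12-57-03Z",
    })
    record = build_clade_record(["24A", "23B"], sample)
    print(json.dumps(record, indent=2))
```

Keeping the whole metadata_version object, rather than only nextclade_dataset_version, also preserves the sha256 sums in case they are useful later for verifying which files were used.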