Closed corneliusroemer closed 1 year ago
@joverlee521 Every time I step away from this issue I have to go back and try to remember it all again (which I haven't done this morning) but my memory is that hopefully these steps are unwrapping these circular dependencies...
Nextclade now no longer knows about the legacy clade names from Nextstrain & thus doesn't output them. But due to that, we have to add them back into Nextstrain metadata to support legacy use (at least until such time as we decide to stop supporting that, with some warning or etc). This should mean that future changes to Nextclade should be more independent - and same on Nextstrain end as far as how we want to rename clades or map them to various labels, etc....
That's the plan anyway, but I'm a bit lost on what step we're on ATM TBH 🤷 !
@emmahodcroft your summary is entirely correct. Legacy definitions are now an ingest-only concern for backwards compatibility.
Description of proposed changes
This PR fixes ingest to work with the latest Nextclade dataset release: at the moment it fails because `clade_legacy` is no longer output by Nextclade, starting with dataset release `2023-06-16`. (This is due to https://github.com/neherlab/nextclade_data_workflows/pull/42, which in turn was triggered by a refactor in ncov of how we annotate clades: https://github.com/nextstrain/ncov/pull/1065.)
So as not to break downstream workflows that rely on the ingest output `metadata.tsv` having `clade_legacy`, this PR adds a `clade_legacy` column to `metadata.tsv`.
The values are defined as a simple mapping from `clade_nextstrain` (year-letter, e.g. 22F) to `clade_legacy` in `defaults/clade-legacy-mapping.yml`.
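As a rough illustration of what such a mapping step amounts to, here is a minimal sketch in Python. It is not the code in this PR: the function names are hypothetical, the mapping is an inline dict rather than being read from `defaults/clade-legacy-mapping.yml`, the two entries are illustrative values only, and the fallback for unmapped clades is an assumption.

```python
import csv
import io

# Hypothetical excerpt of the mapping. The real values live in
# defaults/clade-legacy-mapping.yml and may differ from these.
CLADE_LEGACY_MAPPING = {
    "21K": "21K (Omicron)",
    "22F": "22F (Omicron)",
}


def add_clade_legacy(rows, mapping):
    """Yield copies of each row with clade_legacy derived from clade_nextstrain."""
    for row in rows:
        clade = row.get("clade_nextstrain", "")
        # Assumption: a clade missing from the mapping maps to itself.
        yield {**row, "clade_legacy": mapping.get(clade, clade)}


def annotate_tsv(tsv_text, mapping):
    """Return the TSV text with a clade_legacy column added (or overwritten)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(add_clade_legacy(reader, mapping))
    fieldnames = list(reader.fieldnames or [])
    if "clade_legacy" not in fieldnames:
        fieldnames.append("clade_legacy")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Calling `annotate_tsv(text, CLADE_LEGACY_MAPPING)` on a small `metadata.tsv` excerpt returns the same rows with the extra column appended, which is the only behaviour downstream consumers of `clade_legacy` depend on.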
This file lives in ingest for now to make this PR work without requiring changes to `ncov`.
Testing
`snakemake -c all --configfile config/debug_sample_gisaid.yaml -p --ri -F`
Output for `open` (`gisaid` was still running): `s3://nextstrain-staging/files/ncov/open/branch/cornelius/add-legacy-clade-names`
Future work
Generate `clade_legacy` automatically so that a new clade doesn't require changes in ingest. E.g., we could use the `WHO` column, or simply add `(Omicron)`, or add nothing at all, i.e. map the clade to itself. Tracked in https://github.com/nextstrain/ncov-ingest/issues/406
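To make the options above concrete, the following sketch shows one way a legacy name could be derived without a hand-maintained mapping. It is only an illustration of the idea under discussion: `derive_clade_legacy` is a hypothetical helper, and the assumption that the WHO name arrives as a plain string (possibly empty) is not taken from the ingest code.

```python
def derive_clade_legacy(clade_nextstrain, who_name=None):
    """Build a legacy-style clade label from a year-letter clade.

    Hypothetical rule: append the WHO name in parentheses when one is
    available, otherwise map the clade to itself.
    """
    who = (who_name or "").strip()
    if who:
        return f"{clade_nextstrain} ({who})"
    return clade_nextstrain
```

With this rule a new Omicron clade would get its label from the `WHO` value alone, and a clade with no WHO name would pass through unchanged, so neither case would need an edit to the mapping file.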