Open emmahodcroft opened 1 year ago
@emmahodcroft Let's see if this also affects web. Theoretically, web *should* use only `build_name` in significant places, but there might be some funny effects in case I deviated from that. So please also watch out for strange things in web as you migrate.
I hope you don't need to change build names. If you do, then it will be a journey, because that's how the md files, URLs and other stuff are linked together.
I don't plan to change the build names, as they're used all over.
RE the `nextstrain_name` -- I'll keep an eye out - I had the same thought. The main reason I am fairly confident is that it turns out a while ago I accidentally got inconsistent about the naming (started using just year-letter) and as far as I can tell I've never noticed any impact of this. This is the main thing that made me confident that we must not be using it anywhere, or I'd have noticed whenever I first started messing it up (probably about a year ago now) or sometime in between.
But agree - can't be too careful!
I do not expect to change the `build_name` - totally agree.
The only other thing that might be worth exploring changing is `display_name`, as perhaps we'd like to move to something a bit more flexible (perhaps including the pango in some cases, as Nextstrain is somewhat moving to do?). But I'd want to do a separate scope to check how much this is used.
Nextclade now breaks down Nextstrain clades into year-letter and WHO, and only gives the "old" 'full' name in a new column, `clade_legacy`.

Example:

Old:
- `clade_nextstrain` == `22F (Omicron)`

New:
- `clade_nextstrain` == `22F`
- `clade_who` == `Omicron`
- `clade_legacy` == `22F (Omicron)`
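
To make the mapping concrete, here is a minimal Python sketch of how a legacy name relates to the two new columns. `split_legacy_clade` is a hypothetical helper written for illustration, not part of Nextclade or CoVariants. It assumes a legacy name is always a year-letter part, optionally followed by a WHO label in parentheses.

```python
import re

# Assumed shape of a legacy name: "22F (Omicron)" or just "20A".
_LEGACY_RE = re.compile(r"^(?P<year_letter>\S+)(?:\s+\((?P<who>[^)]+)\))?$")


def split_legacy_clade(legacy):
    """Split a legacy clade name into (year_letter, who_label_or_None)."""
    match = _LEGACY_RE.match(legacy.strip())
    if match is None:
        raise ValueError(f"Unrecognised clade name: {legacy!r}")
    return match.group("year_letter"), match.group("who")


# Hypothetical row using the new Nextclade column name from the example above.
row = {"clade_legacy": "22F (Omicron)"}
clade_nextstrain, clade_who = split_legacy_clade(row["clade_legacy"])
```

With the example row, `clade_nextstrain` ends up as `22F` and `clade_who` as `Omicron`. A name without a WHO label gives `None` for the second value.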
This doesn't directly impact CoVariants as we don't use the Nextclade file directly, but the `metadata.tsv` that comes after the `ncov-ingest` workflow. Currently this hasn't changed, but it may change either by just replacing `Nextstrain_clade` (which we use) with the shortened name, or by doing this and also adding a "legacy" column.

For clarity, we currently compare values in `Nextstrain_clade` with `display_name` from `clusters.py` (containing things like `22F (Omicron)`).

If a legacy column is added, switching is as simple as just using this new column, with the rest of the code remaining the same. If there isn't one, or we want to be more future-proof, we should ensure we can just use a different entry in `clusters.py` which has the year-letter name.

We currently have an entry `nextstrain_name`, but this has been used inconsistently - sometimes with the 'full' name (`21L (Omicron)`) and sometimes just year-letter (`22A`). To help us switch to that option more easily in future, I propose switching now so that all `nextstrain_name` entries are year-letter.

This should mean that in future, we would need to switch from using `display_name` in `cluster_analysis.py` to using `nextstrain_name`. This shouldn't be too bad but will need checking as it's a little more complex than I thought.

If this is the path we go, here's a small checklist:
- [ ] Update `nextstrain_name` to use year-letter
- [ ] Update `cluster_analysis.py` to use `nextstrain_name` instead of `display_name` - and check it works.

Clearly, all of the above is only relevant to clades we track that are official Nextstrain clades. For those that aren't official (mostly older ones), we use Pango or SNPs, so this is unchanged.
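
The two checklist steps might look roughly like this in Python. This is only a sketch: the layout of the `clusters` dict, the placeholder keys and entries, and the helpers `to_year_letter`, `normalise_nextstrain_names` and `match_clusters` are all assumptions for illustration, not the actual contents of `clusters.py` or `cluster_analysis.py`.

```python
# Hypothetical, simplified stand-in for the dict in clusters.py.
# Keys and entries are placeholders, not real build names.
clusters = {
    "cluster_a": {"display_name": "21L (Omicron)", "nextstrain_name": "21L (Omicron)"},
    "cluster_b": {"display_name": "22A (Omicron)", "nextstrain_name": "22A"},
    "cluster_c": {"display_name": "Non-official example"},  # no nextstrain_name
}


def to_year_letter(name):
    """Drop any trailing ' (WHO label)' so only the year-letter part remains."""
    return name.split(" (", 1)[0].strip()


def normalise_nextstrain_names(clusters):
    """Checklist step 1: make every nextstrain_name year-letter only."""
    for entry in clusters.values():
        if "nextstrain_name" in entry:
            entry["nextstrain_name"] = to_year_letter(entry["nextstrain_name"])


def match_clusters(clade_value, clusters, key="display_name"):
    """Return cluster keys whose `key` field equals the metadata clade value."""
    return [name for name, entry in clusters.items() if entry.get(key) == clade_value]


normalise_nextstrain_names(clusters)

# Today: Nextstrain_clade holds the full name, compared against display_name.
current = match_clusters("22A (Omicron)", clusters, key="display_name")
# Checklist step 2: if Nextstrain_clade becomes year-letter only,
# compare against nextstrain_name instead.
future = match_clusters("22A", clusters, key="nextstrain_name")
```

Keeping the comparison field as a parameter means the switch from `display_name` to `nextstrain_name` is a one-argument change, and entries without a `nextstrain_name` (the non-official clades) are simply never matched this way.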