Closed MoiraZuber closed 2 years ago
The latest updates on your projects. Learn more about Vercel for Git ↗︎
Name | Status | Preview | Updated |
---|---|---|---|
covariants | ✅ Ready (Inspect) | Visit Preview | Oct 26, 2022 at 0:14AM (UTC) |
Thank you for this Moira & apologies that I'm just getting to it now!
I think this sounds very good to me and the logic makes sense - I think it's also nice that it makes it a little clearer about what's happening, hopefully to avoid confusion!
Currently, no warning is printed if none of these methods are possible. Should we print a warning if this is the case or will this never happen anyway?
Would a warnings like "no sequences were assigned to this cluster" (as I've seen sometimes, for example if you have nonsensical SNPs) be printed, or none at all? It might be nice to have a similar warning so that the user can see this rather than it failing 'silently'. I think it would print the above (because it should still be a clus
in clus_to_run
that's checked at the end, right?), but I might be misinterpreting?
The script currently only checks if
use_pango=True
, but not if the pango_lineages given inclusters[clus]["pango_lineages"]
are real. This would include a separate pass over the metadata to collect all possible pango_lineages in order to compare them. (This is currently already done to find all "legal" Nextstrain_clades). Would this be desired?
I think we'd be ok without this - this should trigger the "no sequences assigned to this cluster" warning, and I think that's enough for the users to re-evaluate what they've put in as definitions (whatever they may be!), including check themselves if they're real Pango lineages!
I'm happy for this to merge in whenever, as long as you also think it's good to go?
Would a warnings like "no sequences were assigned to this cluster" (as I've seen sometimes, for example if you have nonsensical SNPs) be printed, or none at all?
Yes it would. If that's enough warning to realize something isn't right, then we can leave it like that.
I think in that case we're ready to merge!
Fantastic, I'll merge this in now and it should go in today's update! Thanks again @MoiraZuber !
For a given cluster, it is currently possible that both pango lineages and snps could be used to assign sequences to it. With this change, only one of these methods will be used for each cluster:
display_name
that is found in the metadata "Nextstrain_clade" column? -> use Nextstrain_clade if not:use_pango=True
enabled? -> use pango lineages if not:snps
entry? -> use snps if not:Thoughts:
use_pango=True
, but not if the pango_lineages given inclusters[clus]["pango_lineages"]
are real. This would include a separate pass over the metadata to collect all possible pango_lineages in order to compare them. (This is currently already done to find all "legal" Nextstrain_clades). Would this be desired?