Closed emmahodcroft closed 1 year ago
Thanks for noticing that I overlooked adding `recombinant` to the mapping. I could have caught this easily with:
zstdcat metadata.tsv.zst | tsv-summarize -H --count -g Nextstrain_clade
This shows that there are some `?` entries there.
Some `?` entries will still remain, for sequences where alignment fails. That may be a change: previously this might have been an empty string ``. Is that OK for CoVariants?
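For anyone without `tsv-summarize` installed, the same per-clade count can be sketched in plain Python. This is only an illustration: the helper name `count_clades` and the sample rows are made up, and it assumes the metadata has already been decompressed into tab-separated text with a `Nextstrain_clade` column.

```python
import csv
from collections import Counter
from typing import Iterable


def count_clades(lines: Iterable[str], column: str = "Nextstrain_clade") -> Counter:
    """Count rows per value of `column` in tab-separated text with a header row."""
    # QUOTE_NONE treats quote characters as ordinary data, as TSV tools usually do.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[column] for row in reader)


# Made-up sample rows, not real metadata.
sample = [
    "strain\tNextstrain_clade",
    "seq1\t21L",
    "seq2\t?",
    "seq3\t21L",
    "seq4\trecombinant",
]

for clade, n in count_clades(sample).most_common():
    print(clade, n, sep="\t")
```

On a real file you would pass an open file handle instead of the sample list; a large `?` count would point to clades missing from the mapping.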
This doesn't require a full rerun, as the clade mapping is applied to the full `nextclade.tsv`.
So in under 24 hours you should have the `recombinant` clade in the metadata, @emmahodcroft.
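To make the behaviour discussed here concrete, below is a rough sketch of a mapping step. It is not the code in `./bin/join-metadata-and-clades`: the function `map_clade`, the `EXAMPLE_MAPPING` entries, and the rule that empty or unmapped clades become `?` are all assumptions for illustration.

```python
from typing import Mapping, Optional

# Hypothetical mapping, for illustration only. The real entries live in
# defaults/clade-legacy-mapping.yml, whose layout is not shown in this thread.
EXAMPLE_MAPPING = {
    "20A": "20A",
    "recombinant": "recombinant",  # the one-to-one entry discussed here
}


def map_clade(clade: Optional[str], mapping: Mapping[str, str], missing: str = "?") -> str:
    """Return the mapped clade; empty or unmapped input becomes `missing`."""
    if not clade:
        # Assumed behaviour for sequences with no Nextclade call.
        return missing
    return mapping.get(clade, missing)


print(map_clade("recombinant", EXAMPLE_MAPPING))
print(map_clade("", EXAMPLE_MAPPING))
```

Under these assumptions, leaving `recombinant` out of the mapping would turn every recombinant call into `?`, which matches the symptom described above.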
rule generate_metadata:
    input:
        nextclade_tsv=f"data/{database}/nextclade.tsv",
        nextclade_21L_tsv=f"data/{database}/nextclade_21L.tsv",
        existing_metadata=f"data/{database}/metadata_transformed.tsv",
        clade_legacy_mapping="defaults/clade-legacy-mapping.yml",
    output:
        metadata=f"data/{database}/metadata.tsv",
    benchmark:
        f"benchmarks/generate_metadata_{database}.txt"
    shell:
        """
        ./bin/join-metadata-and-clades \
            --metadata {input.existing_metadata} \
            --nextclade-tsv {input.nextclade_tsv} \
            --nextclade-21L-tsv {input.nextclade_21L_tsv} \
            --clade-legacy-mapping {input.clade_legacy_mapping} \
            -o {output.metadata}
        """
Thanks @corneliusroemer, I appreciate you looking this over!
Yes, I think things that get no call should be fine whether they are `` or `?` - I only look for Nextclade calls that match the clades I track (so I also ignore 20A etc.), and for everything else I check SNPs to assign clades myself.
One possible clade output from Nextclade is `recombinant`. However, in the move to change the clade names and remap them, this has gotten lost. I use this in CoVariants, so it would be great if we could add `recombinant` back in. I think this is as simple as a one-to-one mapping, but I would appreciate it if someone could run a test to check that this works as expected.
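One way such a test could look, as a sketch only: pair each sequence's raw Nextclade clade with the clade that ends up in the metadata, and list the sequences where a `recombinant` call did not survive. The helper `find_lost_calls` and the sample rows are hypothetical; how the two columns are loaded from the pipeline's files is left out.

```python
from typing import Iterable, List, Tuple


def find_lost_calls(
    rows: Iterable[Tuple[str, str, str]], clade: str = "recombinant"
) -> List[str]:
    """Return strains whose raw call is `clade` but whose metadata clade differs.

    Each row is (strain, raw_nextclade_clade, metadata_clade).
    """
    return [strain for strain, raw, mapped in rows if raw == clade and mapped != clade]


# Made-up rows: seq2 shows what a missing mapping entry might look like.
rows = [
    ("seq1", "recombinant", "recombinant"),
    ("seq2", "recombinant", "?"),
    ("seq3", "20A", "20A"),
]

lost = find_lost_calls(rows)
print(lost if lost else "all recombinant calls preserved")
```

An empty result would suggest the one-to-one mapping works as expected for the rows checked.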