Open corneliusroemer opened 2 years ago
I think this will be an ncov-specific bug, right? For most pipelines we infer ancestral nuc mutations via augur ancestral and then translate those node-by-node via augur translate. However, for ncov the second step is switched out for scripts/explicit_translation.py, which uses the translations from nextalign/nextclade and doesn't consider the output of augur ancestral.
I think @jameshadfield is right about this being an ncov-specific problem. Amino acid mutations come from the translated alignments produced by nextalign, followed by ancestral state reconstruction on those translations, while the nucleotide mutations come from augur ancestral's inference of ancestral sequences from the nucleotide alignment. It's easy to imagine how one could get different ancestral state reconstructions from these different inputs to TreeTime.
I would transfer this issue to the ncov repo, where we could consider how to fix it in that context.
I see! So the actual augur translate keeps the link by translating the reconstructed nucs, and therefore only reconstructs once, while ncov now reconstructs twice, and that's where the link is broken.
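To illustrate the "reconstruct once, then translate" idea, here is a minimal sketch that derives both nt and aa mutations on a branch from the same pair of reconstructed nucleotide sequences. It is not the augur translate implementation; the function names and the toy sequences are made up for illustration, and it assumes a single forward-strand, in-frame coding region with no gaps.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nuc_seq):
    """Translate an in-frame nucleotide string; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(nuc_seq[i:i + 3], "X")
        for i in range(0, len(nuc_seq) - 2, 3)
    )


def branch_mutations(parent_nuc, child_nuc):
    """Return (nt_muts, aa_muts) for one branch, both derived from the
    same reconstructed nucleotide sequences (hypothetical helper)."""
    nt_muts = [
        f"{p}{i + 1}{c}"
        for i, (p, c) in enumerate(zip(parent_nuc, child_nuc))
        if p != c
    ]
    aa_muts = [
        f"{p}{i + 1}{c}"
        for i, (p, c) in enumerate(zip(translate(parent_nuc), translate(child_nuc)))
        if p != c
    ]
    return nt_muts, aa_muts


# Toy sequences (illustrative only): codon 2 changes from GCT (A) to GTT (V).
parent = "ATGGCTTAA"
child = "ATGGTTTAA"
nt_muts, aa_muts = branch_mutations(parent, child)
print(nt_muts, aa_muts)  # ['C5T'] ['A2V']
```

Because the aa mutations are computed from the same parent/child sequences as the nt mutations, a non-synonymous change can only show up on the branch that carries the underlying nt change.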
It's worth noting that I encountered this in ncov-simple builds. But since ncov uses (almost) the same script, the bug should also be present there, just unnoticed so far.
I agree that transfer makes sense then.
Current Behavior
Sometimes a nucleotide (nt) mutation and the corresponding amino acid (aa) mutation are not placed on the same branch.
Expected behavior
Nucleotide and the corresponding amino acid mutations (if non-synonymous) should always be on the same branch
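A rough way to look for violations of this expectation in a build is sketched below. It assumes per-node data shaped like augur node-data JSON, with a "muts" list of nt mutations and an "aa_muts" dict keyed by gene; the helper names and the toy nodes are hypothetical. It also assumes forward-strand genes without ribosomal slippage, and takes ORF3a to start at position 25393 on the Wuhan-Hu-1 reference, which is consistent with codon 78 starting at 25624.

```python
# Assumed 1-based start coordinate of each gene on the reference.
GENE_STARTS = {"ORF3a": 25393}


def codon_positions(gene, aa_pos, gene_starts=GENE_STARTS):
    """Return the three 1-based nt positions that make up codon `aa_pos`."""
    first = gene_starts[gene] + 3 * (aa_pos - 1)
    return {first, first + 1, first + 2}


def unlinked_aa_mutations(nodes, gene_starts=GENE_STARTS):
    """List (node, gene, aa_mut) for every aa mutation that has no nt
    mutation inside the same codon on the same node."""
    unlinked = []
    for name, data in nodes.items():
        # Mutations look like "C25624T": strip the first and last character.
        nt_positions = {int(m[1:-1]) for m in data.get("muts", [])}
        for gene, aa_muts in data.get("aa_muts", {}).items():
            if gene not in gene_starts:
                continue  # no coordinates known for this gene, skip it
            for aa_mut in aa_muts:
                codon = codon_positions(gene, int(aa_mut[1:-1]), gene_starts)
                if not codon & nt_positions:
                    unlinked.append((name, gene, aa_mut))
    return unlinked


# Toy node data (illustrative only): the nt change sits on NODE_1,
# but the aa change was placed on NODE_2.
nodes = {
    "NODE_1": {"muts": ["C25624T"], "aa_muts": {"ORF3a": []}},
    "NODE_2": {"muts": [], "aa_muts": {"ORF3a": ["H78Y"]}},
}
print(unlinked_aa_mutations(nodes))  # [('NODE_2', 'ORF3a', 'H78Y')]
```

Running a check like this over the merged node data of a build could help collect concrete cases, and the affected nodes, for this issue.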
How to reproduce
I don't have reproducible input files yet. If anyone finds a build where this happens again, please add the input files (if shareable).
Possible solution
No idea. This seems non-trivial, as the root of the problem is that amino acids and nucleotides are reconstructed independently by TreeTime.
If the problem is not solvable, it would be good to explain in this issue what the preconditions are for the problem to show.
Evidence
Note that the ORF3a:78 mutation and the corresponding nt mutation at position 25624 are not on the same branch: