nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

The amino acid wild type in aa_muts.json is not consistent with the reference sequence. #1188

Closed: wxsh1213 closed this issue 1 year ago.

wxsh1213 commented 1 year ago

Hi, I ran the processing pipeline on flu sequences, but I found that the wild-type amino acid reported in the aa_muts.json file is not consistent with the wild type in the reference protein sequence. For example, one amino acid mutation is K148N, but the wild type at position 148 in the reference sequence is R. Which reference is used when generating the amino acid mutations?

I am using augur version 21.1.0.

Thank you so much for your help and looking forward to your reply!

joverlee521 commented 1 year ago

Hi @wxsh1213,

If the aa_muts.json file is produced by augur translate, then the listed mutations for each node are changes compared to their parent node in the tree, not the reference sequence.

Based on your example of K148N, I think you are looking at H7N9 sequences. In the Nextstrain tree (https://nextstrain.org/flu/avian/h7n9/ha?c=gt-HA_148) you can see that there is first an R148K mutation and then, further down the tree, a K148N mutation.
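To make the parent-relative convention concrete, here is a minimal sketch that recovers the residue at one site for a given node by walking from the root down to that node and applying each listed mutation in order. It assumes a toy tree and a flattened mutation dict (node name to gene to list of mutation strings); the node names `NODE_root`, `NODE_A`, and `tip_X` and the helper functions are hypothetical, the real aa_muts.json nesting should be checked against your own output, and we assume the root carries no mutation at this site. The check inside the loop shows why the "wild type" in K148N is K: it must match the residue inherited from the parent, not the reference.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy tree: child -> parent (the root has no parent).
parents: Dict[str, Optional[str]] = {
    "NODE_root": None,
    "NODE_A": "NODE_root",
    "tip_X": "NODE_A",
}

# Hypothetical flattened per-node mutations, each relative to the node's parent.
aa_muts: Dict[str, Dict[str, List[str]]] = {
    "NODE_A": {"HA": ["R148K"]},
    "tip_X": {"HA": ["K148N"]},
}


def parse_mut(mut: str) -> Tuple[str, int, str]:
    """Split a mutation like 'K148N' into (parent residue, position, new residue)."""
    return mut[0], int(mut[1:-1]), mut[-1]


def path_from_root(node: str) -> List[str]:
    """Return the node names from the root down to `node`."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = parents[current]
    return list(reversed(path))


def residue_at(node: str, gene: str, position: int, reference_residue: str) -> str:
    """Apply parent-relative mutations along the root-to-node path at one site."""
    residue = reference_residue
    for name in path_from_root(node):
        for mut in aa_muts.get(name, {}).get(gene, []):
            before, pos, after = parse_mut(mut)
            if pos == position:
                # The left-hand residue must equal what was inherited from the parent.
                assert before == residue, f"{name}: {mut} expects {before}, got {residue}"
                residue = after
    return residue


print(residue_at("tip_X", "HA", 148, "R"))
```

Running this prints `N`: the reference R becomes K at `NODE_A` (R148K), and that K becomes N at `tip_X` (K148N), so K148N is consistent with its parent even though the reference has R at position 148.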

I hope the visualization helps clarify things!