outbreak-info / outbreak.info

During outbreaks of emerging diseases such as COVID-19, efficiently collecting, sharing, and integrating data is critical to scientific research. outbreak.info is a resource to aggregate all this information into a single location.
https://outbreak.info/
GNU General Public License v3.0

Reporting of double deletion in Spike protein for B.1.617.2 sequences inconsistent with other resources #434

Open matt-sd-watson opened 3 years ago

matt-sd-watson commented 3 years ago

Hello, the website reports the double deletion in the Spike protein for Delta (B.1.617.2 sequences) at amino acids 157 and 158: https://outbreak.info/situation-reports?pango=B.1.617.2

However, additional resources report these coordinates as being instead 156 and 157, including:

I am wondering what causes the discrepancy in the coordinates of these deletions in Delta sequences between this tool and the others that I have listed here.

[Image: delta_sequences_nextclade_web_interface]

mindoftea commented 3 years ago

Hi there, thanks for bringing this up.

The two representations are actually equivalent -- in context, outbreak.info represents this mutation as E156G + del157/158, whereas others represent it as del156/157 + R158G. In both cases the meaning is that all three amino acids are replaced by a single glycine.

In general, when an out-of-frame deletion of length 3*k occurs (that is, one that does not start on a codon boundary), it collapses k+1 codons into 1, which codes for a new AA. The ambiguity arises because we consider the first affected codon to have mutated and the remaining ones to have been deleted, whereas others take the opposite approach.
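As a rough illustration of the reasoning above, here is a minimal Python sketch that derives both representations from the same nucleotide deletion. The function name `describe_deletion`, the tiny codon table, and the reference codons GAG TTC AGA for Spike positions 156-158 are assumptions made for illustration; this is not the outbreak.info pipeline code.

```python
# Only the codons needed for this example (standard genetic code).
CODON_TABLE = {"GAG": "E", "TTC": "F", "AGA": "R", "GGA": "G"}


def describe_deletion(ref_nt, first_codon_number, del_start, del_len):
    """Return both amino-acid descriptions of one nucleotide deletion.

    ref_nt             -- codon-aligned reference nucleotides (hypothetical input)
    first_codon_number -- amino-acid position of the first codon in ref_nt
    del_start          -- 0-based offset of the first deleted nucleotide
    del_len            -- number of deleted nucleotides, must be 3*k
    """
    if del_len % 3 != 0:
        raise ValueError("deletion length must be a multiple of 3")
    if del_start % 3 == 0:
        raise ValueError("deletion starts on a codon boundary; no ambiguity")

    first = del_start // 3                  # first affected codon
    last = (del_start + del_len - 1) // 3   # last affected codon (k+1 in total)

    # Leftover bases of the first and last codons fuse into one new codon.
    merged = ref_nt[first * 3:del_start] + ref_nt[del_start + del_len:(last + 1) * 3]
    new_aa = CODON_TABLE[merged]

    ref_aas = [CODON_TABLE[ref_nt[i * 3:i * 3 + 3]] for i in range(first, last + 1)]
    nums = [str(first_codon_number + i) for i in range(first, last + 1)]

    # Convention 1: first affected codon mutated, the rest deleted.
    mutate_first = f"{ref_aas[0]}{nums[0]}{new_aa} + del{'/'.join(nums[1:])}"
    # Convention 2: last affected codon mutated, the rest deleted.
    mutate_last = f"del{'/'.join(nums[:-1])} + {ref_aas[-1]}{nums[-1]}{new_aa}"
    return mutate_first, mutate_last


# Assumed reference codons for E156, F157, R158; deleting 6 nt (k = 2)
# starting at the second base of codon 156 leaves G + GA = GGA (glycine).
print(describe_deletion("GAGTTCAGA", 156, 1, 6))
# ('E156G + del157/158', 'del156/157 + R158G')
```

Both strings describe the same underlying sequence change; only the choice of which affected codon is reported as mutated differs.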

Because there seems to be a consensus among other platforms, we plan to eventually modify our pipeline to follow this convention.

matt-sd-watson commented 3 years ago

Hi,

Thanks for the clarification! I had assumed that it was probably due to the inherent ambiguity of the out-of-frame deletion for this variant and how to represent the final amino acid, as opposed to a simple website typo, but I wanted to confirm. I will let you decide whether to close this issue, depending on whether you update the resource to match the others that I have posted.

flaneuse commented 3 years ago

@matt-sd-watson we're going to leave it open for now and eventually shift to the nomenclature that Nextstrain is using to make it less confusing to compare.