sars-cov-2-variants / lineage-proposals

Repository to propose and discuss lineages

del27903_27910, artefact or frameshifted orf8? #215

aviczhl2 closed this issue 1 year ago

aviczhl2 commented 1 year ago

I found 148 sequences carrying del27903_27910, which looks like either a sequencing artefact or a frameshifted ORF8. These sequences date back to 2020 and span various variants.

84 of the 148 sequences (57%) are XBB with G27915T (ORF8:G8stop in the original ORF8 frame). This is strong evidence against the artefact hypothesis, because XBB+G27915T makes up only 2-3% of the GISAID database.
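The enrichment argument can be made concrete with a binomial tail probability. This is a back-of-the-envelope Python sketch, not part of the original analysis. It assumes each of the 148 sequences is an independent draw with a 3% chance of being XBB+G27915T (the upper end of the quoted 2-3%). Real sequences are phylogenetically related (a 29-sequence monophyletic branch comes up later in the thread), so the independence assumption overstates the significance. Treat the result as an order-of-magnitude illustration only.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_total = 148    # sequences with del27903_27910 (from the post)
n_xbb = 84       # of which XBB with G27915T (from the post)
baseline = 0.03  # assumption: upper end of the quoted 2-3% GISAID share of XBB+G27915T

print(f"observed share: {n_xbb / n_total:.0%}")                 # observed share: 57%
print(f"expected XBB+G27915T count: {n_total * baseline:.1f}")  # expected XBB+G27915T count: 4.4
# Assumes independent draws; because related sequences are not independent,
# the real evidence is weaker than this tail probability suggests.
print(f"P(X >= {n_xbb}): {binomial_upper_tail(n_xbb, n_total, baseline):.1e}")
```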

Has anyone analysed how del27903_27910 frameshifts ORF8, with and without G27915T?

GISAID query: del27903_27910

EPI_ISL_1450454, EPI_ISL_2658608, EPI_ISL_2659208, EPI_ISL_2659273, EPI_ISL_2659292, EPI_ISL_2659302, EPI_ISL_2659327, EPI_ISL_2659364, EPI_ISL_2659383, EPI_ISL_2659388, EPI_ISL_2659411, EPI_ISL_2659426, EPI_ISL_2659444, EPI_ISL_2659446, EPI_ISL_2659476, EPI_ISL_2659491, EPI_ISL_2659493, EPI_ISL_2659525, EPI_ISL_2659528, EPI_ISL_2659559, EPI_ISL_2659572, EPI_ISL_2659600, EPI_ISL_2659602, EPI_ISL_2659610, EPI_ISL_2659619, EPI_ISL_2659623, EPI_ISL_2659626, EPI_ISL_2659637, EPI_ISL_2659639, EPI_ISL_2659651, EPI_ISL_4137143, EPI_ISL_4175201, EPI_ISL_4342538, EPI_ISL_4387660, EPI_ISL_4962085, EPI_ISL_5246860, EPI_ISL_5247149, EPI_ISL_5428528, EPI_ISL_6786035, EPI_ISL_6904639, EPI_ISL_7928668, EPI_ISL_8060138, EPI_ISL_8942702, EPI_ISL_10052066, EPI_ISL_10390420, EPI_ISL_10746455, EPI_ISL_11646380, EPI_ISL_11697254, EPI_ISL_11783010, EPI_ISL_11927952, EPI_ISL_12279070, EPI_ISL_13301228, EPI_ISL_13713684, EPI_ISL_13732104, EPI_ISL_13875978, EPI_ISL_14130889, EPI_ISL_16208507, EPI_ISL_16230603, EPI_ISL_16243864, EPI_ISL_16528820, EPI_ISL_16650399, EPI_ISL_16682666, EPI_ISL_16732067, EPI_ISL_16736209, EPI_ISL_16817926, EPI_ISL_16996010, EPI_ISL_17050827, EPI_ISL_17066689, EPI_ISL_17161783, EPI_ISL_17179561, EPI_ISL_17179566, EPI_ISL_17190839, EPI_ISL_17225853, EPI_ISL_17241060, EPI_ISL_17249418, EPI_ISL_17250291, EPI_ISL_17259167, EPI_ISL_17278962, EPI_ISL_17279679, EPI_ISL_17286822, EPI_ISL_17287225, EPI_ISL_17318424, EPI_ISL_17333987, EPI_ISL_17352491, EPI_ISL_17358673, EPI_ISL_17371149, EPI_ISL_17384464, EPI_ISL_17384528, EPI_ISL_17384801, EPI_ISL_17396667, EPI_ISL_17396680-17396681, EPI_ISL_17438532, EPI_ISL_17462687, EPI_ISL_17500185, EPI_ISL_17502162, EPI_ISL_17527929, EPI_ISL_17562796, EPI_ISL_17606262, EPI_ISL_17615367, EPI_ISL_17618551, EPI_ISL_17630836, EPI_ISL_17637420, EPI_ISL_17651283, EPI_ISL_17653444, EPI_ISL_17657770, EPI_ISL_17671829, EPI_ISL_17671832, EPI_ISL_17686070, EPI_ISL_17686113, EPI_ISL_17692310, EPI_ISL_17694282, EPI_ISL_17712630, EPI_ISL_17731136, 
EPI_ISL_17731140, EPI_ISL_17742252, EPI_ISL_17743304, EPI_ISL_17745848, EPI_ISL_17758550, EPI_ISL_17758554, EPI_ISL_17758556, EPI_ISL_17758558-17758562, EPI_ISL_17758573, EPI_ISL_17758576-17758578, EPI_ISL_17758593-17758594, EPI_ISL_17759146, EPI_ISL_17766933, EPI_ISL_17795574, EPI_ISL_17795780, EPI_ISL_17795786, EPI_ISL_17808088, EPI_ISL_17809656, EPI_ISL_17811197, EPI_ISL_17811245

FedeGueli commented 1 year ago

@ryhisner

aviczhl2 commented 1 year ago

156 seqs now.

ryhisner commented 1 year ago

I would bet that this is real. It's not overly concentrated in any one country or lab, and we know that mutations that abolish ORF8 expression have been favored over the past 15 months.

In lineages with ORF8:G8* (G27915T), like XBB.1.5, this deletion just moves the stop codon 16 nucleotides downstream. In lineages without ORF8:G8*, it creates a stop codon in the ORF8:7-8 region.
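Whether a deletion shifts the frame, and where the next in-frame stop lands, can be checked mechanically. The Python sketch below is illustrative only. The helper names are hypothetical, and the usage example runs on a made-up toy ORF, not the real ORF8. To reproduce the thread's diagrams you would supply the actual ORF8 sequence (starting at 27894 in the Wuhan-Hu-1 reference, NC_045512.2), converting genome coordinates to ORF-relative ones.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_frameshift(del_start: int, del_end: int) -> bool:
    """True if deleting the inclusive range [del_start, del_end] shifts the reading frame."""
    return (del_end - del_start + 1) % 3 != 0


def apply_deletion(orf: str, del_start: int, del_end: int) -> str:
    """Remove 1-based inclusive positions [del_start, del_end], counted from the ORF's first base."""
    return orf[: del_start - 1] + orf[del_end:]


def first_stop_codon(orf: str) -> Optional[int]:
    """Return the 1-based number of the first in-frame stop codon, or None if there is none."""
    for i in range(0, len(orf) - 2, 3):
        if orf[i : i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


# del27903_27910 removes 8 nt, which is not a multiple of 3, so it shifts the frame.
print(is_frameshift(27903, 27910))  # True

# Hypothetical toy ORF (NOT the real ORF8): ATG AAA CCC GGG TTT AAA CCC TAG
toy_orf = "ATGAAACCCGGGTTTAAACCCTAG"
print(first_stop_codon(toy_orf))                        # 8
print(first_stop_codon(apply_deletion(toy_orf, 4, 5)))  # 5: the 2-nt deletion brings an out-of-frame TAA into frame
```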

Nextclade calls this ∆27900-27907, which is equivalent to ∆27903-27910: the range 27900-27910 has three T's at each end (27900-27902 and 27908-27910), so deleting either 8-nucleotide window leaves exactly the same sequence.
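The equivalence of the two coordinates is the general fact that a deletion inside repeated sequence can be slid along the repeat without changing the result. The Python sketch below uses left-alignment, the convention used for example in VCF variant normalization. The thread does not say whether Nextclade places deletions in exactly this way. The toy sequence is hypothetical, not the real genome: its positions 10-17 and 7-14 stand in for 27903-27910 and 27900-27907, with TTT at both ends as described above.

```python
from typing import Tuple


def delete(seq: str, start: int, end: int) -> str:
    """Remove 1-based inclusive positions [start, end]."""
    return seq[: start - 1] + seq[end:]


def left_align_deletion(seq: str, start: int, end: int) -> Tuple[int, int]:
    """Slide a deletion left for as long as doing so leaves the same resulting sequence.

    Shifting [start, end] to [start - 1, end - 1] is equivalent exactly when the base
    just before the deletion equals the deletion's last base.
    """
    while start > 1 and seq[start - 2] == seq[end - 1]:
        start -= 1
        end -= 1
    return start, end


# Hypothetical 20-nt toy with TTT at positions 7-9 and 15-17 (NOT the real genome sequence).
toy = "ATGAAATTTCAGGCTTTCCA"
print(left_align_deletion(toy, 10, 17))           # (7, 14)
print(delete(toy, 10, 17) == delete(toy, 7, 14))  # True
```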

FedeGueli commented 1 year ago

That said, I would propose it then.

ryhisner commented 1 year ago

According to UShER, the largest monophyletic branch has 29 sequences. I bet that's a significant underestimate, though. Frameshift-causing deletions are frequently miscalled during sequencing, so many of these sequences don't show up in searches. Also, since UShER doesn't take deletions into account, it may fail to cluster some sequences that belong with this 29-sequence lineage.

However, all these sequences have G4913T (ORF1a:V1550F) and C28614T, so that could help in locating them.
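Because the deletion itself is often miscalled, filtering on the two substitutions avoids depending on the deletion call at all. The Python sketch below assumes you already have, per sequence, a set of nucleotide substitutions in the same notation used in the thread. The record IDs and mutation sets in the example are made up for illustration and do not correspond to real GISAID entries.

```python
from typing import Dict, List, Set

# Marker substitutions named in the thread for the 29-sequence branch.
MARKERS: Set[str] = {"G4913T", "C28614T"}


def find_candidates(records: Dict[str, Set[str]], markers: Set[str] = MARKERS) -> List[str]:
    """Return IDs whose substitution set contains every marker, regardless of deletion calls."""
    return sorted(seq_id for seq_id, muts in records.items() if markers <= muts)


# Hypothetical records: placeholder IDs and invented mutation sets.
records = {
    "seq_A": {"G4913T", "C28614T", "G27915T"},
    "seq_B": {"G4913T"},
    "seq_C": {"C28614T", "G4913T"},
}
print(find_candidates(records))  # ['seq_A', 'seq_C']
```

Candidates found this way would still need their ORF8 region inspected, since the markers only identify the branch, not the deletion.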

aviczhl2 commented 1 year ago

> In lineages with ORF8:G8*, like XBB.1.5, this deletion just moves the stop codon 16 nucleotides downstream. In lineages without ORF8:G8*, it creates a stop codon in the ORF8:7-8 region.

I think there is a mistake in your frame: it shows the frameshift caused by a 7-nucleotide deletion instead of an 8-nucleotide one.

aviczhl2 commented 1 year ago

> That said, I would propose it then.

I proposed a recent sublineage of GF.1 carrying this deletion; all the other lineages seem to have vanished. Because UShER cannot handle frameshifts, it displays this as G27906A (ORF8:V5I).

https://github.com/cov-lineages/pango-designation/issues/2122
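The amino-acid labels quoted in this thread (ORF8:V5I for G27906A, ORF8:G8 for G27915T, ORF1a:V1550F for G4913T) follow from simple coordinate arithmetic. The Python sketch below assumes the ORF start positions of the Wuhan-Hu-1 reference annotation (NC_045512.2): ORF1a at 266 and ORF8 at 27894. It also assumes there is no ribosomal frameshift or indel between the ORF start and the position, so it does not apply to ORF1b. The helper names are hypothetical.

```python
# ORF start coordinates in the Wuhan-Hu-1 reference (NC_045512.2).
ORF_STARTS = {"ORF1a": 266, "ORF8": 27894}


def codon_number(genome_pos: int, orf: str) -> int:
    """1-based codon index in `orf` for a 1-based reference position (no upstream frameshift)."""
    offset = genome_pos - ORF_STARTS[orf]
    if offset < 0:
        raise ValueError(f"{genome_pos} lies upstream of {orf}")
    return offset // 3 + 1


def position_in_codon(genome_pos: int, orf: str) -> int:
    """1, 2 or 3: which base of its codon the position falls on."""
    return (genome_pos - ORF_STARTS[orf]) % 3 + 1


for pos, orf in [(27906, "ORF8"), (27915, "ORF8"), (4913, "ORF1a")]:
    print(pos, orf, codon_number(pos, orf), position_in_codon(pos, orf))
# 27906 ORF8 5 1
# 27915 ORF8 8 1
# 4913 ORF1a 1550 1
```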

ryhisner commented 1 year ago

Sorry, I didn't see this until just now, but @aviczhl2 is correct that my diagram above was mistaken. The next stop codon is a bit further on, around ORF8:21.
