sars-cov-2-variants / lineage-proposals

Repository to propose and discuss lineages

MD.1 (FLiRT) + S:F58Y, S:F59L (3) emerged in Philippines #1671

Open FedeGueli opened 5 days ago

FedeGueli commented 5 days ago

transferred back from https://github.com/cov-lineages/pango-designation/issues/2680

In the batch uploaded today from the Philippines I noticed an odd pair of adjacent amino acid mutations in the spike. Since they fall in a region under pressure for convergent evolution, and the sequences come from a very undersampled area, it is better to track this for a bit.

Defining mutations: MD.1 >> ORF1a:Y182H (T809C), T10999C, S:F58Y (T21735A), S:F59L (T21737C), G28798A
Query: T809C, T10999C
Samples: 3 (4 on Usher from England, US, Philippines and Australia)
IDs: EPI_ISL_19096194, EPI_ISL_19180401, EPI_ISL_19227260
Tree:

[Screenshot of the tree, 2024-07-02 11:54:14]

https://nextstrain.org/fetch/genome-test.gi.ucsc.edu/trash/ct/subtreeAuspice1_genome_test_134e9_3caa70.json?c=gt-S_58,59&gmax=25384&gmin=21563&label=id:node_7159421
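
As a quick sanity check on the nucleotide-to-amino-acid pairings listed above, here is a minimal sketch that maps a genomic coordinate to its codon number and position within the codon. It assumes the Wuhan-Hu-1 reference (NC_045512.2) coordinates, with ORF1a starting at 266 and the spike CDS starting at 21563 (the same `gmin` used in the tree link). The names `CDS_START` and `codon_of` are hypothetical helpers, not part of any tool mentioned in this thread.

```python
# CDS start coordinates (1-based) on the Wuhan-Hu-1 reference, NC_045512.2.
# Assumption: no upstream indels shift the numbering.
CDS_START = {"ORF1a": 266, "S": 21563}


def codon_of(gene: str, nt_pos: int) -> tuple[int, int]:
    """Return (codon number, position within codon), both 1-based."""
    offset = nt_pos - CDS_START[gene]
    if offset < 0:
        raise ValueError(f"{nt_pos} lies upstream of {gene}")
    return offset // 3 + 1, offset % 3 + 1


for gene, mutation in [("ORF1a", "T809C"), ("S", "T21735A"), ("S", "T21737C")]:
    position = int(mutation[1:-1])  # strip the reference and alternate bases
    codon, phase = codon_of(gene, position)
    print(f"{mutation}: {gene} codon {codon}, codon position {phase}")
```

Under these assumptions T809C falls in ORF1a codon 182, T21735A in spike codon 58, and T21737C in spike codon 59, which matches the amino acid labels in the defining-mutation list.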

FedeGueli commented 5 days ago

@ryhisner spotted a quite long insertion at S:215: it is Spike_ins215AGERWR (Query)

ryhisner commented 5 days ago

> @ryhisner spotted a quite long insertion at S:215 it is Spike_ins215AGERWR (Query)

Yeah, it's in the sequence from the Philippines. Sometimes sequences from the Philippines are unreliable, but I think this insertion is real for a few reasons:

  1. It's in a location known for insertions.
  2. Though there is dropout elsewhere in the sequence, there is no dropout in the neighborhood of this insertion.
  3. The insertion is loaded with guanines. 10/18 nucs are G's here. I've noticed that, despite G being much less common than A and T throughout the genome, it is enormously overrepresented in insertions. I still don't know the reason for this. Perhaps it's a well-known principle that I'm just not aware of. My only idea is that because G can form strong bonds with C in the secondary RNA structure (stronger than A-T bonds due to having three hydrogen bonds instead of two) and because it can also form non-Watson-Crick bonds with T, which is the most common nucleotide in the viral genome, insertions that are full of G's may, on average, better preserve secondary RNA structure.

The exact insertion, according to both GISAID and Nextclade, is S:ins215_AGERWR, or ins22207_GCTGGAGAGAGATGGCGG.


Incidentally, another insertion that appeared in two KP.2.3 sequences today (and in one or two earlier sequences) is S:ins182_ERA, and sure enough, G's are overrepresented here as well, even in the same ratio as the above insertion, with 5/9 nucs being G's.

Here the insertion is ins22108_GAGAGAGCG.
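
The G counts and the amino acid readings of both insertions can be checked with a short sketch like the one below. It assumes the standard genetic code and that each insertion is read in frame from its first nucleotide, which should hold if the insertion sits right after the third position of a codon (22207 and 22108 both appear to be third positions under Wuhan-Hu-1 spike numbering). The helper names `translate` and `g_count` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate a nucleotide string in frame 1; length must be a multiple of 3."""
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def g_count(seq: str) -> tuple[int, int]:
    """Return (number of G's, total length)."""
    return seq.count("G"), len(seq)


insertions = {
    "ins22207": "GCTGGAGAGAGATGGCGG",
    "ins22108": "GAGAGAGCG",
}

for name, seq in insertions.items():
    g, n = g_count(seq)
    print(f"{name}: {translate(seq)}, G's {g}/{n} = {Fraction(g, n)}")
```

Both sequences reduce to the same G fraction, 5/9, and translate to AGERWR and ERA respectively, consistent with the GISAID and Nextclade calls quoted above.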

FedeGueli commented 5 days ago

Thank you @ryhisner