sars-cov-2-variants / lineage-proposals

Repository to propose and discuss lineages

EG.5.1 sublineage with ORF1b:K2557R and ORF3a:P240L first detected in Guangdong, China (80 GISAID seqs as of 2023-07-19; Asia, Europe, North and South America, Australia) #428

Closed alurqu closed 1 year ago

alurqu commented 1 year ago

There may be an EG.5.1 sublineage with ORF1b:K2557R (A21137G; NSP16:K160R) and ORF3a:P240L (C26111T), first detected in Guangdong, China.
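The nucleotide-to-codon correspondences above can be checked with a little arithmetic. The sketch below assumes Wuhan-Hu-1 reference coordinates and the 1-based gene start positions commonly used in Nextstrain-style annotations (ORF1b at 13468, NSP16 at 20659, S at 21563, ORF3a at 25393). Those start positions and the helper name `codon_number` are assumptions for illustration; they are not stated in this thread.

```python
# Assumed 1-based gene start positions on the Wuhan-Hu-1 reference genome.
# These values are an assumption of this sketch, not taken from the thread.
GENE_START = {"ORF1b": 13468, "NSP16": 20659, "S": 21563, "ORF3a": 25393}


def codon_number(gene, nt_pos):
    """Return (1-based codon number, 1-based position within that codon)."""
    offset = nt_pos - GENE_START[gene]
    if offset < 0:
        raise ValueError(f"position {nt_pos} lies before the start of {gene}")
    return offset // 3 + 1, offset % 3 + 1


for gene, pos in [("ORF1b", 21137), ("NSP16", 21137), ("ORF3a", 26111)]:
    codon, within = codon_number(gene, pos)
    print(f"{gene}: nt {pos} -> codon {codon}, codon position {within}")
```

Under these assumed coordinates, A21137G falls in the second position of ORF1b codon 2557 (NSP16 codon 160) and C26111T in the second position of ORF3a codon 240. A second-position A-to-G change is consistent with K (AAA/AAG) becoming R (AGA/AGG), and a second-position C-to-T change is consistent with P (CCN) becoming L (CTN).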

As of 2023-07-17, CoV-Spectrum reports 52 good-quality (55 total) EG.5.1+ORF1b:2557R+ORF3a:240L sequences. Source: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?variantQuery=nextcladePangoLineage%3AEG.5.1+%26+ORF1b%3AK2557R+%26+ORF3a%3AP240L&nextcladeQcOverallScoreTo=29&
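For anyone wanting to build similar links for other mutation combinations, here is a minimal sketch that reproduces the query string of the URL above with Python's standard library. The parameter names `variantQuery` and `nextcladeQcOverallScoreTo` are copied from that URL; the function name `build_url` is hypothetical, and reading the score cutoff of 29 as the "good-quality" filter is an assumption.

```python
from urllib.parse import urlencode

# Base path copied from the link in this post.
BASE = "https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants"


def build_url(lineage, aa_mutations, max_qc_score=None):
    """Build a CoV-Spectrum-style explore URL (hypothetical helper)."""
    # Join the lineage filter and each mutation with " & ", as in the post's query.
    query = " & ".join([f"nextcladePangoLineage:{lineage}", *aa_mutations])
    params = {"variantQuery": query}
    if max_qc_score is not None:
        # Assumed here to be the quality cutoff behind the "good-quality" count.
        params["nextcladeQcOverallScoreTo"] = max_qc_score
    # urlencode percent-encodes ":" and "&" and turns spaces into "+".
    return f"{BASE}?{urlencode(params)}"


print(build_url("EG.5.1", ["ORF1b:K2557R", "ORF3a:P240L"], max_qc_score=29))
```

The printed URL matches the one above apart from its trailing `&`.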

This lineage has been reported from multiple countries on every populated continent except Africa. It contains NSP16:K160R, which Bloom and Neher's data (https://jbloomlab.github.io/SARS2-mut-fitness/ and https://github.com/jbloomlab/SARS2-mut-fitness/blob/main/results/aa_fitness/aamut_fitness_by_clade.csv) show as highly favorable in all SARS-CoV-2 clades. ORF1b:K2557R (NSP16:K160R) is also a defining mutation of lineage FL.1.5.

As of 2023-07-17, UShER shows that all of the CoV-Spectrum samples sit on a single subtree, with evidence of additional branching. [Image: UShER_CoV-Spectrum_EG.5.1+ORF1b_2557R+ORF3a_240L]

To view the UShER subtree in Nextstrain: https://nextstrain.org/fetch/github.com/alurqu/pango-designation-support-alurqu/raw/main/2023/07/subtreeAuspice1_genome_CoV-Spectrum_EG.5.1%2BORF1b_2557R%2BORF3a_240L.json?c=gt-ORF3a_240&label=id%3Anode_6683112

Should this lineage be designated, there may be merit in also designating the parent lineage with ORF1b:K2557R. As seen in the UShER tree, other child lineages of the ORF1b:K2557R parent may be emerging.

GISAID query: G21718T, T22930A, A21137G, C26111T

First GISAID sequence: Guangdong, China 2023-05-01

Most Recent GISAID sequence: New South Wales, Australia 2023-07-03

A zip archive of GenBank-formatted and derived metadata and FASTA files plus CoV-Spectrum-derived UShER output files for these sequences is available at Support-EG.5.1+ORF1b_2557R+ORF3a_240L.zip

A CoV-Spectrum list of GISAID EPI ISLs for good-quality sequences is available at gisaid-epi-isl_EG.5.1+ORF1b_2557R+ORF3a_240L.txt

Edit: NSP16:K160R is defining in lineage FL.1.5 and not just FL.1.5.1.

FedeGueli commented 1 year ago

It is one of the fastest according to https://cov-spectrum.org/collections/181, although when I placed it with UShER I found three different sibling lineages with that mutation.

alurqu commented 1 year ago

> It is one of the fastest according to https://cov-spectrum.org/collections/181, although when I placed it with UShER I found three different sibling lineages with that mutation.

I see what you mean. [Image: UShER_CoV-Spectrum_EG.5.1+ORF1b_2557R.png]

By adding C26111T for ORF3a:P240L, this proposal scopes to a sublineage of one of those three (or more) lineages. It appears that the parent A21137G/ORF1b:K2557R/NSP16:K160R lineage for this specific lineage is on the S:Q52H polytomy. NSP16:K160R may be sufficiently advantageous that it is now occurring convergently in the XBB.1.9.1/FL and XBB.1.9.2/EG families. There may be wisdom in tracking the other EG.5.1+A21137G lineages for possible eventual designation. However, at this point the lineage proposed here is the largest of the set.

Note that the Bloom and Neher data also show ORF3a:P240L to be advantageous in all SARS-CoV-2 clades, although the effect is not as strong as for NSP16:K160R. This may be giving a fitness boost and growth advantage to this particular EG.5.1+A21137G lineage.

alurqu commented 1 year ago

Looking deeper at the Bloom and Neher data for A21137G/ORF1b:K2557R/NSP16:K160R: for clade 22F (XBB), this is the sixth-most advantageous mutation, with a delta_fitness of +3.9798. For clade 23A (XBB.1.5), which is not directly applicable but is similar as an XBB+S:486P clade, A21137G/ORF1b:K2557R/NSP16:K160R is the fifth-most advantageous mutation, with a delta_fitness of +4.1452.
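A ranking like this could be reproduced from a local copy of aamut_fitness_by_clade.csv with something like the sketch below. The column names `clade`, `gene`, `aa_mutation` and `delta_fitness`, and the function names, are assumptions made for illustration; check them against the file's actual header. The sketch ranks every row in a clade with no filtering, which may not match how the ranks quoted above were obtained.

```python
import csv


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def rank_in_clade(rows, clade, gene, aa_mutation):
    """Return (rank, delta_fitness) for one mutation within a clade.

    Rank 1 is the highest delta_fitness among all rows of that clade.
    Returns None if the mutation is not listed for the clade.
    Column names are assumed, not confirmed against the real file.
    """
    in_clade = [r for r in rows if r["clade"] == clade]
    in_clade.sort(key=lambda r: float(r["delta_fitness"]), reverse=True)
    for rank, row in enumerate(in_clade, start=1):
        if row["gene"] == gene and row["aa_mutation"] == aa_mutation:
            return rank, float(row["delta_fitness"])
    return None
```

A call might look like `rank_in_clade(load_rows("aamut_fitness_by_clade.csv"), "22F", "nsp16", "K160R")`, where the gene label `"nsp16"` is again a guess at how the file names that protein. If the file lists the same substitution under both ORF1ab and nsp numbering, the duplicates would need to be removed before the ranks are comparable.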

Also note that for both clades 22F and 23A, S:Q146K is the mutation with the highest delta_fitness in the Bloom and Neher data. That mutation already appears in a sibling of the lineage proposed here. This may be a reason to designate a parent lineage with EG.5.1+ORF1b:K2557R on the EG.5.1 polytomy, anticipating that such a parent lineage will likely see at least two child lineages designated eventually.

alurqu commented 1 year ago

The parent lineage with ORF1b:K2557R but not ORF3a:P240L may be faster than the child lineage with ORF3a:P240L.

FedeGueli commented 1 year ago

> The parent lineage with ORF1b:K2557R but not ORF3a:P240L may be faster than the child lineage with ORF3a:P240L.

Please change the proposal accordingly. I have the same feeling.

FedeGueli commented 1 year ago

Or better: @alurqu, please propose the parental lineage in the main Pango repo, with an analysis of the other lineages that carry the same ORF1b:K2557R.

Maybe @angiehinrinchs could take a look at this to check whether they are all really as distinct as they seem.

alurqu commented 1 year ago

> Or better: @alurqu, please propose the parental lineage in the main Pango repo, with an analysis of the other lineages that carry the same ORF1b:K2557R.
>
> Maybe @angiehinrinchs could take a look at this to check whether they are all really as distinct as they seem.

The parent lineage is now proposed in https://github.com/cov-lineages/pango-designation/issues/2117