sars-cov-2-variants / lineage-proposals

Repository to propose and discuss lineages

XBB.2.3.8 sublineage with ORF1a:I2786V, ORF3a:I35T, ORF7a:T14I, S:T259I, and C8950T first detected in Nevada, USA (33 GISAID seqs as of 2023-08-22; USA) #652

Closed: alurqu closed this issue 1 year ago.

alurqu commented 1 year ago

There may be an XBB.2.3.8 sublineage with ORF1a:I2786V (A8621G; NSP4:I23V), ORF3a:I35T (T25496C), ORF7a:T14I (C27434T), S:T259I (C22338T), and the synonymous nucleotide mutation C8950T, first detected in Nevada, USA.
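As a consistency check on the nucleotide/amino-acid pairs above, the sketch below maps each amino-acid position to its codon on the reference genome. It assumes the standard Wuhan-Hu-1 (NC_045512.2) ORF start coordinates and the usual nsp4 start at ORF1a residue 2764; both are assumptions to verify against the reference annotation, and the helper names are hypothetical.

```python
# Assumed 1-based ORF start coordinates on the Wuhan-Hu-1 reference (NC_045512.2).
ORF_START = {"ORF1a": 266, "S": 21563, "ORF3a": 25393, "ORF7a": 27394}
# Assumed first residue of nsp4 within the ORF1a polyprotein.
NSP4_START = 2764


def codon_span(gene: str, aa_pos: int) -> tuple[int, int, int]:
    """Genome coordinates of the three nucleotides encoding residue aa_pos."""
    first = ORF_START[gene] + 3 * (aa_pos - 1)
    return first, first + 1, first + 2


def locate(gene: str, nt_pos: int) -> tuple[int, int]:
    """Map a genome coordinate to (residue, position within the codon, 1-3).

    For ORF1a this only holds upstream of the ORF1ab ribosomal frameshift.
    """
    offset = nt_pos - ORF_START[gene]
    return offset // 3 + 1, offset % 3 + 1


def orf1a_to_nsp4(aa_pos: int) -> int:
    """Convert an ORF1a residue number to its nsp4 residue number."""
    return aa_pos - NSP4_START + 1


# Each non-synonymous change should fall inside the codon of its residue.
for gene, aa_pos, nt_pos in [("ORF1a", 2786, 8621), ("S", 259, 22338),
                             ("ORF3a", 35, 25496), ("ORF7a", 14, 27434)]:
    print(gene, aa_pos, codon_span(gene, aa_pos), locate(gene, nt_pos))

print("NSP4 residue:", orf1a_to_nsp4(2786))  # 23
print("C8950T:", locate("ORF1a", 8950))      # (2895, 3): third codon position
```

With these assumed coordinates, A8621G lands at the first position of ORF1a codon 2786, the other three changes land at the second position of their codons, and C8950T falls at the third position of ORF1a codon 2895, which is consistent with it being synonymous.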

There may also be merit in designating one or more levels between XBB.2.3.8 and this sublineage. This is especially true of XBB.2.3.8+ORF1a:I2786V, for which the zoomed-out UShER tree below shows several possible emerging sublineages.

As of 2023-08-19, CoV-Spectrum reports 25 good-quality (26 total) XBB.2.3.8+ORF1a:2786V+ORF3a:35T+ORF7a:14I+S:259I+8950T sequences. Source: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?variantQuery=nextcladePangoLineage%3AXBB.2.3.8+%26+ORF1a%3AI2786V+%26+ORF3a%3AI35T+%26+ORF7a%3AT14I+%26+S%3AT259I+%26+C8950T&nextcladeQcOverallScoreTo=29&
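The CoV-Spectrum link above encodes the lineage-plus-mutations query in its `variantQuery` parameter. A sketch of how such a link can be assembled with Python's standard library follows. `cov_spectrum_url` is a hypothetical helper, and reading `nextcladeQcOverallScoreTo=29` as "keep only Nextclade overall QC scores below 30" (Nextclade's "good" band) is an interpretation, not documented CoV-Spectrum behavior.

```python
from urllib.parse import urlencode

BASE = "https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants"


def cov_spectrum_url(lineage: str, mutations: list[str], max_qc_score: int = 29) -> str:
    """Build a CoV-Spectrum explore URL for a lineage AND a set of mutations."""
    # Terms are joined with " & " (logical AND in the variant query syntax).
    query = " & ".join([f"nextcladePangoLineage:{lineage}", *mutations])
    # urlencode applies quote_plus: ':' -> %3A, '&' -> %26, ' ' -> '+'.
    params = {"variantQuery": query, "nextcladeQcOverallScoreTo": max_qc_score}
    return f"{BASE}?{urlencode(params)}"


url = cov_spectrum_url(
    "XBB.2.3.8", ["ORF1a:I2786V", "ORF3a:I35T", "ORF7a:T14I", "S:T259I", "C8950T"]
)
print(url)
```

The generated URL matches the source link above except for that link's trailing `&`.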

So far, this lineage has been detected only in the USA. There are not yet enough samples for a reliable estimate, but it shows a possibly strong growth advantage there. Within the USA, it has been reported from 9 states: Alaska, Arizona, California, Colorado, Louisiana, Nevada, Ohio, Texas, and Washington State.

Next, consider the clade-specific Bloom and Neher estimates (from https://github.com/jbloomlab/SARS2-mut-fitness/blob/main/results/aa_fitness/aamut_fitness_by_clade.csv) of the fitness effects of the non-synonymous mutations, in the order in which they appear on the UShER tree:

For ORF1a:I2786V (NSP4:I23V),

clade,gene,aa_mutation,delta_fitness
20A,ORF1ab,I2786V,-0.20517
20B,ORF1ab,I2786V,-0.064288
20C,ORF1ab,I2786V,0.14563
20E,ORF1ab,I2786V,0.062315
20G,ORF1ab,I2786V,-1.1536
20I,ORF1ab,I2786V,0.57584
21C,ORF1ab,I2786V,-0.65669
21I,ORF1ab,I2786V,0.46497
21J,ORF1ab,I2786V,0.54547
21K,ORF1ab,I2786V,0.070982
21L,ORF1ab,I2786V,-0.084611
22A,ORF1ab,I2786V,-1.2084
22B,ORF1ab,I2786V,0.2049
22C,ORF1ab,I2786V,0.091414
22D,ORF1ab,I2786V,1.635
22E,ORF1ab,I2786V,0.65771
22F,ORF1ab,I2786V,1.0659
23A,ORF1ab,I2786V,0.19675
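The rows above appear to be a filtered extract of the linked `aamut_fitness_by_clade.csv`. A minimal sketch of that filtering, assuming a local copy of the file with at least the four columns shown here; note that these rows list the ORF1a mutation under gene `ORF1ab`. The function name is hypothetical.

```python
import csv
import io


def fitness_by_clade(csv_text: str, gene: str, aa_mutation: str) -> dict[str, float]:
    """Return {clade: delta_fitness} for one amino-acid mutation."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["clade"]: float(row["delta_fitness"])
        for row in reader
        if row["gene"] == gene and row["aa_mutation"] == aa_mutation
    }


# Small excerpt in the same layout; in practice pass the downloaded file's text,
# e.g. open("aamut_fitness_by_clade.csv").read()
SAMPLE = """clade,gene,aa_mutation,delta_fitness
22F,ORF1ab,I2786V,1.0659
23A,ORF1ab,I2786V,0.19675
22F,S,T259I,0.24627
"""

print(fitness_by_clade(SAMPLE, "ORF1ab", "I2786V"))  # {'22F': 1.0659, '23A': 0.19675}
```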

In a possible example of sign epistasis (Nielsen et al., “Host heterogeneity and epistasis explain punctuated evolution of SARS-CoV-2”, PLOS Computational Biology, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010896), the estimated fitness effect of ORF1a:I2786V has swung between positive, neutral, and negative across clades. For 22F, the parent clade of XBB.2.3.8's clade 23E, the effect is notably positive at +1.07. In 23E's sibling clade 23A, it is mildly positive at +0.20. Unfortunately, Bloom and Neher do not yet provide estimates for clade 23E itself.
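To make the positive/neutral/negative grouping explicit, here is a sketch that bins the ORF1a:I2786V estimates from the table above by sign. The ±0.1 band treated as "neutral" is a hypothetical cutoff chosen for illustration, not one used by Bloom and Neher or elsewhere in this proposal.

```python
# Bloom and Neher delta_fitness for ORF1a:I2786V, copied from the table above.
I2786V = {
    "20A": -0.20517, "20B": -0.064288, "20C": 0.14563, "20E": 0.062315,
    "20G": -1.1536, "20I": 0.57584, "21C": -0.65669, "21I": 0.46497,
    "21J": 0.54547, "21K": 0.070982, "21L": -0.084611, "22A": -1.2084,
    "22B": 0.2049, "22C": 0.091414, "22D": 1.635, "22E": 0.65771,
    "22F": 1.0659, "23A": 0.19675,
}
NEUTRAL_BAND = 0.1  # hypothetical cutoff for "neutral", illustration only


def bin_by_sign(deltas: dict[str, float], band: float = NEUTRAL_BAND) -> dict[str, list[str]]:
    """Group clades by whether the estimate is above, within, or below +/-band."""
    groups: dict[str, list[str]] = {"positive": [], "neutral": [], "negative": []}
    for clade, delta in deltas.items():
        if delta > band:
            groups["positive"].append(clade)
        elif delta < -band:
            groups["negative"].append(clade)
        else:
            groups["neutral"].append(clade)
    return groups


for sign, clades in bin_by_sign(I2786V).items():
    print(sign, clades)
```

With this cutoff the estimate is negative in 20A, 20G, 21C, and 22A, but positive in both XBB clades (22F and 23A).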

For the Spike mutation S:T259I,

clade,gene,aa_mutation,delta_fitness
20A,S,T259I,-0.55946
20B,S,T259I,-3.1222
20C,S,T259I,-1.6007
20E,S,T259I,-1.2308
20G,S,T259I,-0.95054
20I,S,T259I,-0.75839
21C,S,T259I,-0.75147
21I,S,T259I,-0.55037
21J,S,T259I,-0.51226
21K,S,T259I,-0.51069
21L,S,T259I,-1.0859
22A,S,T259I,-0.6333
22B,S,T259I,-0.070116
22C,S,T259I,-0.31613
22D,S,T259I,-0.8223
22E,S,T259I,-0.095971
22F,S,T259I,0.24627
23A,S,T259I,0.23326

In another case of sign epistasis, S:T259I, which has a negative estimated effect in every earlier clade, appears to have become mildly positive for viral fitness in XBB clades 22F and 23A, the only XBB clades for which Bloom and Neher currently provide estimates.

Considering other possible Spike 259 mutations in the XBB clades 22F and 23A:

clade,gene,aa_mutation,delta_fitness
22F,S,T259A,-0.87999
22F,S,T259I,0.24627
22F,S,T259K,-0.61435
22F,S,T259P,-0.124
22F,S,T259R,-0.085376
22F,S,T259S,0.78287
22F,S,T259T,-1.0696
23A,S,T259A,0.44806
23A,S,T259I,0.23326
23A,S,T259K,0.2976
23A,S,T259P,0.6995
23A,S,T259R,0.82304
23A,S,T259S,0.77307
23A,S,T259T,-0.41188
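The split of the table above into fitness-increasing and fitness-decreasing changes at Spike 259 can be reproduced as below. The values are copied from the table, the synonymous T259T row is excluded, and `increasing` is a hypothetical helper name.

```python
# delta_fitness for each residue at Spike 259, copied from the table above.
S259 = {
    "22F": {"A": -0.87999, "I": 0.24627, "K": -0.61435, "P": -0.124,
            "R": -0.085376, "S": 0.78287, "T": -1.0696},
    "23A": {"A": 0.44806, "I": 0.23326, "K": 0.2976, "P": 0.6995,
            "R": 0.82304, "S": 0.77307, "T": -0.41188},
}
WILD_TYPE = "T"  # T259T is synonymous, so it is not a candidate substitution


def increasing(clade: str) -> list[str]:
    """Non-synonymous substitutions at S:259 with a positive estimated effect."""
    return sorted(aa for aa, delta in S259[clade].items() if aa != WILD_TYPE and delta > 0)


for clade in S259:
    print(clade, [f"T259{aa}" for aa in increasing(clade)])
```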

Clade 23A appears to be permissive to substitutions at Spike 259: every non-synonymous change listed has a positive estimate. Clade 22F, however, shows increased fitness only for T259S and T259I. Of these two, T259I is reachable by a single C->T nucleotide mutation, which occurs at a much higher rate, while T259S is reachable only by a much slower A->T or a very slow C->G nucleotide mutation.
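The reachability argument can be checked by enumerating the single-nucleotide neighbors of the Thr259 codon. The sketch assumes the reference codon is ACT or ACC: a C22338T change at the second codon position giving Ile implies an AC- codon, and a C->G route to Ser exists only when the third base is T or C. The exact reference codon should be confirmed against NC_045512.2; `single_nt_routes` is a hypothetical helper.

```python
# Standard genetic code, indexed in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def single_nt_routes(codon: str, target_aa: str) -> list[str]:
    """List single-nucleotide changes (e.g. 'C2T' = C->T at codon position 2)
    that turn `codon` into a codon for `target_aa`."""
    routes = []
    for pos, ref in enumerate(codon):
        for alt in BASES:
            if alt == ref:
                continue
            mutant = codon[:pos] + alt + codon[pos + 1:]
            if CODON_TABLE[mutant] == target_aa:
                routes.append(f"{ref}{pos + 1}{alt}")
    return routes


# Assumed candidate reference codons for Thr259 (see the note above).
for codon in ("ACT", "ACC"):
    print(codon, "-> Ile:", single_nt_routes(codon, "I"),
          "-> Ser:", single_nt_routes(codon, "S"))
```

For either codon, Ile needs only the C->T change at the second position, whereas Ser needs A->T at the first position or C->G at the second.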

For ORF3a:I35T,

clade,gene,aa_mutation,delta_fitness
20A,ORF3a,I35T,2.1887
20B,ORF3a,I35T,2.1594
20C,ORF3a,I35T,2.3603
20E,ORF3a,I35T,1.6369
20G,ORF3a,I35T,2.0516
20I,ORF3a,I35T,2.3055
21C,ORF3a,I35T,1.926
21I,ORF3a,I35T,2.2743
21J,ORF3a,I35T,2.3236
21K,ORF3a,I35T,2.5671
21L,ORF3a,I35T,2.639
22A,ORF3a,I35T,2.1479
22B,ORF3a,I35T,2.656
22C,ORF3a,I35T,1.9186
22D,ORF3a,I35T,2.3422
22E,ORF3a,I35T,2.1125
22F,ORF3a,I35T,2.0062
23A,ORF3a,I35T,2.4352

ORF3a:I35T has a large positive estimated fitness effect (+1.64 to +2.66) in all clades for which Bloom and Neher provide estimates.

For ORF7a:T14I,

clade,gene,aa_mutation,delta_fitness
20A,ORF7a,T14I,1.05
20B,ORF7a,T14I,0.98869
20C,ORF7a,T14I,1.308
20E,ORF7a,T14I,1.2706
20G,ORF7a,T14I,1.1697
20I,ORF7a,T14I,1.2945
21C,ORF7a,T14I,1.0064
21I,ORF7a,T14I,0.9711
21J,ORF7a,T14I,1.188
21K,ORF7a,T14I,0.81577
21L,ORF7a,T14I,0.88621
22A,ORF7a,T14I,0.9391
22B,ORF7a,T14I,1.0228
22C,ORF7a,T14I,1.1932
22D,ORF7a,T14I,1.0648
22E,ORF7a,T14I,0.98223
22F,ORF7a,T14I,1.0486
23A,ORF7a,T14I,1.3319

ORF7a:T14I also has a positive estimated fitness effect (+0.82 to +1.33) in all clades for which Bloom and Neher provide estimates.
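A quick check of both "all clades" claims takes the minimum over clades of the values in the ORF3a:I35T and ORF7a:T14I tables above; `summarize` is a hypothetical helper.

```python
# delta_fitness by clade, copied from the ORF3a:I35T and ORF7a:T14I tables above.
ORF3A_I35T = {
    "20A": 2.1887, "20B": 2.1594, "20C": 2.3603, "20E": 1.6369, "20G": 2.0516,
    "20I": 2.3055, "21C": 1.926, "21I": 2.2743, "21J": 2.3236, "21K": 2.5671,
    "21L": 2.639, "22A": 2.1479, "22B": 2.656, "22C": 1.9186, "22D": 2.3422,
    "22E": 2.1125, "22F": 2.0062, "23A": 2.4352,
}
ORF7A_T14I = {
    "20A": 1.05, "20B": 0.98869, "20C": 1.308, "20E": 1.2706, "20G": 1.1697,
    "20I": 1.2945, "21C": 1.0064, "21I": 0.9711, "21J": 1.188, "21K": 0.81577,
    "21L": 0.88621, "22A": 0.9391, "22B": 1.0228, "22C": 1.1932, "22D": 1.0648,
    "22E": 0.98223, "22F": 1.0486, "23A": 1.3319,
}


def summarize(deltas: dict[str, float]) -> tuple[bool, str, float]:
    """Return (positive in every clade?, weakest clade, its estimate)."""
    weakest = min(deltas, key=deltas.get)
    return all(d > 0 for d in deltas.values()), weakest, deltas[weakest]


for name, deltas in [("ORF3a:I35T", ORF3A_I35T), ("ORF7a:T14I", ORF7A_T14I)]:
    all_positive, clade, value = summarize(deltas)
    print(f"{name}: positive in all clades = {all_positive}; weakest {clade} ({value:+.2f})")
```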

From the Bloom and Neher estimates of the fitness effects of ORF1a:I2786V, S:T259I, ORF3a:I35T, and ORF7a:T14I, and assuming no negative non-linear epistatic effects between these mutations in different proteins, the lineage proposed here could reasonably demonstrate a growth advantage relative to the parent lineage XBB.2.3.8.
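Under the stated assumption of no epistasis between the four mutations, the combined estimate is simply the sum of the per-mutation estimates. The sketch below uses the 22F and 23A values from the tables above as proxies, since there are no estimates for XBB.2.3.8's own clade. It only illustrates the additivity assumption and is not a growth-rate prediction.

```python
# Bloom and Neher delta_fitness values copied from the tables above.
DELTA = {
    "22F": {"ORF1a:I2786V": 1.0659, "S:T259I": 0.24627,
            "ORF3a:I35T": 2.0062, "ORF7a:T14I": 1.0486},
    "23A": {"ORF1a:I2786V": 0.19675, "S:T259I": 0.23326,
            "ORF3a:I35T": 2.4352, "ORF7a:T14I": 1.3319},
}


def additive_estimate(clade: str) -> float:
    """Sum of per-mutation estimates, i.e. assuming no epistasis between them."""
    return sum(DELTA[clade].values())


for clade in DELTA:
    print(f"{clade}: {additive_estimate(clade):+.2f}")  # 22F: +4.37, 23A: +4.20
```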

As of 2023-08-19, UShER shows that all of the CoV-Spectrum samples are on a single subtree, with evidence of additional branching:

[Image: UShER subtree of the CoV-Spectrum XBB.2.3.8+ORF1a_2786V+ORF3a_35T+ORF7a_14I+S_259I+8950T samples]

To visualize on UShER: https://nextstrain.org/fetch/github.com/alurqu/pango-designation-support-alurqu/raw/main/2023/08/subtreeAuspice1_genome_CoV-Spectrum_XBB.2.3.8%2BORF1a_2786V%2BORF3a_35T%2BORF7a_14T%2BS_259I%2B8950T.json?c=gt-ORF7a_14&label=id%3Anode_7340406

Zoomed-out:

[Image: zoomed-out UShER subtree of the same samples]

To visualize on UShER: https://nextstrain.org/fetch/github.com/alurqu/pango-designation-support-alurqu/raw/main/2023/08/subtreeAuspice1_genome_CoV-Spectrum_XBB.2.3.8%2BORF1a_2786V%2BORF3a_35T%2BORF7a_14T%2BS_259I%2B8950T.json?c=gt-ORF7a_14&label=id%3Anode_7339612
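The two visualization links differ only in the node they zoom to. A sketch that rebuilds them follows, assuming (as the links themselves suggest) that `c=` selects the color-by, here the genotype at ORF7a codon 14, and `label=id:...` selects the branch to zoom to. `auspice_url` is a hypothetical helper, and the JSON file name is copied verbatim from the links.

```python
from urllib.parse import quote, urlencode

REPO_DIR = "github.com/alurqu/pango-designation-support-alurqu/raw/main/2023/08"
JSON_NAME = ("subtreeAuspice1_genome_CoV-Spectrum_XBB.2.3.8+ORF1a_2786V"
             "+ORF3a_35T+ORF7a_14T+S_259I+8950T.json")


def auspice_url(node_id: str, color_by: str = "gt-ORF7a_14") -> str:
    """Nextstrain 'fetch' link to the subtree JSON, colored and zoomed to a node."""
    # quote(..., safe="") percent-encodes '+' in the file name as %2B.
    path = f"{REPO_DIR}/{quote(JSON_NAME, safe='')}"
    # urlencode percent-encodes ':' in the label as %3A.
    query = urlencode({"c": color_by, "label": f"id:{node_id}"})
    return f"https://nextstrain.org/fetch/{path}?{query}"


print(auspice_url("node_7340406"))  # close-up view
print(auspice_url("node_7339612"))  # zoomed-out view
```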

GISAID query: A8621G, T25496C, C27434T, C22338T, C8950T
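The GISAID query is a conjunction of five nucleotide substitutions. The same filter can be applied offline to per-sequence substitution lists, assuming a Nextclade-style comma-separated `substitutions` field; `matches_query` and the example inputs are hypothetical.

```python
QUERY = frozenset({"A8621G", "T25496C", "C27434T", "C22338T", "C8950T"})


def matches_query(substitutions: str, query: frozenset[str] = QUERY) -> bool:
    """True if every query substitution appears in a comma-separated list."""
    found = {s.strip() for s in substitutions.split(",") if s.strip()}
    return query <= found


# Hypothetical inputs, for illustration only.
print(matches_query("C241T,A8621G,C8950T,C22338T,T25496C,C27434T"))  # True
print(matches_query("C241T,A8621G,C8950T,C22338T,T25496C"))          # False
```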

First GISAID sequence: Nevada, USA 2023-07-06

Most Recent GISAID sequence: Washington State, USA 2023-08-07

A zip archive with GenBank-formatted and derived metadata and FASTA files for these sequences, plus CoV-Spectrum-derived UShER output files, is available at Support-XBB.2.3.8+ORF1a_2786V+ORF3a_35T+ORF7a_14T+S_259I+8950T.zip

A CoV-Spectrum list of GISAID EPI ISLs for good-quality sequences is available at gisaid-epi-isl+XBB.2.3.8+ORF1a_2786V+ORF3a_35T+ORF7a_14I+S_259I+8950T.txt.

Edit: Correct spacing and the CoV-Spectrum lineage search URL.

FedeGueli commented 1 year ago

Designated HG.2 via https://github.com/cov-lineages/pango-designation/commit/238f48bbf07cb8a562bc7d05b7b6730360f4e778