sars-cov-2-variants / lineage-proposals

Repository to propose and discuss lineages
EG.5.1.3 sublineage with ORF1a:V274L, C27612T, and ORF1a:E2268V first detected in Massachusetts, USA (73 GISAID seqs as of 2023-09-28; Canada, USA, Denmark, France, UK) #859

Closed alurqu closed 10 months ago

alurqu commented 11 months ago

There may be an EG.5.1.3 sublineage with ORF1a:V274L (G1085T; NSP2:V94L), C27612T, and ORF1a:E2268V (A7068T; NSP3:E1450V) first detected in Massachusetts, USA.
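The amino-acid labels above follow from the nucleotide positions by coordinate arithmetic. A minimal sketch, assuming Wuhan-Hu-1 reference coordinates (ORF1a starting at nucleotide 266, NSP2 starting at ORF1a residue 181, NSP3 starting at ORF1a residue 819); these offsets are not stated in this issue, and the function names are illustrative only.

```python
# Assumed reference coordinates (Wuhan-Hu-1); not taken from this issue.
ORF1A_START_NT = 266
NSP_START_AA = {"NSP2": 181, "NSP3": 819}  # first ORF1a residue of each protein


def orf1a_codon(nt_pos: int) -> int:
    """Return the 1-based ORF1a codon number containing a genome position."""
    return (nt_pos - ORF1A_START_NT) // 3 + 1


def nsp_residue(orf1a_pos: int, nsp: str) -> int:
    """Convert an ORF1a residue number to numbering within the given NSP."""
    return orf1a_pos - NSP_START_AA[nsp] + 1


for nt, nsp in [(1085, "NSP2"), (7068, "NSP3")]:
    codon = orf1a_codon(nt)
    print(f"nt {nt} -> ORF1a:{codon} -> {nsp}:{nsp_residue(codon, nsp)}")
```

Under these assumptions, G1085T falls in ORF1a codon 274 (NSP2 residue 94) and A7068T in ORF1a codon 2268 (NSP3 residue 1450), matching the labels used in this proposal.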

As of 2023-09-23, Cov-Spectrum reports 57 good-quality (57 total) EG.5.1.3+ORF1a:2268V+ORF1a:274L+27612T sequences. Source: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?variantQuery=nextcladePangoLineage%3AEG.5.1.3+%26+ORF1a%3AE2268V+%26+ORF1a%3AV274L+%26+C27612T&nextcladeQcOverallScoreTo=29&
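For anyone wanting to re-run the count later, the query string in the link above can be rebuilt from its parts. This is a sketch only: the parameter names are copied from the URL above, and the helper name `cov_spectrum_url` is hypothetical, not part of any CoV-Spectrum tooling.

```python
from urllib.parse import urlencode

# Base path copied from the link above.
BASE = "https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants"


def cov_spectrum_url(lineage, mutations, max_qc_score=29):
    """Build an explore URL for a Nextclade lineage plus extra mutations."""
    # Terms are joined with " & ", as in the variantQuery of the link above.
    variant_query = " & ".join([f"nextcladePangoLineage:{lineage}", *mutations])
    params = {
        "variantQuery": variant_query,
        "nextcladeQcOverallScoreTo": max_qc_score,  # quality filter from the link
    }
    return f"{BASE}?{urlencode(params)}"


url = cov_spectrum_url("EG.5.1.3", ["ORF1a:E2268V", "ORF1a:V274L", "C27612T"])
print(url)
```

Adding a further mutation to the list (for example ORF1a:G519S) would give the corresponding query for a candidate child lineage.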

Most of the samples (47) are from Canada. Although that count is a little low for a growth estimate, the growth rate estimate for this lineage in Canada (using the past 6 months) is high. This lineage has also been detected in 5 countries on 2 continents.

As of 2023-09-23, UShER shows that all of the CoV-Spectrum samples are on a single subtree with evidence of additional branching:

[Figure: UShER subtree for CoV-Spectrum EG.5.1.3+ORF1a_2268V+ORF1a_274L+27612T]

To view the UShER subtree in Nextstrain: https://nextstrain.org/fetch/github.com/alurqu/pango-designation-support-alurqu/raw/main/2023/09/subtreeAuspice1_genome_CoV-Spectrum_EG.5.1.3%2BORF1a_2268V%2BORF1a_274L%2B27612T.json?c=gt-nuc_7068&label=id%3Anode_6976134

Unfortunately, the visualization uses similar shades of blue and green, which may be hard for some to distinguish. The proposed lineage is the branch at the top of the diagram.

There may be additional levels for which designation should be considered. In particular, as shown above, the intermediate lineage EG.5.1.3+ORF1a:274L is relatively slow-growing but large, and it already has several branches that may give rise to future designations.

Child lineages EG.5.1.3+ORF1a:V274L+ORF1a:E2268V+ORF1a:G519S and EG.5.1.3+ORF1a:V274L+ORF1a:E2268V+ORF1a:G519S+ORF1a:T1344I may also merit designation in the future.

For the main lineage proposed:

GISAID query: C1889T, G1085T, C27612T, A7068T

First GISAID sequence: Massachusetts, USA 2023-07-18

Most Recent GISAID sequence: Ontario, Canada 2023-09-13

A zip archive of GenBank-formatted and derived metadata and FASTA files plus CoV-Spectrum-derived UShER output files for these sequences is available at Support-EG.5.1.3+ORF1a_2268V+ORF1a_274L+27612T.zip

A CoV-Spectrum list of GISAID EPI ISLs for good-quality sequences is available at gisaid-epi-isl-EG.5.1.3+ORF1a_2268V+ORF1a_274L+27612T.txt

This lineage is related to but different from the singlet mentioned in https://github.com/sars-cov-2-variants/lineage-proposals/issues/852.

Potential effects of the non-synonymous mutations on viral relative fitness

Now to consider the clade-specific Bloom and Neher estimates (from https://github.com/jbloomlab/SARS2-mut-fitness/blob/main/results/aa_fitness/aamut_fitness_by_clade.csv) of the fitness effects of the mutations related to this lineage proposal:

For ORF1a:V274L (NSP2:V94L),

clade,gene,aa_mutation,delta_fitness
20A,ORF1ab,V274L,-0.86818
20B,ORF1ab,V274L,-0.36857
20C,ORF1ab,V274L,-0.53806
20E,ORF1ab,V274L,0.90467
20G,ORF1ab,V274L,0.38506
20I,ORF1ab,V274L,0.03853
21C,ORF1ab,V274L,0.43111
21I,ORF1ab,V274L,0.20052
21J,ORF1ab,V274L,0.43032
21K,ORF1ab,V274L,0.80154
21L,ORF1ab,V274L,0.78801
22A,ORF1ab,V274L,-0.29353
22B,ORF1ab,V274L,0.67152
22C,ORF1ab,V274L,0.87867
22D,ORF1ab,V274L,1.5172
22E,ORF1ab,V274L,0.66175
22F,ORF1ab,V274L,0.86609
23A,ORF1ab,V274L,1.0052

ORF1a:V274L (NSP2:V94L) shows a mix of negative and positive relative fitness effects, but most recent clades show positive fitness effects from this mutation.

For ORF1a:E2268V (NSP3:E1450V),

clade,gene,aa_mutation,delta_fitness
20A,ORF1ab,E2268V,0.98486
20B,ORF1ab,E2268V,-0.53231
20C,ORF1ab,E2268V,-0.36201
20E,ORF1ab,E2268V,-0.40162
20G,ORF1ab,E2268V,-0.56969
20I,ORF1ab,E2268V,-1.3253
21C,ORF1ab,E2268V,-0.28459
21I,ORF1ab,E2268V,-0.82507
21J,ORF1ab,E2268V,-1.1453
21K,ORF1ab,E2268V,-1.9529
21L,ORF1ab,E2268V,-0.52784
22A,ORF1ab,E2268V,-0.44718
22B,ORF1ab,E2268V,-1.5901
22C,ORF1ab,E2268V,0.48568
22D,ORF1ab,E2268V,-0.33269
22E,ORF1ab,E2268V,-0.85197
22F,ORF1ab,E2268V,0.78287
23A,ORF1ab,E2268V,-0.83637

ORF1a:E2268V (NSP3:E1450V) is often associated with significant negative impacts on relative viral fitness. However, in some recent clades (22C and 22F) it has shown a positive impact on relative viral fitness. This mutation may be a candidate for sign epistasis.
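The mixed-sign pattern can be checked by splitting the clades on the sign of delta_fitness. The sketch below embeds the ORF1a:E2268V rows quoted above; the helper name `split_by_sign` is hypothetical. A change of sign between clades is only suggestive of sign epistasis and does not account for the uncertainty in each estimate.

```python
import csv
import io

# ORF1a:E2268V rows as quoted in this issue.
E2268V_CSV = """clade,gene,aa_mutation,delta_fitness
20A,ORF1ab,E2268V,0.98486
20B,ORF1ab,E2268V,-0.53231
20C,ORF1ab,E2268V,-0.36201
20E,ORF1ab,E2268V,-0.40162
20G,ORF1ab,E2268V,-0.56969
20I,ORF1ab,E2268V,-1.3253
21C,ORF1ab,E2268V,-0.28459
21I,ORF1ab,E2268V,-0.82507
21J,ORF1ab,E2268V,-1.1453
21K,ORF1ab,E2268V,-1.9529
21L,ORF1ab,E2268V,-0.52784
22A,ORF1ab,E2268V,-0.44718
22B,ORF1ab,E2268V,-1.5901
22C,ORF1ab,E2268V,0.48568
22D,ORF1ab,E2268V,-0.33269
22E,ORF1ab,E2268V,-0.85197
22F,ORF1ab,E2268V,0.78287
23A,ORF1ab,E2268V,-0.83637
"""


def split_by_sign(csv_text):
    """Return (positive_clades, negative_clades) by sign of delta_fitness."""
    positive, negative = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        target = positive if float(row["delta_fitness"]) > 0 else negative
        target.append(row["clade"])
    return positive, negative


positive, negative = split_by_sign(E2268V_CSV)
print("positive:", positive)  # positive: ['20A', '22C', '22F']
print("negative:", len(negative), "clades")  # negative: 15 clades
```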

For mutations related to the possible child lineages,

For ORF1a:G519S (NSP2:G339S),

clade,gene,aa_mutation,delta_fitness
20A,ORF1ab,G519S,2.852
20B,ORF1ab,G519S,3.2413
20C,ORF1ab,G519S,3.0256
20E,ORF1ab,G519S,3.0104
20G,ORF1ab,G519S,2.9107
20I,ORF1ab,G519S,3.255
21C,ORF1ab,G519S,2.7452
21I,ORF1ab,G519S,3.7204
21J,ORF1ab,G519S,3.8733
21K,ORF1ab,G519S,2.8843
21L,ORF1ab,G519S,2.9214
22A,ORF1ab,G519S,2.962
22B,ORF1ab,G519S,2.9191
22C,ORF1ab,G519S,2.7848
22D,ORF1ab,G519S,3.4264
22E,ORF1ab,G519S,2.9991
22F,ORF1ab,G519S,3.3792
23A,ORF1ab,G519S,3.4484

ORF1a:G519S (NSP2:G339S) demonstrates a consistent strong positive impact on viral fitness in association with all clades. This could even be the appropriate mutation at which to designate the lineage.

For ORF1a:T1344I (NSP3:T526I),

clade,gene,aa_mutation,delta_fitness
20A,ORF1ab,T1344I,-1.3216
20B,ORF1ab,T1344I,-0.28898
20C,ORF1ab,T1344I,-0.50207
20E,ORF1ab,T1344I,-1.7416
20G,ORF1ab,T1344I,-0.31455
20I,ORF1ab,T1344I,-0.47654
21C,ORF1ab,T1344I,-0.415
21I,ORF1ab,T1344I,-1.0969
21J,ORF1ab,T1344I,-0.62613
21K,ORF1ab,T1344I,-0.39628
21L,ORF1ab,T1344I,-1.9027
22A,ORF1ab,T1344I,-1.0853
22B,ORF1ab,T1344I,-1.0711
22C,ORF1ab,T1344I,-0.6526
22D,ORF1ab,T1344I,-0.8223
22E,ORF1ab,T1344I,-1.0275
22F,ORF1ab,T1344I,-2.3187
23A,ORF1ab,T1344I,-1.0824

Despite its apparent association with a large cluster, ORF1a:T1344I (NSP3:T526I) shows a negative relative fitness impact in all clades. For this mutation to increase viral fitness, sign epistasis would have to be in play. That said, the size of the negative effect has varied considerably across clades (from about -0.29 in 20B to about -2.32 in 22F).

Edit: Corrected "ORF1a:G519S (NSP3:G339S) demonstrates" to "ORF1a:G519S (NSP2:G339S) demonstrates". ORF1a:G519 is part of NSP2 not NSP3.

alurqu commented 11 months ago

Looking deeper: in the big set of sequences from Ontario, many of those with age metadata indicate age 75+. This could be a care home outbreak, and the sequences from younger patients could be from staff or from younger residents.

FedeGueli commented 11 months ago

Very likely!

alex-m-a commented 10 months ago

Thanks all, confirming the Ontario cases are from a known cluster in a healthcare facility.

FedeGueli commented 10 months ago

Thank you very much! Closing this!