sars-cov-2-variants / lineage-proposals

Repository to propose and discuss lineages

BA.2 Saltation, 20+ spike mutations (3 seq - one retracted, apparently 1 chronic patient or maybe 2 different patients) #1965

Open ryhisner opened 2 weeks ago

ryhisner commented 2 weeks ago

Description

Sub-lineage of: BA.2
Earliest sequence: 2024-7-14, Spain – EPI_ISL_19322462
Most recent sequence: 2024-7-16, Spain – EPI_ISL_19370418
Continents circulating: Europe (2)
Countries circulating: Spain (2)
Number of Sequences: 2
GISAID Nucleotide Query: G278A, T11075A, -C46G
CovSpectrum Query: Nextcladepangolineage:BA.2* & [10-of: G278A, C1616T, A3406G, A5391G, C6286T, C7081T, T11075A, C11572T, G12028A, C14120T, T16389C, T17543C, A18273G, C18326T, A20355G]

Substitutions on top of BA.2:
Spike: P9L, Q14H, ∆Y144, S151C, F157L, R190K, H245N, R346T, N354K, L368I, K444R, K478E, A484E (reversion), F490S, T547K, A688V, T747I, T883I, D1118H, E1202Q, P1263L
N: A90S
M: D3Y
ORF8: F120V, I121L (N/ORF9b TRS extended homology)
ORF9b: S50L, E86D
ORF1a: V5I, L451F, N1709S, F3604I, L3606F, M3921I
ORF1b: P218L, M1359T, A1620V

Nucleotide: G278A, C1616T, A3406G, A5391G, C6286T, C7081T, T11075A, G11083T, C11572T, G12028A, C14120T, T16389C, T17543C, A18273G, C18326T, A20355G, C21588T, C21595T (artifact?), G21604T, A22013T, C22033A, G22131A, C22264T, C22295A, G22599C, C22624A, C22664A, A22893G, A22994G, T23031C, T23134C, C23202A, C23625T, C23802T, T24487C, G24914C, C26222T, G26529T, C27059T, T28251G, A28254C, C28432T, G28541T, A29125G, C29642T
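For anyone who wants to screen a local set of sequences in the same spirit, the `[10-of: ...]` part of the CovSpectrum query can be approximated offline. The sketch below is only an illustration and not CovSpectrum's implementation: `matches_n_of` and the two example samples are hypothetical, and it assumes each sequence has already been reduced to a set of nucleotide substitutions relative to the reference.

```python
# The 15 nucleotide substitutions listed in the CovSpectrum query in the description.
QUERY_MUTATIONS = frozenset({
    "G278A", "C1616T", "A3406G", "A5391G", "C6286T", "C7081T", "T11075A",
    "C11572T", "G12028A", "C14120T", "T16389C", "T17543C", "A18273G",
    "C18326T", "A20355G",
})


def matches_n_of(sample_mutations, query=QUERY_MUTATIONS, n=10):
    """Return True if the sample carries at least n of the query mutations."""
    return len(set(sample_mutations) & query) >= n


# Hypothetical samples (not real sequences):
# one where dropout hides 4 of the 15 sites, and one unrelated BA.2.
partial = sorted(QUERY_MUTATIONS)[:11] + ["C9866T"]
unrelated = ["C9866T", "C25416T", "G278A"]

print(matches_n_of(partial))    # 11 of 15 present
print(matches_n_of(unrelated))  # only 1 of 15 present
```

Requiring 10 of 15 rather than all 15 is what lets the query tolerate the dropout seen in these assemblies.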

USHER Tree

https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons3/main/BA.2_saltation_Spain_July_2024.json?label=id:node_11626229

image

Evidence

The metadata indicate that these two BA.2 sequences came from different patients. Both were collected in mid-July, 2024, in the same region of Spain. The sequence quality for both is not great, with some dropout and frameshifts, so there are likely additional deletions and substitutions not visible.

Several of the spike mutations I've listed above do not show up in the first sequence collected due to dropout and/or frameshifts, but I am assuming they are present in both sequences.

I might add some more analysis later, but for now, just a few short notes.

• S:P9L and S:S151C: this BA.2 branch has replaced the C15-C136 spike NTD disulfide bond with a C136-C151 disulfide, likely causing a substantial rearrangement of the NTD.
• T11075A (ORF1a:F3604I) is extraordinarily rare. It has only ever appeared in 7-8 previous sequences. This branch also has the highly homoplasic ORF1a:L3606F, though that site is masked by Usher.
• A few notable mutations seen in previous major variants:
  - S:H245N (creates glycan): BA.2.3.20, BA.2.86
  - S:D1118H: Alpha (B.1.1.7)
  - M:D3Y: BJ.1 (which later recombined with a BA.2.75 descendant to form XBB)
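The nucleotide-to-amino-acid correspondences in these notes (e.g. T11075A = ORF1a:F3604I) can be sanity-checked with a little coordinate arithmetic. This is a rough sketch, assuming Wuhan-Hu-1 (NC_045512.2) coordinates with ORF1a starting at 266 and S starting at 21563. The `locate` helper and the two-gene table are illustrative only, the end coordinates follow a Nextclade-style annotation and should be treated as assumptions, and all other genes are ignored.

```python
# Gene start/end (1-based, inclusive) on the Wuhan-Hu-1 reference.
# Only two genes are listed; the end coordinates are assumptions for this sketch.
GENES = {
    "ORF1a": (266, 13468),
    "S": (21563, 25384),
}


def locate(nuc_pos):
    """Map a 1-based nucleotide position to (gene, codon number, position within codon)."""
    for gene, (start, end) in GENES.items():
        if start <= nuc_pos <= end:
            offset = nuc_pos - start
            return gene, offset // 3 + 1, offset % 3 + 1
    return None  # position is outside the genes listed here


print(locate(11075))  # T11075A: first base of ORF1a codon 3604 (F3604I)
print(locate(11083))  # G11083T: third base of ORF1a codon 3606 (L3606F)
print(locate(21588))  # C21588T: second base of S codon 9 (P9L)
```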

Genomes

EPI_ISL_19322462, EPI_ISL_19370418

FedeGueli commented 2 weeks ago

A first quick analysis of the mutations:

• S:Q14H was very good in FLiP lineages, especially in China, but here S:P9L could change everything, so I'm not sure about its effect.
• S:L368I is a comeback from XBB.
• S:547-1263 are all known mutations that we have seen here and there, and at the very least they are not detrimental:
  - 547K is doing very well now in KP.3.3.1.
  - 688V has been around since BQ.1 in multiple designated lineages; the latest ones are JN.1.15/LU.1/LU.2 and descendants.
  - S:T747I was in BA.2.51 and XBB.1.44.1.
  - S:T883I has been popular since the BQ.1, CH.1 and XBB.1 era.
  - S:E1202Q was popular in the VOCs era, while S:E1202K is very widespread in combination with a silent nuc at 1201.
  - S:D1118H was in all Alpha.
  - S:P1263L is circulating in LF.1.1.1 and XED.
• M:D3Y was in BJ.1, which then recombined to give birth to XBB.
• Orf1b:P218L was in FY.1.2.
• Orf9b:E86D/N:A90S was in BF.7.4.3, which also shares S:F157L with this one (see https://github.com/cov-lineages/pango-designation/issues/1743).

FedeGueli commented 2 weeks ago

Sorry i just saw Ryan already updated while i was writing the comment.

FedeGueli commented 2 weeks ago

Alternative queries to check for it: T24487C, C26222T, G26529T or G278A, A3406G, A5391G, to target different parts of the genome and catch any eventual recombinants.
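The idea behind using two region-specific queries can be sketched as follows: a recombinant that inherited only one end of the genome from this branch would match one query but not the other. This is a minimal illustration only; `matching_regions`, the region labels and both example samples are hypothetical, and real screening would go through GISAID or CovSpectrum.

```python
# The two alternative queries from the comment above, one per genome region.
QUERIES = {
    "5' region": {"G278A", "A3406G", "A5391G"},
    "3' region": {"T24487C", "C26222T", "G26529T"},
}


def matching_regions(sample_mutations):
    """Return the names of the queries whose mutations are all present in the sample."""
    present = set(sample_mutations)
    return [name for name, muts in QUERIES.items() if muts <= present]


# Hypothetical samples: a full match and a recombinant carrying only the 3' part.
full = {"G278A", "A3406G", "A5391G", "T24487C", "C26222T", "G26529T"}
recombinant = {"C9866T", "T24487C", "C26222T", "G26529T"}

print(matching_regions(full))         # both regions match
print(matching_regions(recombinant))  # only the 3' region matches
```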

FedeGueli commented 2 weeks ago

ping @corneliusroemer cc @silcn @oobb45729 @thomasppeacock @angiehinrichs @shay671

corneliusroemer commented 2 weeks ago

Thanks for the ping, interesting stuff - but of course not ready for designation given just 2 sequences from same region (i.e. could be just 2 people from same household, one of which is chronic themselves). Stuff I'd check next: where does it actually descend from (placement on Usher is in some reversion island = artefactual) and could it be recombinant of something?

The 20 spike substitutions are quite few compared to what is currently circulating, which has >30 relative to BA.2.

But at the very least there seems to be some spread, which is a rare thing to observe in long-branch variants. The submitters probably know already, but if not, and if anyone knows them, it might be worth pinging them so they are aware of the significance of what they found:

Authors: Antonio Orduña-Domingo, Marta Hernandez, Marta Dominguez-Gil, Silvia Rojo, Gabriel March Rosello, Sonsoles Garcinuño Pérez, Carmen Aldea-Mansilla, Mª Fe Brezmes-Valdivieso, Gregoria Megías Lobón, María Antonia García Castro, Carmen Gimeno Crespo, Noelia Arenal Andrés, Carlos Fuster Foz, M. Isabel Fernandez-Natal, Jose María Eiros Bouza
Submitter: Hernandez, Marta

ryhisner commented 2 weeks ago

> Sorry i just saw Ryan already updated while i was writing the comment.

No worries, your comments are great, Fede!

FedeGueli commented 2 weeks ago

>> Sorry i just saw Ryan already updated while i was writing the comment.

> No worries, your comments are great, Fede!

Thanks Ryan! One thing I forgot to mention, about 478E plus 484E (rev): 478E has been trending lately in multiple lineages, without giving big advantages though. Interestingly, it was one of the first spike mutations picked up by BA.2.86 in South Africa, very early in its spread. They didn't do anything, but maybe that was the first hint (before 493E) that the RBD could like a negative charge in the RBM. Note that the second BA.2.86 RBD mutation in South Africa was S:N487D (which is likely still around there), another negatively charged AA. This BA.2 takes two negatively charged AAs if the reversion is real.

ryhisner commented 2 weeks ago

One early study on Alpha noted that D1118H, which is near the base of the spike, could interact with D1118H from the other spike protomers and may cause a local rearrangement.

image

Gobeil SM, Janowska K, McDowell S, et al. Effect of natural mutations of SARS-CoV-2 on spike structure, conformation, and antigenicity. Science. 2021;373(6555):eabi6226. doi:10.1126/science.abi6226

corneliusroemer commented 2 weeks ago

I reached out to the submitter via Twitter so that they know about the discussion here.

Just in case one of the submitters comes across this here: all the people here are extremely grateful for your continued work sequencing and rapidly sharing the data.

ryhisner commented 2 weeks ago

>>> Sorry i just saw Ryan already updated while i was writing the comment.

>> No worries, your comments are great, Fede!

> Thanks Ryan! One thing I forgot to mention, about 478E plus 484E (rev): 478E has been trending lately in multiple lineages, without giving big advantages though. Interestingly, it was one of the first spike mutations picked up by BA.2.86 in South Africa, very early in its spread. They didn't do anything, but maybe that was the first hint (before 493E) that the RBD could like a negative charge in the RBM. Note that the second BA.2.86 RBD mutation in South Africa was S:N487D (which is likely still around there), another negatively charged AA. This BA.2 takes two negatively charged AAs if the reversion is real.

The combination of S:S477N and S:K478E creates a TRS motif. I suspect it's deleterious. There are no open reading frames (the first start codon goes ~26 codons before hitting a stop), and anything it would produce is probably junk in any case. It could increase the proportion of subgenomic RNA relative to genomic RNA, though at the cost of being very inefficient.
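To make the TRS observation concrete: the SARS-CoV-2 TRS core sequence is ACGAAC, and the codons around S:477-479 spell it out once both substitutions are present. The snippet below is only an illustration. The codons are reconstructed here as assumptions from the nucleotide changes (AGC to AAC for S477N, AAA to GAA for K478E on the BA.2 background, and CCT assumed for P479; any proline codon starts with C, which is all the motif needs) and were not read from the sequences themselves.

```python
TRS_CORE = "ACGAAC"  # SARS-CoV-2 transcription regulatory sequence core motif


def has_trs_core(codons):
    """Return True if the concatenated codons contain the TRS core motif."""
    return TRS_CORE in "".join(codons)


# Assumed codons for S:477-479 (reconstructed, see the note above).
ba2 = ["AAC", "AAA", "CCT"]            # N477, K478, P479 (BA.2 background)
with_478e = ["AAC", "GAA", "CCT"]      # N477, E478, P479 (this branch)
only_478e = ["AGC", "GAA", "CCT"]      # hypothetical: E478 without S477N

print(has_trs_core(ba2))        # False
print(has_trs_core(with_478e))  # True: AACGAACCT contains ACGAAC
print(has_trs_core(only_478e))  # False: AGCGAACCT lacks the leading AC
```

Under these assumptions, neither substitution creates the motif on its own; it only appears with the combination.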

image

corneliusroemer commented 2 weeks ago

Importantly, the assemblies are quite sketchy - possibly due to low coverage/low viral load, or also maybe dropout due to the many mutations.

I wouldn't be confident about the number of spike substitutions now based on what we see here:

image

Nextclade places it on the BA.2 branch with C9866T (i.e. not the BA.2 that's predominantly Southern African and also BA.2.86 ancestor) and also extra C25416T, the parent of BA.2.12.2 and BA.2.75 and more.

That branch was at its peak only 5% in Spain, as opposed to 50% in India or 30% in Australia. So that new information makes it somewhat less likely that this is a Spanish chronic infection and more likely it could be an import (of course the prior was not low that this is a local Spanish chronic infection, so overall that's still possible)

https://cov-spectrum.org/explore/World/AllSamples/from%3D2022-01-06%26to%3D2022-08-24/variants?variantQuery=nextcladePangoLineage%3ABA.2*+%26+C25416T+%26+C9866T+%26+%21nextcladePangoLineage%3ABA.2.75*+%26+%21nextcladePangoLineage%3ABA.2.12.1*+%26+%21nextcladePangoLineage%3ABA.2.38*&

image

Raw reads would be amazing of course to see what's going on. (Also one can hand fix the frameshift to see the spike mutations with Nextclade in the first sequence)

Hand-fixed alignment:

image

ryhisner commented 2 weeks ago

> Importantly, the assemblies are quite sketchy - possibly due to low coverage/low viral load, or also maybe dropout due to the many mutations.

> I wouldn't be confident about the number of spike substitutions now based on what we see here: image

> Nextclade places it on the BA.2 branch with C9866T (i.e. not the BA.2 that's predominantly Southern African and also BA.2.86 ancestor) and also extra C25416T, the parent of BA.2.12.2 and BA.2.75 and more.

All the sequences from the upload by this lab look like this, so I don't think it can tell us much about the viral load. But definitely agree about the uncertainty in the number of spike mutations. If there is another sequence, hopefully it will be a little more clear.

image

FedeGueli commented 2 weeks ago

> and also extra C25416T, the parent of BA.2.12.2 and BA.2.75 and more.

Oh, that one! Many of the fastest BA.2 descendants and saltations came from that branch! XBB also had it, via BA.2.75.

corneliusroemer commented 2 weeks ago

I think I've found the likely ancestor branch, evidence is strong because it's 2-fold:

I looked for the BA.2 lineages with the biggest mutational overlap with this cluster using covSpectrum (excluding Spike).

Clear hit in Spain, BA.2 sublineage sampled in the same Spanish state in April/May 2022.
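The overlap search described above was done in covSpectrum; the same idea can be sketched offline as ranking candidate lineages by how many non-spike substitutions they share with the cluster. This is a minimal illustration under stated assumptions: `non_spike`, `rank_by_overlap` and the candidate mutation sets are hypothetical, and the spike coordinates are those of the Wuhan-Hu-1 reference.

```python
import re

SPIKE = (21563, 25384)  # spike gene coordinates on the Wuhan-Hu-1 reference


def non_spike(mutations):
    """Drop substitutions whose position falls inside the spike gene."""
    keep = set()
    for mut in mutations:
        pos = int(re.fullmatch(r"[ACGT](\d+)[ACGT]", mut).group(1))
        if not SPIKE[0] <= pos <= SPIKE[1]:
            keep.add(mut)
    return keep


def rank_by_overlap(cluster, candidates):
    """Rank candidate lineages by the number of shared non-spike substitutions."""
    target = non_spike(cluster)
    scores = {name: len(target & non_spike(muts)) for name, muts in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# A few of the cluster's substitutions, plus two made-up candidate lineages.
cluster = {"G278A", "C14120T", "A18273G", "C22295A"}
candidates = {
    "candidate_A": {"C9866T", "C14120T", "A18273G"},
    "candidate_B": {"C9866T", "C25416T", "C22295A"},  # shares only a spike change
}

print(rank_by_overlap(cluster, candidates))
```

Excluding spike keeps convergent spike mutations, which recur in many unrelated lineages, from inflating the score.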

image

image

So we're looking at a local 2y chronic infection (maybe more than one chronic infectee involved) and now it got sequenced in 2 people. Possibly one of the 2 sequences is from one of the chronic infectees, that leaves one more who got it somehow and got sequenced. Hard to interpret unless we know how samples were selected.

The fact that it got sequenced where it came from means it's unlikely to have a big population size somewhere unsampled, meaning it is less likely to become relevant outside of the local region.

@ryhisner @FedeGueli do you know of other recent cases where a saltation/chronic infection was sequenced in >1 person?

GISAID query for the 13 sequences (including the 2 new ones): C14120T,A18273G,9866T

Here's a rough sketch of states (just one location used with scatter per known city/state) where the samples were collected. This is just for illustration of rough geographic distribution, the actual samples might well be quite differently distributed. 2 green + are the new ones.

image

cvejris commented 2 weeks ago

I'm still sceptical about the potential of this variant to initiate a new wave, since it lacks the affinity-enhancing mutations in RBD (N460K, R493rev), but let's wait and see...

ryhisner commented 2 weeks ago

> @ryhisner @FedeGueli do you know of other recent cases where a saltation/chronic infection was sequenced in >1 person?

Yes, it's rare, but there have been dozens of times where a chronic-infection virus transmits to one or two people but then ends there. To give one of the more remarkable examples, there's this BA.1 branch from Brazil in which most sequences are from the same individual but two were from different patients. Potentially, this one could be ongoing I suppose, but the most recent sequence was collected in April. I wrote a thread about this one back in January, before the 3rd patient's sequence was uploaded in April. https://x.com/LongDesertTrain/status/1751670652078072308

image

ryhisner commented 2 weeks ago

> I'm still sceptical about the potential of this variant to initiate a new wave, since it lacks the affinity-enhancing mutations in RBD (N460K, R493rev), but let's wait and see...

Yeah, it's extremely rare to see a chronic virus with Q493R transmit. I think it's clear now that Q493R is one of the worst spike mutations we've ever seen circulate. For me, the whole reason BJ.1 didn't grow rapidly but then took off upon recombination (turning into XBB) is that it lost Q493R—and as little else as possible—in the recomb. That was all it needed. You'd think R493Q would have happened more often in circulating lineages, but as far as I know, it's only ever occurred in chronics. It is, in fact, still the single most common substitution in chronic-infection sequences.

corneliusroemer commented 2 weeks ago

We won't designate as is, of course, but I'm considering making a sizeable ancestral parent branch (with, say, 500 sequences) a BA.2 sublineage, to make it easier a) to talk about it, and b) for people to quickly spot related sequences of this cluster if they do lineage assignment in pangolin/Usher/Nextclade.

cvejris commented 2 weeks ago

>> I'm still sceptical about the potential of this variant to initiate a new wave, since it lacks the affinity-enhancing mutations in RBD (N460K, R493rev), but let's wait and see...

> Yeah, it's extremely rare to see a chronic virus with Q493R transmit. I think it's clear now that Q493R is one of the worst spike mutations we've ever seen circulate. For me, the whole reason BJ.1 didn't grow rapidly but then took off upon recombination (turning into XBB) is that it lost Q493R, and as little else as possible, in the recomb. That was all it needed. You'd think R493Q would have happened more often in circulating lineages, but as far as I know, it's only ever occurred in chronics. It is, in fact, still the single most common substitution in chronic-infection sequences.

Very good point with BJ.1! Virtually all the most mutated BA.2-based saltation variants from the soup 2 years ago (how I miss these times) had 493rev. BA.2.10.4, BA.2.75, BA.2.77, BA.2.83, BA.2.3.20, BS.1, BA.4/5 (there were definitely more). N460K followed in frequency among these.

alurqu commented 2 weeks ago

One of these samples, EPI_ISL_19320418, seems to have disappeared from GISAID at least at the present time.

FedeGueli commented 2 weeks ago

Thx @corneliusroemer, I don't remember any chronic transmitting beyond the one highlighted by Ryan. There were some suspected chains of transmission in Campania, Italy, but once we contacted Grimaldi of the Tigem lab, he verified that they were just input errors.

Sinickle commented 1 week ago

New sequence EPI_ISL_19374843 - Collected July 15. Patient metadata and location match EPI_ISL_19322462, which is the one that was NOT retracted.

It seems to me that this new sequence is the exact same genome as the one that was retracted, which leads me to think the metadata on it previously was a mistake that has now been corrected, and this proposed variant has only ever been seen in one patient. https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1540b_7145b0.json?f_userOrOld=uploaded%20sample

corneliusroemer commented 1 week ago

Thanks @Sinickle - good that it's been fixed. We all know mistakes happen. But it would be good if GISAID didn't just silently change data without a record.