sars-cov-2-variants / lineage-proposals

Repository to propose and discuss lineages

JN.1 + S:S71F, K182N, R346T, K444R, F456L (4 seq, Scotland, Jun 24) #1648

Open ryhisner opened 2 weeks ago

ryhisner commented 2 weeks ago

Description

Sub-lineage of: JN.1
Earliest sequence: 2024-5-23, Scotland – EPI_ISL_19187869
Most recent sequence: 2024-6-7, Scotland – EPI_ISL_19213871, EPI_ISL_19213872, EPI_ISL_19213873
Continents circulating: Europe (4)
Countries circulating: Scotland (4)
Number of Sequences: 4
GISAID Nucleotide Query: T21633C, A22108C, A22893G, -A23013G, -C29666T
CovSpectrum Query: Nextcladepangolineage:JN.1* & [4-of: C21774T, G22599C, T22928C, A22108C, A22893G]
Substitutions on top of JN.1:
Spike: S71F, K182N, R346T, K444R, F456L
ORF1a: L3606F (NSP6_L37F)
Nucleotide: G11083T, C21774T, A22108C, G22599C, A22893G, T22928C
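For readers unfamiliar with the `[4-of: …]` clause in the CovSpectrum query above: as I understand it, it matches any sequence carrying at least four of the five listed mutations, which tolerates a single dropout. A minimal Python sketch of that logic follows. The function name `n_of` and the example mutation sets are hypothetical, not real sequences, and this is not CovSpectrum's own code.

```python
# The five nucleotide mutations listed in the [4-of: ...] clause.
QUERY = ["C21774T", "G22599C", "T22928C", "A22108C", "A22893G"]

def n_of(mutations, required, n):
    """Return True if at least n of the required mutations are present."""
    return len(set(required) & set(mutations)) >= n

# Hypothetical mutation sets, for illustration only.
full_branch = {"G11083T", "C21774T", "A22108C", "G22599C", "A22893G", "T22928C"}
one_dropout = full_branch - {"A22108C"}            # one site not covered
parent_only = {"C21774T", "G22599C", "T22928C"}    # stops at F456L

for name, muts in [("full", full_branch), ("dropout", one_dropout), ("parent", parent_only)]:
    print(name, n_of(muts, QUERY, 4))
```

A sequence from the slower parent branch (only the first three mutations) is not matched, while a sequence with dropout at one of the five sites still is.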

Phylogenetic Order of Mutations: C21774T (S:S71F), G11083T (ORF1a:L3606F) > G22599C (S:R346T) > T22928C (S:F456L) > A22108C, A22893G (S:K182N, S:K444R)
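The pairing of nucleotide and amino-acid labels above can be checked with simple codon arithmetic. This is a small Python sketch, assuming the usual Wuhan-Hu-1 reference coordinates (ORF1a starting at nucleotide 266 and spike at 21563, the latter matching `gmin=21563` in the tree URL). The helper name `codon_number` is hypothetical.

```python
# Gene start coordinates (1-based) on the reference genome; assumed values.
GENE_START = {"ORF1a": 266, "S": 21563}

def codon_number(nuc_pos, gene):
    """Map a 1-based nucleotide position to a 1-based codon number in a gene."""
    offset = nuc_pos - GENE_START[gene]
    if offset < 0:
        raise ValueError(f"position {nuc_pos} lies before the start of {gene}")
    return offset // 3 + 1

# Nucleotide mutations from the proposal with the gene each falls in.
for label, pos, gene in [
    ("C21774T", 21774, "S"),
    ("A22108C", 22108, "S"),
    ("G22599C", 22599, "S"),
    ("A22893G", 22893, "S"),
    ("T22928C", 22928, "S"),
    ("G11083T", 11083, "ORF1a"),
]:
    print(f"{label} -> {gene}:{codon_number(pos, gene)}")
```

The codon numbers come out as S:71, S:182, S:346, S:444, S:456 and ORF1a:3606, in line with the substitutions listed in the description.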

USHER Tree https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons2/main/JN.1_S71F_R346T_F456L_K182N_K444R_L3606F.json?c=gt-S_182&gmax=25384&gmin=21563&label=id:node_6950022


Evidence

According to Usher, this branch comes directly from the JN.1 polytomy and acquired five consecutive non-synonymous nucleotide mutations in S1, all in antigenically important sites (3 in RBD, 2 in NTD loops). I've found Usher to be much less reliable of late, primarily because it doesn't include nearly as many closely related sequences as it used to, so it's possible this is an incorrect tree assignment or else a very incomplete one.

Three of the four sequences also have C6388T, and the fourth sequence has dropout at this location, so it's very likely that all four have C6388T, which is synonymous. C6388T has only appeared in six other sequences in the past two months, in six different lineages, with none collected in the UK.

One of the three most recent sequences also has ORF1b:T1050I, which has been a significant site in the past.

Genomes

EPI_ISL_19187869, EPI_ISL_19213871, EPI_ISL_19213872, EPI_ISL_19213873
FedeGueli commented 2 weeks ago

Thx @ryhisner that was tracked till 456L as

Branch 99 FLiRT JN.1 > S:S71F (C21774T) > S:R346T (G22599C) > S:F456L (T22928C)
Query: C21774T, G22599C, T22928C, T3565C, -C7113T, -C24863T, -G4067A, -C28674T
Samples: 16 (France, England)

in #1089

Good to be proposed now; linking this proposal to it there.

ryhisner commented 1 week ago

Through F456L, this lineage is so incredibly slow that I don't really think it's worth proposing. It first appeared in mid-March in a well-sequenced region of the world (France), yet it took three months to hit 20 sequences. It will probably still be slow even with K182N and K444R, unless it gets ∆S31 or something.

EDIT: Actually, I don't think this branch has even reached 20 yet. Seems to only be 17.

One other thing about this lineage that doesn't appear on Usher: it has ORF1a:L3606F. Using the query:

• C21774T, G22599C, T22928C, C897A, -A21865T, -A3941G, -G17334T, -C25350T, -C24863T

I get 18 sequences. Two have dropout at L3606, but one doesn't have L3606F or dropout. It also has several other nucleotide differences not found in any of the others, one of which is S:E1150D. So a better query for the F456L branch would be

• C21774T, G22599C, T22928C, C897A, -A21865T, -A3941G, -G17334T, -C25350T, -C24863T, -G25012T

But again, that one is so slow I don't think it's worth proposing in any case.
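To make the effect of the extra `-G25012T` exclusion concrete, here is a minimal Python sketch of how an include/exclude query of this style can be evaluated against a sequence's mutation list. The helper names `parse_query` and `matches` and the two example sequences are hypothetical. By codon arithmetic from the spike start, G25012T looks like the nucleotide change behind S:E1150D, but treat that as an assumption.

```python
def parse_query(query):
    """Split a comma-separated query into required and excluded mutations.

    A leading '-' marks a mutation that must be absent. Case is ignored.
    """
    required, excluded = set(), set()
    for term in query.split(","):
        term = term.strip().upper()
        if not term:
            continue
        if term.startswith("-"):
            excluded.add(term[1:])
        else:
            required.add(term)
    return required, excluded

def matches(mutations, query):
    """True if all required mutations are present and no excluded one is."""
    required, excluded = parse_query(query)
    muts = {m.upper() for m in mutations}
    return required <= muts and not (excluded & muts)

OLD = "C21774T, G22599C, T22928C, C897A, -A21865T, -A3941G, -G17334T, -C25350T, -C24863T"
NEW = OLD + ", -G25012T"

# Hypothetical mutation lists, for illustration only.
typical = ["C897A", "G11083T", "C21774T", "G22599C", "T22928C"]
outlier = ["C897A", "C21774T", "G22599C", "T22928C", "G25012T"]  # assumed S:E1150D

print(matches(typical, OLD), matches(typical, NEW))
print(matches(outlier, OLD), matches(outlier, NEW))
```

Under these assumptions the outlier passes the first query but is dropped by the second, while a typical branch member passes both.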

ryhisner commented 1 week ago

This is what the tree should really look like: https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons2/main/JN.1_S71F_R346T_F456L_K182N_K444R_L3606F.json?c=gt-S_182&gmax=25384&gmin=21563&label=id:node_6950022

aviczhl2 commented 1 week ago

I believe the Orf1a:S2103F sub-branch is actually the real parent of LF.1 @AngieHinrichs @corneliusroemer (then it gets S:R346T to form LF.1)

So the correct path of LF.1 should be JN.1.16->Orf1a:S2103F->S:R346T->Orf1a:A1268T

FedeGueli commented 1 week ago

> I get 18 sequences. Two have dropout at L3606, but one doesn't have L3606F or dropout. It also has several other nucleotide differences not found in any of the others, one of which is S:E1150D. So a better query for the F456L branch would be
>
> • C21774T, G22599C, T22928C, C897A, -A21865T, -A3941G, -G17334T, -C25350T, -C24863T, -G25012T
>
> But again, that one is so slow I don't think it's worth proposing in any case.

L3606F is in fact always shown by CovSpectrum (at least until earlier in June) but never in any known lineage. Wondering if it is simply there and we don't see it because of UShER masking.


Downsampled tree for samples with it in the last month:


https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/singleSubtreeAuspice_genome_43072_a8f8c0.json?f_userOrOld=uploaded%20sample&label=id:node_6936864

AngieHinrichs commented 1 week ago

Yes, ORF1ab:3606 (nucleotide 11083) is part of the Problematic Sites set that is masked in all sequences before they are added to the tree, and in fact it was one of the first sites to be recognized as highly homoplasic in SARS-CoV-2 and therefore problematic for building phylogenetic trees. 11083 is the weird site that got me connected to Russ and Yatish back in April 2020! 🙂
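As a rough illustration of what masking a site means here (a sketch only, not UShER's actual pipeline), the idea is that the base at each problematic position is replaced with `N` before placement, so a real substitution such as G11083T can no longer influence or appear on the tree. The function `mask_sites` and the toy sequences below are hypothetical, and the real Problematic Sites set contains many more positions than 11083.

```python
# Positions are 1-based genome coordinates. Only 11083 is taken from the
# comment above; the real set is larger.
PROBLEMATIC_SITES = {11083}

def mask_sites(sequence, sites=PROBLEMATIC_SITES):
    """Return a copy of the sequence with the given 1-based sites set to 'N'."""
    bases = list(sequence)
    for pos in sites:
        if 1 <= pos <= len(bases):
            bases[pos - 1] = "N"
    return "".join(bases)

# Toy example: two short sequences that differ only at position 3.
ref_like = "ACGTA"
variant = "ACTTA"
print(mask_sites(ref_like, {3}), mask_sites(variant, {3}))
```

After masking, the two toy sequences are identical, which is why a mutation at a masked site never shows up as a branch mutation even when it is genuinely present.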

FedeGueli commented 1 week ago

> Yes, ORF1ab:3606 (nucleotide 11083) is part of the Problematic Sites set that is masked in all sequences before they are added to the tree, and in fact it was one of the first sites to be recognized as highly homoplasic in SARS-CoV-2 and therefore problematic for building phylogenetic trees. 11083 is the weird site that got me connected to Russ and Yatish back in April 2020! 🙂

Blessed one then!

FedeGueli commented 1 week ago

> I believe the Orf1a:S2103F sub-branch is actually the real parent of LF.1 @AngieHinrichs @corneliusroemer (then it gets S:R346T to form LF.1)
>
> So correct path of LF.1 shall be JN.1.16->Orf1a:S2103F->S:R346T->Orf1a:A1268T

Not sure: it lacks G4067A, which is in all LF.3.

Btw this small branch has one sample with S:P486L