ryhisner opened 2 weeks ago
Thanks @ryhisner, this was tracked up to S:F456L as:
Branch 99 FLiRT JN.1 > S:S71F (C21774T) > S:R346T (G22599C) > S:F456L (T22928C)
Query: C21774T, G22599C, T22928C, T3565C, -C7113T, -C24863T, -G4067A, -C28674T
Samples: 16 (France, England)
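The branch queries in this thread mix mutations a sample must carry with mutations it must lack. As a minimal sketch, assuming a leading `-` marks an excluded mutation (as in the query above) and that terms are matched case-insensitively, the filter logic looks like this in Python. `parse_query`, `matches_query` and the example sample are hypothetical illustrations, not the GISAID or CoV-Spectrum implementation.

```python
from __future__ import annotations


def parse_query(query: str) -> tuple[set[str], set[str]]:
    """Split a comma-separated query into (required, excluded) mutation sets.

    Assumption: a leading '-' marks a mutation the sample must NOT carry.
    Terms are stripped and upper-cased, so stray spaces and case don't matter.
    """
    required: set[str] = set()
    excluded: set[str] = set()
    for raw in query.split(","):
        term = raw.strip().upper()
        if not term:
            continue  # tolerate trailing commas
        if term.startswith("-"):
            excluded.add(term[1:].strip())
        else:
            required.add(term)
    return required, excluded


def matches_query(sample_mutations: set[str], query: str) -> bool:
    """True if the sample carries every required mutation and no excluded one."""
    required, excluded = parse_query(query)
    return required <= sample_mutations and not (excluded & sample_mutations)


# Branch 99 query from the tracking issue.
BRANCH_99 = ("C21774T, G22599C, T22928C, T3565C, "
             "-C7113T, -C24863T, -G4067A, -C28674T")

# Hypothetical sample: the three spike changes plus T3565C.
sample = {"C21774T", "G22599C", "T22928C", "T3565C"}
print(matches_query(sample, BRANCH_99))               # True
print(matches_query(sample | {"G4067A"}, BRANCH_99))  # False, excluded mutation present
```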
in #1089
It looks good to propose now, so I'm linking this proposal to it there.
Through F456L, this lineage is so incredibly slow that I don't really think it's worth proposing. It first appeared in mid-March in a well-sequenced region of the world (France), yet it took three months to hit 20 sequences. It will probably still be slow even with K182N and K444R, unless it gets ∆S31 or something.
EDIT: Actually, I don't think this branch has even reached 20 yet. Seems to only be 17.
One other thing about this lineage that doesn't appear on Usher: it has ORF1a:L3606F. Using the query:
• C21774T, G22599C, T22928C, C897A, -A21865T, -A3941G, -G17334T, -C25350T, -C24863T
I get 18 sequences. Two have dropout at L3606, but one doesn't have L3606F or dropout. It also has several other nucleotide differences not found in any of the others, one of which is S:E1150D. So a better query for the F456L branch would be
• C21774T, G22599C, T22928C, C897A, -A21865T, -A3941G, -G17334T, -C25350T, -C24863T, -G25012T
But again, that one is so slow I don't think it's worth proposing in any case.
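Adding `-G25012T` presumably targets the outlier sample because G25012T sits in spike codon 1150, i.e. the S:E1150D change. A minimal sketch of the coordinate arithmetic, assuming Wuhan-Hu-1 (NC_045512.2) coordinates with the spike ORF at nucleotides 21563 to 25384 (the same range as `gmin`/`gmax` in the tree link); the helper name `spike_codon` is hypothetical. It also checks that the other spike changes in this thread land on the expected codons.

```python
from __future__ import annotations

# Spike ORF in Wuhan-Hu-1 (NC_045512.2) coordinates, stop codon included.
SPIKE_START = 21563
SPIKE_END = 25384


def spike_codon(nt_mutation: str) -> tuple[int, int]:
    """Map a spike change like 'G25012T' to (codon number, position 1-3 in codon)."""
    pos = int(nt_mutation[1:-1])  # strip reference and alternate bases
    if not SPIKE_START <= pos <= SPIKE_END:
        raise ValueError(f"{nt_mutation} is outside the spike ORF")
    offset = pos - SPIKE_START
    return offset // 3 + 1, offset % 3 + 1


for mut in ["C21774T", "A22108C", "G22599C", "A22893G", "T22928C", "G25012T"]:
    print(mut, spike_codon(mut))
# C21774T (71, 2)    -> S:S71F
# A22108C (182, 3)   -> S:K182N
# G22599C (346, 2)   -> S:R346T
# A22893G (444, 2)   -> S:K444R
# T22928C (456, 1)   -> S:F456L
# G25012T (1150, 3)  -> S:E1150D
```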
This is what the tree should really look like: https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons2/main/JN.1_S71F_R346T_F456L_K182N_K444R_L3606F.json?c=gt-S_182&gmax=25384&gmin=21563&label=id:node_6950022
L3606F does in fact always show up in CoV-Spectrum (at least until earlier in June), but never in any known lineage. I'm wondering whether it's actually there and we just don't see it because of UShER masking.
Downsampled tree for samples with it in the last month:
Yes, ORF1ab:3606 (nucleotide 11083) is part of the Problematic Sites set that is masked in all sequences before they are added to the tree, and in fact it was one of the first sites to be recognized as highly homoplasic in SARS-CoV-2 and therefore problematic for building phylogenetic trees. 11083 is the weird site that got me connected to Russ and Yatish back in April 2020! 🙂
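The link between nucleotide 11083, ORF1a:L3606F and NSP6_L37F, and the effect of masking that site, can be sketched as follows. Assumptions: ORF1a spans nucleotides 266 to 13483 in Wuhan-Hu-1 coordinates, nsp6 starts at ORF1a codon 3570 (which is what makes L3606F equal NSP6_L37F), and `MASKED_SITES` is an illustrative one-site stand-in for the much larger Problematic Sites list. The helper names are hypothetical and this is not UShER's code.

```python
from __future__ import annotations

# ORF1a in Wuhan-Hu-1 (NC_045512.2) coordinates, upstream of the ORF1ab frameshift.
ORF1A_START = 266
ORF1A_END = 13483
# nsp6 residue 1 is ORF1a codon 3570, so nsp6 position = ORF1a codon - 3569.
NSP6_OFFSET = 3569

# Illustrative stand-in only: the real Problematic Sites list has many more positions.
MASKED_SITES = frozenset({11083})


def orf1a_codon(pos: int) -> int:
    """ORF1a codon number for a nucleotide position inside ORF1a."""
    if not ORF1A_START <= pos <= ORF1A_END:
        raise ValueError(f"{pos} is outside ORF1a")
    return (pos - ORF1A_START) // 3 + 1


def mask(mutations: set[str], masked: frozenset[int] = MASKED_SITES) -> set[str]:
    """Drop calls at masked positions, mimicking masking before tree placement."""
    return {m for m in mutations if int(m[1:-1]) not in masked}


codon = orf1a_codon(11083)
print(codon, codon - NSP6_OFFSET)            # 3606 37
print(sorted(mask({"G11083T", "C21774T"})))  # ['C21774T']
```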
Blessed one then!
I believe the ORF1a:S2103F sub-branch is actually the real parent of LF.1 @AngieHinrichs @corneliusroemer (it then gets S:R346T to form LF.1).
So the correct path of LF.1 should be JN.1.16 -> ORF1a:S2103F -> S:R346T -> ORF1a:A1268T.
Not sure about that: it lacks G4067A, which is present in all LF.3.
By the way, this small branch has one sample with S:P486L.
Description
Sub-lineage of: JN.1
Earliest sequence: 2024-5-23, Scotland – EPI_ISL_19187869
Most recent sequence: 2024-6-7, Scotland – EPI_ISL_19213871, EPI_ISL_19213872, EPI_ISL_19213873
Continents circulating: Europe (4)
Countries circulating: Scotland (4)
Number of Sequences: 4
GISAID Nucleotide Query: T21633C, A22108C, A22893G, -A23013G, -C29666T
CovSpectrum Query: Nextcladepangolineage:JN.1* & [4-of: C21774T, G22599C, T22928C, A22108C, A22893G]
Substitutions on top of JN.1:
• Spike: S71F, K182N, R346T, K444R, F456L
• ORF1a: L3606F (NSP6_L37F)
• Nucleotide: G11083T, C21774T, A22108C, G22599C, A22893G, T22928C
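The CovSpectrum query combines a lineage filter with an n-of clause. A rough sketch of that logic, assuming `JN.1*` means JN.1 or any dotted sub-lineage and `[4-of: ...]` means at least four of the listed mutations; this is a hypothetical re-implementation for illustration, not CoV-Spectrum's own code, and it does not expand Pango aliases.

```python
from __future__ import annotations


def in_lineage(lineage: str, parent: str) -> bool:
    """Rough stand-in for 'parent*': the parent itself or any dotted sub-lineage.

    Assumption: names are already un-aliased; alias names (e.g. LF.1) are not expanded.
    """
    return lineage == parent or lineage.startswith(parent + ".")


def at_least_n_of(sample_mutations: set[str], candidates: list[str], n: int) -> bool:
    """True if at least n of the candidate mutations are present in the sample."""
    return sum(m in sample_mutations for m in candidates) >= n


CANDIDATES = ["C21774T", "G22599C", "T22928C", "A22108C", "A22893G"]


def proposal_query(lineage: str, sample_mutations: set[str]) -> bool:
    """JN.1* & [4-of: ...] as read in this sketch."""
    return in_lineage(lineage, "JN.1") and at_least_n_of(sample_mutations, CANDIDATES, 4)


# One candidate missing (e.g. dropout at 22893) still matches; two missing does not.
print(proposal_query("JN.1", {"C21774T", "G22599C", "T22928C", "A22108C"}))  # True
print(proposal_query("JN.1.16", {"C21774T", "G22599C", "T22928C"}))          # False
```

The 4-of clause is what lets a sample with dropout at one of the five sites still be picked up.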
Phylogenetic Order of Mutations: C21774T (S:S71F), G11083T (ORF1a:L3606F) → G22599C (S:R346T) → T22928C (S:F456L) → A22108C, A22893G (S:K182N, S:K444R)
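To turn this order into the mutation set expected at each intermediate node (for instance the F456L node tracked earlier), a short sketch follows. The step grouping is copied from the order above and `cumulative_nodes` is a hypothetical helper; on the UShER tree the masked G11083T would still be absent from these nodes.

```python
from __future__ import annotations

# Steps in the order given above; changes inferred at the same node are grouped.
STEPS = [
    ["C21774T", "G11083T"],  # S:S71F, ORF1a:L3606F
    ["G22599C"],             # S:R346T
    ["T22928C"],             # S:F456L
    ["A22108C", "A22893G"],  # S:K182N, S:K444R
]


def cumulative_nodes(steps: list[list[str]]) -> list[set[str]]:
    """Mutations (on top of JN.1) carried at each successive node along the path."""
    nodes: list[set[str]] = []
    carried: set[str] = set()
    for step in steps:
        carried = carried | set(step)  # new set, so earlier nodes are not modified
        nodes.append(carried)
    return nodes


for depth, node in enumerate(cumulative_nodes(STEPS), start=1):
    print(depth, sorted(node))
```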
UShER tree: https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons2/main/JN.1_S71F_R346T_F456L_K182N_K444R_L3606F.json?c=gt-S_182&gmax=25384&gmin=21563&label=id:node_6950022
Evidence
According to UShER, this branch comes directly from the JN.1 polytomy and acquired five consecutive non-synonymous nucleotide mutations in S1, all at antigenically important sites (3 in the RBD, 2 in NTD loops). I've found UShER to be much less reliable of late, primarily because it doesn't include nearly as many closely related sequences as it used to, so it's possible this is an incorrect tree assignment or else a very incomplete one.
Three of the four sequences also have C6388T, and the fourth sequence has dropout at this position, so it's very likely that all four carry C6388T, which is synonymous. C6388T has appeared in only six other sequences in the past two months, in six different lineages, none of them collected in the UK.
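The C6388T argument treats dropout as missing information rather than as evidence against the mutation. A minimal sketch of that rule, with hypothetical sample labels and calls (which accession has the dropout isn't stated here), and a hypothetical helper name:

```python
from __future__ import annotations

from collections import Counter

# Hypothetical per-sample calls at one site (e.g. 6388):
# "alt" = mutation seen, "ref" = reference base seen, "N" = dropout / no coverage.
calls = {"sample_1": "alt", "sample_2": "alt", "sample_3": "alt", "sample_4": "N"}


def consistent_with_all_carrying(site_calls: dict[str, str]) -> bool:
    """True if no sample shows the reference base and at least one shows the mutation.

    Dropout ('N') is treated as unknown rather than as absence of the mutation.
    """
    counts = Counter(site_calls.values())
    return counts["ref"] == 0 and counts["alt"] > 0


print(Counter(calls.values()))              # Counter({'alt': 3, 'N': 1})
print(consistent_with_all_carrying(calls))  # True
```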
One of the three most recent sequences also has ORF1b:T1050I, which has been a significant site in the past.
Genomes
EPI_ISL_19187869, EPI_ISL_19213871, EPI_ISL_19213872, EPI_ISL_19213873