Closed HynnSpylor closed 11 months ago
Thx @HynnSpylor interesting how it got 403K back very soon in the 344V branch. Is there any known interaction between 344 and 403? @Sinickle @oobb45729
Yeah this is a weird branch that I've also noticed. I wanted to designate S:A344T+S:403K but usher is too messed up right now. Would be great if @AngieHinrichs could nuke all sequences in those S:A344X+S:403K branches and see if Usher fixes it on second try.
> Would be great if @AngieHinrichs could nuke all sequences in those S:A344X+S:403K branches and see if Usher fixes it on second try.
At least some of the S:A344X amino acid changes in XBB.1.9.2 are caused by different nucleotide mutations, and re-running usher won't join those together...
The S:403K branches do seem to be consistently G22770A (although there's also A22771C, e.g. CHN/HB-Jingzhou-2101/2023). I see one branch where G22592A is followed by different branches getting G22770A (Germany/BB-RKI-I-1136867/2023, England/PHEC-YYEA518/2023). So maybe there are some sequences that missed G22770A due to Ns or something. I can try the usual prune, opt & replace. That branch is distinct from the branch in this proposal, though (T16548G > G21255T,G22592A vs C29347T > C4686T > A16373G > ...).
On the branch in this proposal, it looks like some sequences got G22592A (S:A344T) and C13767T and then got G22770A (S:R403K), while other sequences got C22593T (S:A344V) and A29125G and then some of those got T24979C and then G22770A (S:R403K). I don't see a problem with this branch... if I'm missing something, please provide some sequence names or IDs that look misplaced.
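For anyone following along, the nucleotide-to-codon correspondence discussed above can be checked with a few lines of Python. This is only a sketch: it assumes coordinates on the Wuhan-Hu-1 reference (NC_045512.2), where the spike coding sequence spans 21563..25384, and the helper name `spike_codon` is made up for illustration. It reports which spike codon a substitution hits and which base of that codon, without trying to translate the amino acid.

```python
import re

# Assumption: positions are on the Wuhan-Hu-1 reference (NC_045512.2),
# where the spike (S) coding sequence spans nucleotides 21563..25384.
S_START = 21563
S_END = 25384


def spike_codon(mutation: str) -> tuple[int, int]:
    """Return (codon number, base within codon 1..3) for a substitution like 'G22592A'.

    Hypothetical helper for illustration; it only does coordinate arithmetic.
    """
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", mutation)
    if match is None:
        raise ValueError(f"not a simple substitution: {mutation!r}")
    pos = int(match.group(2))
    if not S_START <= pos <= S_END:
        raise ValueError(f"{mutation} is outside the spike coding sequence")
    offset = pos - S_START  # 0-based offset into the S coding sequence
    return offset // 3 + 1, offset % 3 + 1


if __name__ == "__main__":
    for mut in ["G22592A", "C22593T", "G22770A", "A22771C"]:
        codon, base = spike_codon(mut)
        print(f"{mut}: S codon {codon}, base {base} of 3")
```

With these coordinates, G22592A and C22593T fall on the first and second base of codon 344, which is why S:A344T and S:A344V are different nucleotide changes that UShER will not merge into one branch. G22770A and A22771C both fall in codon 403.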
Sorry @AngieHinrichs, I wasn't as clear as I should have been.
On the surface this issue may be about another lineage, but I suspect there's a good chance this is in fact one and the mess is due to artefacts (not Usher's fault ;) )
Most sequences in XBB.1.9.2 with S:A344 mutated also have S:R403K or at least have unknowns there.
I consider it unlikely that 403 and 344T/V arose homoplasically.
I think nuking and rebuilding would be helpful. Maybe you can seed it first with good quality sequences and place the ones with unknowns later?
@corneliusroemer I can't see the full sequence names of some of those sequences with big blocks of Ns (like Germany/B[BE]-ChVir-LB23012...??), but I don't see names like that in this branch of the UShER tree either -- those sequences might have had too many equally parsimonious placements (EPPs) due to the Ns and been rejected. If you can list specific sequence names/IDs that you think are misplaced (or that have especially high quality and should be kept), that would be very helpful.
@corneliusroemer This is looking cleaner in the 2023-07-19 tree: S:R403K (G22770A) first, then separate branches with S:A344V (C22593T) and S:A344T (G22592A) along with other mutations that distinguish the two branches.
4 more sequences from Scotland were uploaded to GISAID today. That makes 17 on GISAID, and likely more than that on Usher.
It does not totally overlap with EG.9, but it is OK to add a milestone to it.
It is a sibling lineage in #57 first spotted by @FedeGueli
Defining mutations: XBB.1.9.2 > C4686T (Orf1a:T1474I) > A16373G (Orf1b:N969S) > C22593T (S:A344V), A29125G
GISAID query: C4686T, A16373G, C22593T, A29125G
Earliest seq: 2023-05-10 (England, EPI_ISL_17693007)
Most recent seq: 2023-06-09 (Scotland, EPI_ISL_17884588)
Detected countries: UK (10)
Usher Tree: ![QQ screenshot 20230629214649](https://github.com/sars-cov-2-variants/lineage-proposals/assets/121703496/d99deaae-0f47-49f5-b43d-940a4db49cfc)
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_2f34b_d770c0.json
4 of 10 seqs further get S:R403K (on 2 branches, one branch also with S:E654A)
Genomes: EPI_ISL_17693007, EPI_ISL_17776830-17776831, EPI_ISL_17796661, EPI_ISL_17820821, EPI_ISL_17884588, EPI_ISL_17884618-17884621
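In case it helps anyone who wants to paste these into a tool that needs one accession per entry, here is a small Python sketch that expands the dash ranges in the genome list. It assumes a range like `EPI_ISL_17776830-17776831` means every accession number from the first to the last inclusive; the function name `expand_epi_ids` is only illustrative.

```python
PREFIX = "EPI_ISL_"

# Genome list as given in the proposal.
GENOMES = (
    "EPI_ISL_17693007, EPI_ISL_17776830-17776831, EPI_ISL_17796661, "
    "EPI_ISL_17820821, EPI_ISL_17884588, EPI_ISL_17884618-17884621"
)


def expand_epi_ids(text: str) -> list[str]:
    """Expand a comma-separated accession list, unrolling 'A-B' ranges.

    Assumption: 'EPI_ISL_A-B' denotes all accessions A..B inclusive.
    """
    ids = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate a trailing comma
        if not token.startswith(PREFIX):
            raise ValueError(f"unexpected accession format: {token!r}")
        body = token[len(PREFIX):]
        if "-" in body:
            start, end = body.split("-")
            ids.extend(f"{PREFIX}{n}" for n in range(int(start), int(end) + 1))
        else:
            ids.append(f"{PREFIX}{int(body)}")
    return ids


if __name__ == "__main__":
    accessions = expand_epi_ids(GENOMES)
    print(len(accessions))
    print("\n".join(accessions))
```

Under that assumption the list expands to 10 accessions, which agrees with the "UK (10)" count in the proposal.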