sars-cov-2-variants / lineage-proposals

Repository to propose and discuss lineages

Need help with weird UShER placement on a few lineages (XBB.1.5* & XBB.1.5.77 with S:T478R) (15/28 seqs) #323

Closed NkRMnZr closed 1 year ago

NkRMnZr commented 1 year ago

I keep bumping into some really weird UShER (mis-)placements, starting with #305, then #314 by @aviczhl2, and now there's another:

When I found some South African sequences with T478R, they were placed in that particular weird XBB.1.5 flip-flop branch, T17124C > T10204C > T24845C > T23018C > C24845T, again: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_19fd4_44a010.json?c=gt-S_478&f_userOrOld=uploaded%20sample&showBranchLabels=all
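For readers following along, here is a minimal sketch of how a flip-flop like this could be spotted mechanically in a mutation path. It only assumes the path is written as `REF` + position + `ALT` steps separated by ` > `, as in this thread; the function names are hypothetical and this is not how UShER itself works.

```python
import re

# One nucleotide substitution, e.g. "T24845C".
MUTATION = re.compile(r"^([ACGT])(\d+)([ACGT])$")

def parse_mutation(text):
    """Split a substitution such as 'T24845C' into (ref, position, alt)."""
    match = MUTATION.match(text.strip())
    if match is None:
        raise ValueError(f"not a nucleotide substitution: {text!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt

def find_flip_flops(path):
    """Return (position, forward, reversion) for each step that undoes an earlier step."""
    last_seen = {}  # position -> (ref, alt, label) of the latest change at that position
    flips = []
    for label in path:
        ref, pos, alt = parse_mutation(label)
        if pos in last_seen:
            prev_ref, prev_alt, prev_label = last_seen[pos]
            # A reversion swaps the previous ref and alt at the same position.
            if prev_ref == alt and prev_alt == ref:
                flips.append((pos, prev_label, label))
        last_seen[pos] = (ref, alt, label)
    return flips

branch = "T17124C > T10204C > T24845C > T23018C > C24845T".split(" > ")
print(find_flip_flops(branch))  # [(24845, 'T24845C', 'C24845T')]
```

On the path quoted above, the only reversion it reports is at position 24845.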

However, the query T10204C, C27005T, C22995G fails to catch those sequences. The sequences that this query does return are actually placed under a sub-branch of XBB.1.5.77, as follows: XBB.1.5.77 > C24706T > C16575T > G10364A(ORF1a:V3367I) > C679T, G1820A(ORF1a:G519S), C12534T(ORF1a:T4090I), C22995G(S:T478R), C28567T, A29147G(N:I292V), T29515A
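As a sanity check on the annotations in that path, the codon number can be recovered from the nucleotide position. This is a rough sketch, assuming the gene start coordinates of the Wuhan-Hu-1 reference (NC_045512.2); `codon_number` is a hypothetical helper, and it does not validate gene ends or translate codons.

```python
# 1-based gene start coordinates on the Wuhan-Hu-1 reference (NC_045512.2).
GENE_START = {"ORF1a": 266, "S": 21563, "N": 28274}

def codon_number(gene, nuc_pos):
    """Return the 1-based codon index within `gene` that contains `nuc_pos`."""
    start = GENE_START[gene]
    if nuc_pos < start:
        raise ValueError(f"position {nuc_pos} lies before the start of {gene}")
    return (nuc_pos - start) // 3 + 1

# (gene, nucleotide position, codon number quoted in the thread)
annotated = [
    ("ORF1a", 10364, 3367),  # G10364A -> ORF1a:V3367I
    ("ORF1a", 1820, 519),    # G1820A  -> ORF1a:G519S
    ("ORF1a", 12534, 4090),  # C12534T -> ORF1a:T4090I
    ("S", 22995, 478),       # C22995G -> S:T478R
    ("N", 29147, 292),       # A29147G -> N:I292V
]

for gene, pos, quoted in annotated:
    print(gene, pos, codon_number(gene, pos), quoted)
```

All five computed codon numbers agree with the ones quoted in the path.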

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1dfa2_44c6b0.json?f_userOrOld=uploaded%20sample

My theory is that either something is wrong with UShER, or some kind of dropout is causing the issue. Could they be the same lineage, just split into good sequences and potential artifacts? Or are these some kind of recombinant? I'm totally confused.

Genomes from South Africa (and the USA) on the weird flip-flop branch, which the query also has trouble catching:

South_Africa/NICD-N55321/2023|EPI_ISL_17803355|2023-05-16
South_Africa/NICD-N55462/2023|EPI_ISL_17801595|2023-05-24
South_Africa/NICD-N55457/2023|EPI_ISL_17801590|2023-05-10
South_Africa/NICD-R00766/2023|EPI_ISL_17885142|2023-01-24
South_Africa/NICD-R00768/2023|EPI_ISL_17885140|2023-01-26
USA/AZ-CDC-QDX49328040/2023|EPI_ISL_17589852|2023-04-20
USA/CA_SACPHL_23SAC0130/2023|EPI_ISL_17738100|2023-04-24
USA/CA-HLX-STM-6G94T93CM/2023|EPI_ISL_17688448|2023-04-23
USA/FL-CDC-QDX80295151/2023|EPI_ISL_17854786|2023-05-08
USA/NV-CDC-QDX49006504/2023|EPI_ISL_17526702|2023-04-09
USA/NV-CDC-QDX49399559/2023|EPI_ISL_17621371|2023-04-22
USA/NV-CDC-QDX80338427/2023|EPI_ISL_17852406|2023-05-11
USA/TX-CDC-QDX80414718/2023|EPI_ISL_17852352|2023-05-11

Many have a T insertion after 28214:

![insertion](https://github.com/sars-cov-2-variants/lineage-proposals/assets/125747944/fb78d808-2212-4cb5-83e8-fb2804917849)

Genomes on the XBB.1.5.77 with S:T478R mini-saltation branch, query: C22995G, A29147G, T29515A, 15 seqs from Costa Rica & USA:

CostaRica/INC-11215-793027/2023|EPI_ISL_17774089|2023-05-15
USA/CA-HLX-STM-465DV7A9F/2023|EPI_ISL_17950673|2023-06-06
USA/CA-HLX-STM-4ZZNXKJVS/2023|EPI_ISL_17950697|2023-06-06
USA/CA-HLX-STM-6J6ENP96P/2023|EPI_ISL_17950529|2023-06-01
USA/FL-CDC-LC1044327/2023|EPI_ISL_17854347|2023-06-07
USA/FL-CDC-QDX81328328/2023|EPI_ISL_17856472|2023-06-05
USA/FL-CDC-QDX81328331/2023|EPI_ISL_17856533|2023-06-06
USA/HI-H2322559/2023|EPI_ISL_17951437|2023-06-05
USA/HI-H2322625/2023|EPI_ISL_17951505|2023-06-12
USA/HI-H2322717/2023|EPI_ISL_17951589|2023-06-15
USA/TX-CDC-QDX81456301/2023|EPI_ISL_17856780|2023-06-09
Costa Rica/INC-11487-796517/2023|EPI_ISL_17953750|2023-06-10
Costa Rica/INC-11526-797304/2023|EPI_ISL_17953789|2023-06-08
Costa Rica/INC-11498-796699/2023|EPI_ISL_17953796|2023-06-09
Costa Rica/INC-11528-797306/2023|EPI_ISL_17953784|2023-06-09

![branch2qc](https://github.com/sars-cov-2-variants/lineage-proposals/assets/125747944/be8aca37-3f6b-4253-87b5-f1b68a9e119d)

FedeGueli commented 1 year ago

I don't think #305 is related to this misplacement. I have already pointed out to @angiehinrinchs that there is a B tree of XBB.1.5 misplaced by UShER under XBB.1.

NkRMnZr commented 1 year ago

> I don't think #305 is related to this misplacement. I have already pointed out to @angiehinrinchs that there is a B tree of XBB.1.5 misplaced by UShER under XBB.1.

There's another one, #288, with a C15024T/T15024C flip-flop.

NkRMnZr commented 1 year ago

Updates:

Branch 1

Now using query: T10204C, C27005T, C22995G, -T18732C; it still fails to catch 5 South African seqs.

23-07-20: 1 sequence from California EPI_ISL_17997578
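To illustrate why a query like the Branch 1 one can miss sequences with dropouts, here is a small sketch of the matching logic. It assumes comma-separated terms must all be present and that a leading `-` means the mutation must be absent; that reading of the query syntax, the function names, and the sample data are all assumptions made up for illustration.

```python
def parse_query(query):
    """Split a comma-separated query into (required, excluded) mutation sets."""
    required, excluded = set(), set()
    for term in query.split(","):
        term = term.strip()
        if not term:
            continue
        if term.startswith("-"):
            excluded.add(term[1:])  # assumed meaning: mutation must be absent
        else:
            required.add(term)
    return required, excluded

def matches(mutations, query):
    """True if every required mutation is called and no excluded one is."""
    required, excluded = parse_query(query)
    present = set(mutations)
    return required <= present and not (excluded & present)

query = "T10204C,C27005T, C22995G,-T18732C"

# Hypothetical samples, not real sequences from this thread.
samples = {
    "sample_full": {"T10204C", "C27005T", "C22995G"},
    "sample_dropout": {"T10204C", "C22995G"},  # C27005T not called, e.g. a coverage gap
    "sample_excluded": {"T10204C", "C27005T", "C22995G", "T18732C"},
}

hits = sorted(name for name, muts in samples.items() if matches(muts, query))
print(hits)  # ['sample_full']
```

Under these assumptions a single uncalled position is enough for a sequence to drop out of the results, which would be consistent with the dropout theory above.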

Branch 2

Now using query: C22995G, A29147G, T29515A

23-07-08: 1 seq from Hawaii EPI_ISL_17961099
23-07-13: 2 seqs from USA
1 from Florida: EPI_ISL_17979207
1 from Alaska: EPI_ISL_17981706

23-07-18: 7 seqs from 3 countries
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_399bb_65af10.json?f_userOrOld=uploaded%20sample
![update1](https://github.com/sars-cov-2-variants/lineage-proposals/assets/125747944/cc18006a-b86f-4c31-9783-b2d0e562dec4)
2 from California, USA: EPI_ISL_17985447, EPI_ISL_17985504, with travel history to El Salvador
1 from Florida, USA: EPI_ISL_17986287
1 from Shanghai, China: EPI_ISL_17994414
3 from Costa Rica: EPI_ISL_17989878, EPI_ISL_17990028

23-07-20: 3 seqs from USA
![update4](https://github.com/sars-cov-2-variants/lineage-proposals/assets/125747944/592078a3-fb1c-4191-aae6-f8a74494b08f)
EPI_ISL_17998483, EPI_ISL_17998499, EPI_ISL_18005324

FedeGueli commented 1 year ago

Branch 2 designated HR.1 via https://github.com/cov-lineages/pango-designation/commit/425714014f29d5f4288d9f3783debe09728d15ed

FedeGueli commented 1 year ago

Please keep monitoring the other one; if it grows, please ping me.

NkRMnZr commented 1 year ago

> Please keep monitoring the other one; if it grows, please ping me.

I've got another theory: Branch 1 could be XBB.1.5 [T17124C > T23018C > T10204C] > C22995G(S:T478R) that got misplaced. There is an S:T478R right under the T17124C polytomy, which is assigned as XBB.1.5.28; this one could be an S:T478R right under the T17124C > T10204C branch, or it could simply have dropouts somewhere.

It is really hard to catch T17124C > T10204C > C22995G, since all three of these mutations are very homoplastic. However, the query T17124C, T10204C, C22995G, T23602C, which targets the largest sub-branch right after C22995G, returns 46 sequences, with another 10-15 or so missed by this query: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice50_genome_a0dd_ffc520.json?c=gt-nuc_23602

Do you think this is worth a designation/proposal?

FedeGueli commented 1 year ago

Thanks @NkRMnZr. The samples are quite old; I would keep it unproposed unless something changes.

NkRMnZr commented 1 year ago

Here's what Branch 1 looks like after the recent batch from South Africa: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice2_genome_1b23a_59e0e0.json?c=gt-S_478&label=id:node_7079334

FedeGueli commented 1 year ago

Please rearrange the proposal on Branch 1

NkRMnZr commented 1 year ago

> Please rearrange the proposal on Branch 1

Problem is, it's almost impossible to catch them by query, quite a tricky one.

FedeGueli commented 1 year ago

And what about proposing the two main branches of Branch 1 as two distinct lineages?

FedeGueli commented 1 year ago

I think we can close this one.