Open ktmeaton opened 11 months ago
rebar dataset download --name sars-cov-2 --tag 2023-12-06 --output-dir dataset/sars-cov-2/2023-12-06
What is the evidence for `BA.1.15` as a secondary parent?

- The `BA.1.15` region is `22578-25469` (2.9 kb), with 28 mutations in support and no conflict ALT or REF bases.
- The `B.1.617.2` regions are `210-21762` and `25584-29402`, with 10 mutations in support and 18 conflict ALT bases.

rebar run \
--dataset-dir dataset/sars-cov-2/2023-12-06 \
--output-dir output/sars-cov-2/2023-12-06/XD \
--verbosity debug \
--populations "XD" \
--parents "B.1.617.2,BA.1.15"
NonRecursiveRecombinant: score=20, conflict=18
`210-21762`: `B.1.617.2`, `22578-25469`: `BA.1.15`, `25584-29402`: `B.1.617.2`
score:
- B.1.617.2: -8
- BA.1.15: 28
support:
- B.1.617.2 (10): G210T, G15451A, C16466T, C21618G, T26767C, T27638C, C27752T, A28461G, G28881T, G29402T
- BA.1.15 (28): G22578A, T22673C, C22674T, T22679C, C22686T, G22813T, T22882G, G22898A, G22992A, C22995A, A23013C, A23040G, G23048A, A23055G, A23063T, T23075C, C23202A, A23403G, C23525T, T23599G, C23604A, C23854A, G23948T, C24130A, A24424T, T24469A, C24503T, C25000T
conflict_ref:
- B.1.617.2 (0):
- BA.1.15 (0):
conflict_alt:
- B.1.617.2 (18): A1321C, G4181T, C6402T, C7124T, C7851T, A8723G, C8986T, G9053T, A11201G, A11332G, C14407T, T15264C, C19220T, G21641T, C25667T, G25855T, C27874T, G28916T
- BA.1.15 (0):
private:
- B.1.617.2 (18): A1321C, G4181T, C6402T, C7124T, C7851T, A8723G, C8986T, G9053T, A11201G, A11332G, C14407T, T15264C, C19220T, G21641T, C25667T, G25855T, C27874T, G28916T
- BA.1.15 (0):
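
The numbers in this output are consistent with a simple relationship: each parent's score is its support count minus its conflict counts, and the hypothesis score and conflict are sums over parents. The sketch below checks that arithmetic against the counts printed above. The formula is an assumption inferred from these numbers, not taken from the `rebar` source, and the variable names are made up for illustration.

```python
# Counts copied from the debug output above.
support = {"B.1.617.2": 10, "BA.1.15": 28}
conflict_alt = {"B.1.617.2": 18, "BA.1.15": 0}
conflict_ref = {"B.1.617.2": 0, "BA.1.15": 0}


def parent_score(parent: str) -> int:
    # ASSUMPTION: score = support - conflict_alt - conflict_ref.
    # This reproduces the printed values but is not confirmed from rebar's code.
    return support[parent] - conflict_alt[parent] - conflict_ref[parent]


scores = {parent: parent_score(parent) for parent in support}
total_score = sum(scores.values())
total_conflict = sum(conflict_alt.values()) + sum(conflict_ref.values())

# Length of the BA.1.15 region, treating both coordinates as inclusive.
start, end = 22578, 25469
region_kb = (end - start + 1) / 1000

print(scores)                       # per-parent scores
print(total_score, total_conflict)  # hypothesis-level score and conflict
print(round(region_kb, 1))          # region length in kb
```

Under that assumption, the per-parent scores come out as -8 and 28, and the totals match the `score=20, conflict=18` header line.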
rebar plot --annotations dataset/sars-cov-2/2023-11-30/annotations.tsv --run-dir output/sars-cov-2/2023-12-06/XD --all-coords
I want to write documentation about how the algorithm works (ex. run.md) with a case study. SARS-CoV-2 recombinant `XD` often confuses me, so I'll work through some of the results here.

- `XD` is designated as `B.1.617.2*` and `BA.1*`.
- `B.1.617.2*`, as only a ~3-5 kb section comes from a secondary parent.
- `XD` samples were classified as Delta 21J. However, the UShER phylogeny has them placed as `BA.1.15` descendants, probably because the ~3-5 kb section is in the Spike, which is so mutation-rich.
- `rebar` thinks `B.1.617.2` and `XS` have more support.
- `rebar` or our prior knowledge? (Let's assume `rebar` for now, to critique the method)

- (`BA.1`, `B.1.617.2`): score=20, conflict=18
- (`BA.1`, `B.1.617.2*` consensus of various `AY.*`, `BA.1`): score=41, conflict=8
- (`XD`, `???`): No evidence
- (`XS`, `B.1.617.2*` consensus of various `AY.*`): score=35, conflict=7

These results tell me that:

- `B.1.617.2` strict, a consensus of various `AY.*` has way higher scores/less conflict.
- (`BA.1`, `B.1.617.2`) seems like it should be "best", with the highest score (41) and almost the lowest conflict (8).
- `rebar` prefers the hypothesis that minimizes conflict, rather than the one that maximizes support. This is why `KnockoutRecombinant` with `XS` was being picked as best. This decision needs to be re-assessed, as I never liked it in the first place.
- The `min_conflict` strategy was originally developed to deal with `XBB*` recursive recombinants, because the original recombination (XBB=BJ.1 and CJ.1) would often have the highest support but a LOT of conflict.
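
To make the trade-off concrete, here is a minimal sketch that applies both selection rules to the three scored hypotheses listed in this issue. It is not `rebar`'s implementation: the `Hypothesis` type, the labels, the function names, and the tie-breaking rules are all hypothetical, and only the score and conflict values come from the results above. The hypothesis with no evidence is left out.

```python
from typing import NamedTuple


class Hypothesis(NamedTuple):
    label: str      # hypothetical label, built from the parents listed in the issue
    score: int
    conflict: int


# Score and conflict values are copied from the hypotheses in this issue.
hypotheses = [
    Hypothesis("BA.1 + B.1.617.2 (strict)", 20, 18),
    Hypothesis("BA.1 + B.1.617.2* (AY.* consensus)", 41, 8),
    Hypothesis("XS + B.1.617.2* (AY.* consensus)", 35, 7),
]


def pick_min_conflict(candidates: list[Hypothesis]) -> Hypothesis:
    # Lowest conflict wins; ties go to the higher score (tie-break is an assumption).
    return min(candidates, key=lambda h: (h.conflict, -h.score))


def pick_max_score(candidates: list[Hypothesis]) -> Hypothesis:
    # Highest score wins; ties go to the lower conflict (tie-break is an assumption).
    return max(candidates, key=lambda h: (h.score, -h.conflict))


print("min conflict:", pick_min_conflict(hypotheses))
print("max score:   ", pick_max_score(hypotheses))
```

With these inputs the two rules disagree: minimizing conflict picks the `XS` hypothesis (conflict=7), while maximizing score picks the consensus hypothesis with score=41, even though the two differ by only one conflict.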