hyanwong opened 1 year ago
Good spot!
Oho, so I guess it's taking a Delta sequence and recombining it back to an early B.1.
We could cheat initially, and pull out this recombinant Delta from both trees to see what happens?
I'm not sure that anything changes regarding the topology if you simply remove the nodes? I mean, the recombination points will appear more recent, but that's just photoshopping the ARG. It would be better IMO to see whether we think this is a "bad" recombination node, and what the causal strain is that is pushing it so deep.
I'm not saying we should do this for the publication; it's just so we can see what happens, for our own information.
Yes: I'm not quite sure what you mean by "pull out this recombinant Delta from both trees", though?
Figure out which sample is causing this recombination, and remove it from the list of samples used when we simplify down the trees. Maybe it's not just one sample, though?
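A minimal sketch of the "find the sample, then drop it" idea, assuming the ARG's edges are available as `(left, right, parent, child)` tuples (for a tskit tree sequence, e.g. `[(e.left, e.right, e.parent, e.child) for e in ts.edges()]`). `descendant_samples` and `samples_to_keep` are hypothetical helpers, not part of sc2ts or tskit. Descendants are computed in the local tree at one genome position, because the samples below a recombinant node can differ along the genome.

```python
from collections import defaultdict


def descendant_samples(edges, node, samples, position):
    """Samples below `node` in the local tree at `position`.

    `edges` is a list of (left, right, parent, child) tuples mirroring the
    columns of a tskit edge table; only edges whose half-open interval
    [left, right) covers `position` are followed.
    """
    children = defaultdict(list)
    for left, right, parent, child in edges:
        if left <= position < right:
            children[parent].append(child)
    sample_set = set(samples)
    found = set()
    stack = [node]
    while stack:
        u = stack.pop()
        if u in sample_set:
            found.add(u)  # internal samples are kept and traversed below too
        stack.extend(children[u])
    return found


def samples_to_keep(samples, dropped):
    """Sample IDs to pass on to simplification, minus the dropped ones."""
    return [s for s in samples if s not in dropped]


# Toy ARG: node 4 is a recombinant (parent 5 on [0, 50), parent 6 on [50, 100)).
edges = [(0, 100, 3, 0), (0, 100, 3, 1), (0, 100, 4, 2), (0, 50, 5, 4), (50, 100, 6, 4)]
suspect = descendant_samples(edges, 4, [0, 1, 2], position=25)
keep = samples_to_keep([0, 1, 2], suspect)
```

With tskit, the filtered list would then go to `ts.simplify(samples=keep)`. If more than one sample is involved, `suspect` simply holds all of them.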
Maybe it's not that simple, though, and the samples we have are not the direct cause of the recombinant node in question?
Many of these recombinants are the influential false positives discussed in section 2.8 of the preprint. It would be useful to classify the recombination nodes here as in table 3.
The notebook at https://github.com/jeromekelleher/sc2ts-paper/blob/main/notebooks/s2.2_cophylo_recombinants.ipynb can be used to create this useful plot of the ARG, restricted to the relevant nextstrain-subset nodes:
Here's a bit of the cophylogeny at the start of the long ARG (sc2ts tree on the left):
And here it is at position 22000:
Although there are only 5 or so recombinations in the simplified ARG, one is clearly causing the sequences at the root of Delta to be repositioned. We should try to work out which recombination node causes this, and what's going on there.
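One way to narrow down which recombination node repositions a clade is sketched below, assuming edges as `(left, right, parent, child)` tuples (e.g. from tskit's `ts.edges()`). A node with more than one distinct parent along the genome is treated as a recombinant; walking a node's lineage at one position and checking where its parent changes at another position (e.g. 0 vs 22000) flags the recombinant(s) responsible. Both helpers are hypothetical, not an sc2ts API.

```python
from collections import defaultdict


def recombinant_nodes(edges):
    """Map each node with >1 distinct parent to its sorted (left, right, parent) pieces."""
    pieces = defaultdict(list)
    for left, right, parent, child in edges:
        pieces[child].append((left, right, parent))
    return {
        child: sorted(p)
        for child, p in pieces.items()
        if len({parent for _, _, parent in p}) > 1
    }


def lineage_switches(edges, node, pos_a, pos_b):
    """Nodes on `node`'s lineage at pos_a whose parent differs at pos_b."""

    def parents_at(pos):
        # Local tree at `pos` as a child -> parent mapping.
        return {c: p for left, right, p, c in edges if left <= pos < right}

    pa, pb = parents_at(pos_a), parents_at(pos_b)
    lineage = [node]
    while lineage[-1] in pa:  # climb to the root of the local tree at pos_a
        lineage.append(pa[lineage[-1]])
    return [u for u in lineage if pa.get(u) != pb.get(u)]


# Toy ARG rooted at 7; node 4 switches parent from 5 to 6 at position 50.
edges = [
    (0, 100, 3, 0), (0, 100, 3, 1), (0, 100, 4, 2),
    (0, 50, 5, 4), (50, 100, 6, 4),
    (0, 100, 7, 3), (0, 100, 7, 5), (0, 100, 7, 6),
]
```

Here `lineage_switches(edges, 2, 25, 75)` points at node 4. On the real ARG, `node` would be a sample near the root of Delta and the two positions those of the cophylogeny plots.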