jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-COV-2 variation data
MIT License

Nextstrain-subset ARG contains recombination which majorly affects delta time origin #127

Open hyanwong opened 1 year ago

hyanwong commented 1 year ago

Here's a bit of the cophylogeny at the start of the long ARG: sc2ts tree on the left

[Image: Screenshot 2023-03-10 at 11 30 04]

and here it is at pos 22000:

[Image: Screenshot 2023-03-10 at 11 29 24]

Although there are only 5 or so recombinations in the simplified ARG, one is clearly causing the sequences at the root of Delta to be repositioned. We should try to work out which recombination node causes this, and what's going on there.
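One rough way to start that search is to list every node that has different parents over different genome intervals, then check which parent applies at the start of the genome and at pos 22000. The sketch below is only an illustration: it uses a plain Python list of `(left, right, parent, child)` tuples laid out like a tskit edge table, and every node ID, breakpoint, and helper name (`recombination_nodes`, `parent_at`) is invented for the example rather than taken from the sc2ts ARG.

```python
from collections import defaultdict

# Toy edge list in the style of a tskit edge table: (left, right, parent, child).
# All node IDs and coordinates are made up for illustration.
EDGES = [
    (0, 29904, 0, 1),      # root -> early B.1-like node
    (0, 29904, 0, 2),      # root -> Delta-like ancestor
    (0, 21000, 2, 3),      # node 3 inherits the left part from node 2 ...
    (21000, 29904, 1, 3),  # ... and the right part from node 1
    (0, 29904, 3, 4),      # sample below the recombinant
    (0, 29904, 2, 5),      # ordinary Delta-like sample
]


def recombination_nodes(edges):
    """Map each child with more than one distinct parent to its parent segments."""
    segments = defaultdict(list)
    for left, right, parent, child in edges:
        segments[child].append((left, right, parent))
    return {
        child: sorted(segs)
        for child, segs in segments.items()
        if len({parent for _, _, parent in segs}) > 1
    }


def parent_at(edges, child, position):
    """Return the parent of `child` in the local tree covering `position`."""
    for left, right, parent, c in edges:
        if c == child and left <= position < right:
            return parent
    return None


recombinants = recombination_nodes(EDGES)
for node, segs in recombinants.items():
    # Compare the parent at the start of the genome with the parent at pos 22000.
    print(node, segs, parent_at(EDGES, node, 0), parent_at(EDGES, node, 22000))
```

A node whose parent differs between the two positions is a candidate for the repositioning seen in the two cophylogeny plots; with only 5 or so recombinations, the candidates could then be checked by hand.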

jeromekelleher commented 1 year ago

Good spot!

Oho, so I guess it's taking a Delta sequence and recombining it back to an early B.1.

We could cheat initially, and pull out this recombinant Delta from both trees to see what happens?

hyanwong commented 1 year ago

I'm not sure that anything changes re topology if you simply remove the nodes? I mean, the recombination points will appear more recent, but that's just photoshopping the ARG. It would be better IMO to see if we think this is a "bad" recombination node, and what the causal strain is that is pushing it so deep.

jeromekelleher commented 1 year ago

I'm not saying we do this for the publication - just so we see what happens for our own information

hyanwong commented 1 year ago

Yes: I'm not quite sure what you mean by "pull out this recombinant Delta from both trees", though?

jeromekelleher commented 1 year ago

Figure out which sample is causing this recombination, and remove it from the list of samples used when we simplify down the trees. Maybe it's not just one sample, though?
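A minimal sketch of that idea, under the assumption that the suspect recombination node is already known: collect the samples that sit below it and drop them from the sample list before simplifying. The edge list, the sample IDs, and the helper `samples_below` are all hypothetical, and the traversal follows edges from every genome interval at once, so it can over-report compared with walking each local tree separately.

```python
# Toy edge list (left, right, parent, child); all values are invented.
EDGES = [
    (0, 29904, 0, 1),
    (0, 29904, 0, 2),
    (0, 21000, 2, 3),      # node 3 is the suspect recombination node
    (21000, 29904, 1, 3),
    (0, 29904, 3, 4),      # samples 4 and 6 sit below the recombinant
    (0, 29904, 3, 6),
    (0, 29904, 2, 5),      # sample 5 does not
]
SAMPLES = [4, 5, 6]


def samples_below(edges, node, samples):
    """Samples reachable from `node` by following edges downward.

    Edges from all genome intervals are pooled, so this is an upper bound
    on the samples that descend from `node` in any single local tree.
    """
    children = {}
    for _, _, parent, child in edges:
        children.setdefault(parent, set()).add(child)
    found, seen, stack = set(), set(), [node]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        if u in samples:
            found.add(u)
        stack.extend(children.get(u, ()))
    return found


suspects = samples_below(EDGES, 3, set(SAMPLES))
kept = [s for s in SAMPLES if s not in suspects]
print(sorted(suspects), kept)
```

With a real tree sequence, the reduced list could then be passed to tskit's `ts.simplify(samples=kept)` to see how the trees change once those samples are gone.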

jeromekelleher commented 1 year ago

Maybe it's not that simple, though, and the samples we have are not the direct cause of the recombinant (node) in question?

hyanwong commented 1 year ago

Many of these recombinants are the influential false positives discussed in section 2.8 of the preprint. It would be useful to classify the recombination nodes here as in table 3.

The notebook at https://github.com/jeromekelleher/sc2ts-paper/blob/main/notebooks/s2.2_cophylo_recombinants.ipynb can be used to create this useful plot of the ARG, restricted to the relevant nextstrain-subset nodes: