tskit-dev / tscompare

Utilities for comparing tree sequences
MIT License

A nice idea for accuracy of inference at different timepoints #2

Open hyanwong opened 4 years ago

hyanwong commented 4 years ago

Anders Eriksson suggested a nice way of testing whether our inference methods do well or poorly for different heights in the TS.

We use the (infinite sites) mutations to identify corresponding edges in the true and the inferred TS. Then, since we are guaranteed that the tips under each pair of corresponding edges are the same, we can calculate a topology difference between the subtrees below those edges.
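A minimal sketch of this idea, not a tscompare or tskit implementation. It assumes each tree is given as a plain `child -> parent` dict and each mutation as a `site -> node` entry naming the node below the mutated edge. The helper names (`clades_below`, `subtree_distances`) are hypothetical, and the distance used is a simple Robinson-Foulds-style count of clades found in one subtree but not the other, which is one choice among many.

```python
from collections import defaultdict


def children_map(parent):
    """Invert a child -> parent mapping into parent -> [children]."""
    kids = defaultdict(list)
    for child, par in parent.items():
        kids[par].append(child)
    return kids


def clades_below(parent, node):
    """Return (tips under node, set of clades in the subtree rooted at node)."""
    kids = children_map(parent)
    clades = set()

    def visit(u):
        if u not in kids:  # a tip: no children
            return frozenset([u])
        tips = frozenset().union(*(visit(c) for c in kids[u]))
        clades.add(tips)  # record the tip set of every internal node
        return tips

    return visit(node), clades


def subtree_distances(true_parent, true_muts, inf_parent, inf_muts):
    """For each shared site, compare the subtrees below the mutated edges."""
    out = {}
    for site in true_muts.keys() & inf_muts.keys():
        t_tips, t_clades = clades_below(true_parent, true_muts[site])
        i_tips, i_clades = clades_below(inf_parent, inf_muts[site])
        if t_tips != i_tips:
            continue  # defensive: the edges do not actually correspond
        out[site] = len(t_clades ^ i_clades)  # clades unique to either subtree
    return out


# Toy data (assumed): tips 0-3, internal node IDs differ between the trees.
true_parent = {0: 4, 1: 4, 4: 5, 2: 5, 5: 6, 3: 6}       # ((0,1),2),3
inf_parent = {1: 10, 2: 10, 10: 11, 0: 11, 11: 12, 3: 12}  # ((1,2),0),3
true_muts = {100: 5, 200: 3}
inf_muts = {100: 11, 200: 3}

distances = subtree_distances(true_parent, true_muts, inf_parent, inf_muts)
print(distances)
```

Grouping the resulting per-site distances by the time of the true node would then show whether the inferred topology is better or worse at different heights.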

petrelharp commented 4 years ago

Nice. This gives us a way of identifying nodes also: nodes = ancestral haplotypes, so we can ask whether the mutations that originated on a given haplotype in one tree sequence also originated on a single haplotype in the other.
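As a rough illustration of the node-matching question, assuming each tree sequence's mutations are summarised as a `site -> node` dict (the function names here are hypothetical): group sites by the node they arose on in one tree sequence, then look up which nodes carry those same sites in the other.

```python
from collections import defaultdict


def sites_by_node(mut_node):
    """Group sites by the node (ancestral haplotype) their mutation arose on."""
    groups = defaultdict(set)
    for site, node in mut_node.items():
        groups[node].add(site)
    return groups


def node_matches(muts_a, muts_b):
    """Map each mutated node in A to the set of nodes in B carrying its sites."""
    result = {}
    for node_a, sites in sites_by_node(muts_a).items():
        shared = sites & muts_b.keys()  # ignore sites missing from B
        result[node_a] = {muts_b[s] for s in shared}
    return result


# Toy data (assumed): sites 100 and 150 share a node in A but not in B.
muts_a = {100: 5, 150: 5, 200: 3}
muts_b = {100: 11, 150: 10, 200: 3}

matches = node_matches(muts_a, muts_b)
print(matches)
```

A node in A that maps to exactly one node in B is a candidate match; one that maps to several (node 5 in this toy example) was not recovered as a single haplotype.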

hyanwong commented 4 years ago

Another possibility, as just discussed with Michelle Kendall, and particularly useful for tsinfer, where we have a known (simulated) TS with branch lengths and an inferred topology with arbitrary lengths. We take each node from the known topology that exists between certain timepoints, and select all the pairs of tips (with left-right coords if >1 tree) that split at that node. We then calculate a topology-only pairwise distance metric (e.g. KC) based on only those pairs over that portion of the genome.
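A single-tree sketch of the time-windowed comparison, under several assumptions: trees are plain `child -> parent` dicts, "pairs that split at a node" is read as pairs of tips whose MRCA is that node, and the metric is the topology-only part of a KC-style distance (for each pair, the number of edges between the root and the pair's MRCA). The function names are hypothetical, and handling several trees along the genome (for example by weighting each tree's contribution by its span) is left out.

```python
from itertools import combinations
from math import sqrt


def ancestors(parent, u):
    """Path from u up to the root, inclusive."""
    path = [u]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def mrca(parent, a, b):
    """Most recent common ancestor of tips a and b."""
    seen = set(ancestors(parent, a))
    for u in ancestors(parent, b):
        if u in seen:
            return u
    raise ValueError("tips are not in the same tree")


def mrca_depth(parent, a, b):
    """Number of edges between the root and mrca(a, b)."""
    return len(ancestors(parent, mrca(parent, a, b))) - 1


def windowed_distance(true_parent, true_time, inf_parent, tips, t_min, t_max):
    """Topology-only distance over pairs whose true MRCA time is in [t_min, t_max)."""
    pairs = [
        (a, b)
        for a, b in combinations(sorted(tips), 2)
        if t_min <= true_time[mrca(true_parent, a, b)] < t_max
    ]
    sq = sum(
        (mrca_depth(true_parent, a, b) - mrca_depth(inf_parent, a, b)) ** 2
        for a, b in pairs
    )
    return pairs, sqrt(sq)


# Toy data (assumed): only the true tree has meaningful node times.
true_parent = {0: 4, 1: 4, 4: 5, 2: 5, 5: 6, 3: 6}       # ((0,1),2),3
true_time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 2.0, 6: 3.0}
inf_parent = {1: 10, 2: 10, 10: 11, 0: 11, 11: 12, 3: 12}  # ((1,2),0),3
tips = [0, 1, 2, 3]

for lo, hi in [(0.5, 1.5), (1.5, 2.5), (2.5, 3.5)]:
    print((lo, hi), windowed_distance(true_parent, true_time, inf_parent, tips, lo, hi))
```

In this toy case the two youngest windows each give a distance of 1.0 and the oldest gives 0.0, reflecting that the inferred tree gets the deepest split right but misorders the two shallower ones.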