hyanwong / treeseq-inference

Work for the tree sequence inference paper.
Apache License 2.0

tsinfer for phylogenetics #13

Open hyanwong opened 5 years ago

hyanwong commented 5 years ago

It would be fun to set up an msprime simulation (with appropriate timescales) of, e.g., humans, chimps, bonobos, the 2 gorilla species, and the 3 orang-utans, then see whether we can back-infer the gene trees correctly. My guess is that the majority of phylogenetic conflict comes from incomplete lineage sorting (ILS), so I think we should be able to use tsinfer for this sort of thing. The difficulty, of course, is incorporating into the simulations the sorts of genetic change that occur on these timescales (large genomic rearrangements, recurrent mutations, some elements of selection, weird intermediate demographics and subspecies, etc.).
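As a rough sanity check on how much conflict ILS alone could produce, here is a minimal sketch using the standard three-species multispecies-coalescent result: for a rooted triple such as ((human, chimp), gorilla), the probability that a gene tree disagrees with the species tree is (2/3)·exp(-T), where T is the length of the internal branch in coalescent units (generations divided by 2N for diploids). The branch duration, generation time, and ancestral population size below are illustrative placeholders I have assumed, not estimates from this project, and the helper names are hypothetical.

```python
import math


def coalescent_units(duration_years, generation_time_years, diploid_ne):
    """Convert a branch duration in years to coalescent units (2N generations)."""
    generations = duration_years / generation_time_years
    return generations / (2.0 * diploid_ne)


def discordance_probability(internal_branch_coal_units):
    """P(gene tree topology != species tree) for a rooted three-species tree.

    With probability exp(-T) the two sister lineages fail to coalesce on the
    internal branch; the three lineages then coalesce in random order in the
    ancestral population, and 2 of the 3 possible orders are discordant.
    """
    return (2.0 / 3.0) * math.exp(-internal_branch_coal_units)


# Illustrative (assumed) values: a 2-million-year internal branch,
# 20-year generations, and an ancestral diploid Ne of 50,000.
T = coalescent_units(2_000_000, 20, 50_000)
p_discordant = discordance_probability(T)
print(f"T = {T:.2f} coalescent units, P(discordant) = {p_discordant:.3f}")
```

With these placeholder numbers T comes out at 1 coalescent unit, so roughly a quarter of gene trees would be discordant from ILS alone. A full msprime simulation would be needed to see how recombination and the larger species tree change that picture.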

With real data, the alignment step will be the important part, so we might need a version of tsinfer that can cope with multiple chunks of shorter sequences.