It would be fun to set up an msprime simulation (with appropriate timescales) of e.g. humans, chimps, bonobos, the 2 gorilla species, and the 3 orang-utan species, then see whether we can back-infer the gene trees correctly. My guess is that the majority of phylogenetic conflict comes from incomplete lineage sorting (ILS), so I think we should be able to use tsinfer for this sort of thing. The difficulty, of course, is incorporating into the simulations the sorts of genetic change that occur on these timescales (large genomic rearrangements, recurrent mutations, some elements of selection, weird intermediate demographics and subspecies, etc.).
With real data, the alignment step will be the important one, so you might need a version of tsinfer that can cope with multiple chunks of shorter sequences.
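
As a rough sanity check on the ILS guess, before setting up a full msprime run, here is a minimal pure-Python sketch (it deliberately does not use msprime or tsinfer) of how much gene-tree discordance ILS alone would produce for a human/chimp/gorilla trio. It uses the standard three-taxon coalescent result that the gene tree matches the species tree with probability 1 - (2/3)e^(-T), where T is the internal branch length in units of 2N generations, and checks it with a tiny Monte Carlo. The split gap, ancestral effective size, and generation time are illustrative assumptions, not fitted values, and the function names are made up for this example.

```python
import math
import random


def concordance_probability(split_gap_years, ancestral_ne, generation_years):
    """Analytic P(gene tree == species tree) for three taxa under ILS only."""
    # Internal branch length in coalescent units (2N generations).
    t = split_gap_years / generation_years / (2 * ancestral_ne)
    return 1 - (2 / 3) * math.exp(-t)


def simulate_concordance(split_gap_years, ancestral_ne, generation_years,
                         replicates=20_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    t = split_gap_years / generation_years / (2 * ancestral_ne)
    concordant = 0
    for _ in range(replicates):
        # Human and chimp lineages coalesce at rate 1 on the internal branch.
        if rng.expovariate(1.0) < t:
            concordant += 1
        # Otherwise all three lineages reach the root population, where
        # each of the three pairs is equally likely to coalesce first.
        elif rng.randrange(3) == 0:
            concordant += 1
    return concordant / replicates


# Assumed, illustrative parameters (hypothetical, not estimates):
SPLIT_GAP_YEARS = 2_000_000   # gap between gorilla and human-chimp splits
ANCESTRAL_NE = 50_000         # effective size of human-chimp ancestor
GENERATION_YEARS = 25         # generation time

analytic = concordance_probability(SPLIT_GAP_YEARS, ANCESTRAL_NE, GENERATION_YEARS)
simulated = simulate_concordance(SPLIT_GAP_YEARS, ANCESTRAL_NE, GENERATION_YEARS)
print(f"analytic concordance:  {analytic:.3f}")
print(f"simulated concordance: {simulated:.3f}")
print(f"expected discordant fraction from ILS alone: {1 - analytic:.3f}")
```

Under these assumed numbers a sizeable minority of loci have a gene tree that disagrees with the species tree purely through ILS, which is the kind of signal a full msprime simulation plus tsinfer would then need to recover. This ignores recombination within loci, mutation, and everything else on the list of difficulties above.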