tskit-dev / tsinfer

Infer a tree sequence from genetic variation data.
GNU General Public License v3.0

Document (and expose?) build_simulated_ancestors #653

Open hyanwong opened 2 years ago

hyanwong commented 2 years ago

For testing purposes, it can be useful to run an inference with "perfect ancestors". For example, I suggested this to @szhan as a way to check whether the ancestor generation step in tsinfer is the main cause of the problems with imputation accuracy.

We can build perfect ancestors using build_simulated_ancestors in eval_util.py. It might be useful to document and possibly expose this function, and give an example use-case?

For testing purposes, it would also be nice to extend the function to create slightly longer ancestors, filling in some of the flanking regions using the normal ancestor builder. This would be a way to ensure that we aren't unintentionally giving the algorithm any extra clues as to the position of breakpoints.

hyanwong commented 2 years ago

@szhan - you might like to have a go to see if build_simulated_ancestors() even works for you. Until we figure out https://github.com/tskit-dev/tsinfer/issues/11, you'll need to test it on pure SMC tree sequences, though (run msprime with model="smc"). If you do try it, report back in this issue if you have problems.