hyanwong / treeseq-inference

Work for the tree sequence inference paper.
Apache License 2.0

adding unphased (e.g. ancient) DNA #7

Open hyanwong opened 5 years ago

hyanwong commented 5 years ago

From @hyanwong on January 22, 2018 15:41

After Wilder's talk on 22nd Jan, I think I've had a great idea for how to incorporate aDNA into tsinfer. The difficulty is that aDNA is rarely phased. But this need not matter, because even an unphased variant can be used to put a time and place on that particular variant in the tree. I.e. think of each tree separately, and use each ancient variant to constrain the place/time of the nodes in that tree.
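A rough sketch of the time half of this idea, for a single local tree. It is only an illustration, not anything tsinfer does today: `LocalTree`, `lower_bounds_from_ancient`, the node numbering and the ages are all made up for the example. The assumption is that if a dated ancient sample carries the derived allele at a site (genotype count of 1 or 2, so no phasing needed), then the mutation is at least as old as the sample, and hence so is the node at the top of the branch the mutation sits on.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LocalTree:
    """Hypothetical stand-in for one local tree; not a tsinfer or tskit class."""
    parent: Dict[int, Optional[int]]   # node -> its parent (None for the root)
    time: Dict[int, float]             # node -> current age estimate
    mutation_node: Dict[int, int]      # site id -> node below the mutation's branch


def lower_bounds_from_ancient(tree, genotypes, sample_age):
    """Return {node: minimum age} implied by one unphased ancient sample.

    genotypes maps site id -> number of derived alleles (0, 1 or 2).
    Phase is never used: we only ask whether the derived allele is present.
    """
    bounds = {}
    for site, derived_count in genotypes.items():
        if derived_count == 0:
            continue  # derived allele not seen in this sample: no constraint here
        if site not in tree.mutation_node:
            continue  # variant absent from the modern tree: cannot be placed
        child = tree.mutation_node[site]
        parent = tree.parent.get(child)
        if parent is None:
            continue  # mutation above the root: no node to constrain
        # The mutation lies on the branch child -> parent and must predate the
        # sample, so the parent node is at least as old as the sample.
        bounds[parent] = max(bounds.get(parent, 0.0), sample_age)
    return bounds


# Toy tree: leaves 0, 1, 2; node 3 joins 0 and 1; node 4 is the root.
tree = LocalTree(
    parent={0: 3, 1: 3, 2: 4, 3: 4, 4: None},
    time={0: 0.0, 1: 0.0, 2: 0.0, 3: 100.0, 4: 500.0},
    mutation_node={10: 3, 11: 0},  # site 10 on branch 3->4, site 11 on branch 0->3
)

# One unphased ancient individual, assumed to be dated to 300 time units ago.
# Site 12 is not in the modern tree, so it is skipped.
bounds = lower_bounds_from_ancient(tree, {10: 1, 11: 2, 12: 1}, sample_age=300.0)

# Nodes whose current age estimate conflicts with the ancient evidence.
too_young = sorted(n for n, t in bounds.items() if tree.time[n] < t)
print(bounds, too_young)
```

Here node 3 is flagged because its current age of 100 is younger than an ancient sample that already carries a mutation below it. Site 12 shows the limitation discussed further down: a variant with no counterpart in the modern tree gives no constraint under this approach.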

What can't be done with unphased data is infer recombinations, but maybe that is best done from modern samples anyway.

One problem is how to incorporate ancient variants that are not present in present-day populations. This can probably only be done by having at least one (do we require more?) phased aDNA sample. But if we do have one of these, we might then be able to incorporate those ancient variants from unphased samples too, e.g. to date parts of the tree, as before.

There is an interesting underlying logical argument here about how much information phased vs unphased data contain for ancestry / tree reconstruction. To be investigated (Jason might be interested...)

Copied from original issue: mcveanlab/treeseq-inference#44