tskit-dev / tskit

Population-scale genomics
MIT License

find_ibd() breaks up longer segments of IBD #1479

Closed Arslan-Zaidi closed 3 years ago

Arslan-Zaidi commented 3 years ago

Hi all,

I've been working with find_ibd() to extract IBD segments from a tree sequence, and it works great. However, I noticed that the function splits longer IBD segments, producing an overrepresentation of shorter, contiguous segments of IBD. This happens because the tree topology can change across a recombination event without changing with respect to the two haplotypes under consideration. This is not an issue and is easily fixed downstream by merging contiguous segments into one, but it might be helpful to add something to the documentation. Alternatively, perhaps it would be useful for find_ibd() to do this on the fly?
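
For illustration, the downstream merge could look roughly like the sketch below. It is not part of tskit: the function name `merge_contiguous`, the `(left, right, node)` tuple layout, and the `require_same_node` flag are all assumptions made for this example, and the output of find_ibd() would first need to be converted into such tuples for a single pair of samples.

```python
from typing import List, Tuple

# Hypothetical layout for one IBD segment of a single sample pair:
# (left coordinate, right coordinate, ancestor node ID).
Segment = Tuple[float, float, int]


def merge_contiguous(
    segments: List[Segment], require_same_node: bool = True
) -> List[Segment]:
    """Merge segments that abut exactly (previous right == next left).

    With require_same_node=True, two abutting segments are merged only if
    they share the same ancestor node. With False, any abutting segments
    are merged and the node of the leftmost segment is kept.
    """
    merged: List[Segment] = []
    # Sorting by left coordinate makes neighbours adjacent in the list.
    for left, right, node in sorted(segments):
        if merged:
            prev_left, prev_right, prev_node = merged[-1]
            same_node = node == prev_node
            if prev_right == left and (same_node or not require_same_node):
                # Extend the previous segment instead of starting a new one.
                merged[-1] = (prev_left, right, prev_node)
                continue
        merged.append((left, right, node))
    return merged


# Made-up coordinates and node IDs, for illustration only.
segs = [(0.0, 10.0, 7), (10.0, 25.0, 7), (25.0, 40.0, 9), (50.0, 60.0, 9)]
strict = merge_contiguous(segs)
loose = merge_contiguous(segs, require_same_node=False)
```

Here `strict` keeps the break at 25.0 because the ancestor changes there, while `loose` joins everything from 0.0 to 40.0. Neither call bridges the gap between 40.0 and 50.0. Which behaviour is wanted depends on the definition of IBD being used.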

Thanks, Arslan

jeromekelleher commented 3 years ago

@gtsambos, any thoughts here? I guess we should just document (#1480) and explain the output there.

gtsambos commented 3 years ago

Hi @Arslan-Zaidi, thanks for bringing this up! I think that the behaviour you're describing here is what was intended for this function -- it assumes that both the identity of the common ancestor and the genealogical path between ancestors and descendants are important. You're right that we should document this more clearly in the first instance. Later on, it would be nice to add some functions to post-process these segments to accommodate these slight differences in IBD definitions -- but it sounds like you already have something like this working for your purposes.

Arslan-Zaidi commented 3 years ago

Hi @gtsambos! Yes, I've got a working solution. Just wanted to point this out because I didn't realize it until I compared to theoretical expectations. Thanks again for the super useful function!

benjeffery commented 3 years ago

Looks like the answer here is to complete #1480, so I'll close this.

Thanks for raising the issue @Arslan-Zaidi - please re-open or file a new issue, or start a discussion as appropriate if you have further questions.