tskit-dev / tskit

Population-scale genomics
MIT License

Add support for MRCA based IBD #2896

Open jeromekelleher opened 5 months ago

jeromekelleher commented 5 months ago

The MRCA definition of IBD is useful in some cases, and it would be good for us to support it.

From an API perspective, I think all that we need to do is provide a definition="path"|"mrca" option to ts.ibd_segments. All the other options should be compatible.
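To make the distinction concrete, here is a toy sketch in plain Python. It does not call tskit, and `merge_runs` and the interval tuples are made up purely for illustration. It assumes the usual reading of the two definitions: under "path", a segment ends whenever the genealogical path joining the pair changes; under "mrca", it ends only when the pair's MRCA node changes.

```python
def merge_runs(intervals, key):
    """Merge adjacent (left, right, mrca, path) intervals that share key(...)."""
    merged = []
    for left, right, mrca, path in intervals:
        label = key(mrca, path)
        if merged and merged[-1][2] == label and merged[-1][1] == left:
            # Same label and contiguous: extend the previous segment.
            merged[-1] = (merged[-1][0], right, label)
        else:
            merged.append((left, right, label))
    return merged


# Hypothetical sample pair (0, 1): node 4 is the MRCA everywhere, but
# sample 0 reaches it via node 2 on [0, 5) and via node 3 on [5, 10).
intervals = [
    (0, 5, 4, (0, 2, 4, 1)),
    (5, 10, 4, (0, 3, 4, 1)),
]

by_path = merge_runs(intervals, key=lambda mrca, path: path)
by_mrca = merge_runs(intervals, key=lambda mrca, path: mrca)
print(by_path)  # two segments, split where the path changes
print(by_mrca)  # one segment spanning [0, 10)
```

With these inputs the "path" reading gives two segments of length 5, while the "mrca" reading gives a single segment of length 10, so a length filter could treat the same pair quite differently under the two options.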

Do you think this works from a user perspective @gtsambos?

I'll have a think about how to implement it, but I'm hoping it will be fairly straightforward if we reuse the infrastructure developed for the divergence_matrix (e.g. #2710).

jeromekelleher commented 3 months ago

Note: the documentation can refer to this preprint for the MRCA definition.

gtsambos commented 3 months ago

Hi Jerome, sorry I didn't spot this at the time! I think this would be a nice addition, and the API makes sense.

I'm guessing that the most complicated part will be figuring out how to keep the min_length option in this setting. With a left-to-right algorithm, you'll have to keep track of/keep appending to the ends of each IBD segment until you reach the right-most edge of the segment, because only then will you know whether the segment is large enough to be recorded.
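The bookkeeping described above can be sketched for a single pair of samples as follows. This is a minimal illustration, not tskit's implementation: the input format (a left-to-right list of `(left, right, mrca)` tuples, one per tree) and the function name `mrca_ibd_segments` are assumptions made for the example.

```python
def mrca_ibd_segments(tree_mrcas, min_length=0):
    """Return (left, right, mrca) IBD segments for one sample pair.

    tree_mrcas: iterable of (left, right, mrca) tuples in left-to-right
    order, one per tree; mrca is None where the pair has no common ancestor.
    """
    segments = []
    current = None  # the open segment, stored as [left, right, mrca]

    def close(seg):
        # The length is only known once the segment is closed, so the
        # min_length filter has to be applied here and not earlier.
        if seg is not None and seg[2] is not None:
            if seg[1] - seg[0] >= min_length:
                segments.append(tuple(seg))

    for left, right, mrca in tree_mrcas:
        if current is not None and current[2] == mrca and current[1] == left:
            current[1] = right  # same MRCA: extend the open segment
        else:
            close(current)  # MRCA changed: the open segment is complete
            current = [left, right, mrca]
    close(current)  # flush the final segment at the end of the sequence
    return segments


trees = [(0, 2, 7), (2, 5, 7), (5, 6, 9), (6, 10, 7)]
print(mrca_ibd_segments(trees, min_length=2))
# [(0, 5, 7), (6, 10, 7)]
```

The first two trees are each shorter than they look in isolation, but they merge into one segment of length 5 that passes the filter; the length-1 segment with MRCA 9 is dropped only once the next tree shows it has ended.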

With the existing (backwards-time) algorithm, I think this will be even more complicated, possibly too complicated to implement nicely, because a small IBD segment with respect to a recent common ancestor might get 'fused' to another and thus become larger the further back in time you look.