Add sample sets to the python branch LD prototype

lkirk commented 2 months ago

This change adds sample set support by extending the data structure that tracks the samples under each node. This differs from the two-site statistics, where we first obtain every sample under every node and then intersect with the sample sets. Because this is a branch update algorithm, we want to update branches without intersecting with the sample sets every time we add or remove a branch. That intersection would be very expensive, because we iterate over every branch in a fixed (fully materialized) tree whenever we add or remove a branch from the modified tree.
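To illustrate the idea, here is a minimal sketch of per-node, per-sample-set tracking that is updated incrementally as edges come and go. It is not the code in this PR: the class `SampleSetCounts`, its methods, and the use of plain counts (rather than whatever representation the prototype uses) are all assumptions made for illustration.

```python
class SampleSetCounts:
    """Hypothetical helper: counts[u][k] is the number of samples from
    sample set k in the subtree rooted at node u of the current tree."""

    def __init__(self, num_nodes, sample_sets):
        self.parent = [-1] * num_nodes
        self.counts = [[0] * len(sample_sets) for _ in range(num_nodes)]
        # Seed each sample node with its own membership, once, up front.
        for k, samples in enumerate(sample_sets):
            for u in samples:
                self.counts[u][k] += 1

    def _propagate(self, child, start, sign):
        # Walk from `start` to the root, adding or subtracting the child's
        # per-set counts. No intersection with the sample sets is needed.
        u = start
        while u != -1:
            for k, c in enumerate(self.counts[child]):
                self.counts[u][k] += sign * c
            u = self.parent[u]

    def insert_edge(self, parent, child):
        self.parent[child] = parent
        self._propagate(child, parent, +1)

    def remove_edge(self, parent, child):
        self._propagate(child, parent, -1)
        self.parent[child] = -1


# Toy tree: samples 0, 1, 2; node 3 is the parent of 0 and 1;
# node 4 is the parent of 3 and 2. Sample sets may overlap.
tracker = SampleSetCounts(num_nodes=5, sample_sets=[[0, 1], [1, 2]])
for parent, child in [(3, 0), (3, 1), (4, 3), (4, 2)]:
    tracker.insert_edge(parent, child)
counts_full_tree = [row[:] for row in tracker.counts]

# Removing one edge only touches the ancestors of that edge.
tracker.remove_edge(3, 1)
```

In this sketch each edge insertion or removal costs time proportional to the depth of the edge times the number of sample sets, and the sample sets themselves are consulted only at construction.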

In doing this, we also update the summary functions to be compatible with the existing site statistics code, so now we have unbiased estimators for pi2, Dz, and D2. We'll worry about testing these in sites when we implement the C versions.

These changes also include a correctness fix for the orthogonal "McVean" prototype. This allows us to compute LD for samples that do not have MRCAs.

All tests now agree between the prototype and the proposed branch algorithm, but I've still excluded the slower tests.

Overall, I think this is the majority of the complexity I plan to add to this algorithm. The next feature will be position selection, which should not introduce much more complexity.

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 89.63%. Comparing base (972308e) to head (7203f77).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #2941   +/-   ##
=======================================
  Coverage   89.63%   89.63%
=======================================
  Files          29       29
  Lines       30184    30184
  Branches     5875     5875
=======================================
  Hits        27056    27056
  Misses       1789     1789
  Partials     1339     1339
```

| [Flag](https://app.codecov.io/gh/tskit-dev/tskit/pull/2941/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tskit-dev) | Coverage Δ | |
|---|---|---|
| [c-tests](https://app.codecov.io/gh/tskit-dev/tskit/pull/2941/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tskit-dev) | `86.20% <ø> (ø)` | |
| [lwt-tests](https://app.codecov.io/gh/tskit-dev/tskit/pull/2941/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tskit-dev) | `80.78% <ø> (ø)` | |
| [python-c-tests](https://app.codecov.io/gh/tskit-dev/tskit/pull/2941/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tskit-dev) | `88.72% <ø> (ø)` | |
| [python-tests](https://app.codecov.io/gh/tskit-dev/tskit/pull/2941/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tskit-dev) | `99.03% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tskit-dev#carryforward-flags-in-the-pull-request-comment) to find out more.

lkirk commented 2 months ago

Strange result from mergify. I don't think this issue has cropped up before, but I logged in with GitHub and gave it access to my account info. I'm not sure if there's anything else to do on my end.

jeromekelleher commented 2 months ago

@mergifyio rebase

mergify[bot] commented 2 months ago

rebase

✅ Branch has been successfully rebased