Closed lkirk closed 2 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 89.63%. Comparing base (972308e) to head (7203f77).
Strange result from mergify. I don't think this issue has come up before, but I logged in with GitHub and gave it access to my account info. I'm not sure whether there's anything else to do on my end.
@mergifyio rebase
This change incorporates sample set functionality by adding to the data structure that tracks the samples under each node. This differs from the two-site statistics, where we first obtain every sample under every node and then intersect with the sample sets. Since this is a branch update algorithm, we want to update branches without intersecting our sets with the sample sets every time we add or remove a branch. That intersection would be very expensive, because we iterate over every branch in a fixed (fully materialized) tree whenever we add or remove a branch from the modified tree.
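To illustrate the idea, here is a minimal sketch of tracking sample-set membership per node so that a branch update never needs a set intersection. It is not the code in this PR: the `SampleSetTracker` class, its methods, and the use of per-set counts (rather than whatever set representation the prototype uses) are all assumptions made for the example.

```python
class SampleSetTracker:
    """Hypothetical helper: per-node sample counts, one column per sample set."""

    def __init__(self, num_nodes, sample_sets):
        self.num_sets = len(sample_sets)
        self.parent = [-1] * num_nodes
        # count[u][k] = number of samples from sample set k at or below node u
        self.count = [[0] * self.num_sets for _ in range(num_nodes)]
        for k, samples in enumerate(sample_sets):
            for u in samples:
                self.count[u][k] += 1

    def _propagate(self, child, parent, sign):
        # Push the child's counts to every ancestor, starting at `parent`.
        delta = list(self.count[child])
        u = parent
        while u != -1:
            for k in range(self.num_sets):
                self.count[u][k] += sign * delta[k]
            u = self.parent[u]

    def add_edge(self, parent, child):
        self.parent[child] = parent
        self._propagate(child, parent, +1)

    def remove_edge(self, parent, child):
        self._propagate(child, parent, -1)
        self.parent[child] = -1


# Samples are nodes 0..3; nodes 4, 5, 6 are internal. Sample sets may overlap.
tracker = SampleSetTracker(7, sample_sets=[[0, 2], [1, 2, 3]])
for p, c in [(4, 0), (4, 1), (5, 2), (5, 3), (6, 4), (6, 5)]:
    tracker.add_edge(p, c)

# Move sample 3 from under node 5 to under node 4, as an edge diff would.
tracker.remove_edge(5, 3)
tracker.add_edge(4, 3)
```

Each update touches only the ancestors of the changed branch, and the sample sets are consulted once, at construction time. The counts at the root (node 6) stay the same across the move, while nodes 4 and 5 reflect the new topology.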
In doing this, we also update the summary functions to be compatible with the existing site statistics code, so now we have unbiased estimators for pi2, Dz, and D2. We'll worry about testing these in sites when we implement the C versions.
These changes also include a correctness fix for the orthogonal "McVean" prototype. The fix allows us to compute LD for samples that do not have MRCAs.
All tests now agree between the prototype and the proposed branch algorithm, but I've still excluded the slower tests.
Overall, I think this is the majority of the complexity I plan to add to this algorithm. The next feature will be position selection, which should not introduce much more complexity.