Closed lkirk closed 5 months ago
@jeromekelleher I'm still seeing issues with the build cache. Is there anything I can do to help here? I'm not in a huge rush for what it's worth, just wanted to close this out.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 89.62%. Comparing base (3349fdd) to head (343a89b).
@benjeffery any advice here?
I'm in progress with this at #2935
We've pushed through a few changes in the CI setup, can you rebase now please @lkirk? Hopefully things will go a bit more smoothly now.
Thank you @benjeffery for fixing that!
@jeromekelleher This is ready now, I chopped many more slow tests from this. I realized that there are only 2 cores being used on the test servers and I hadn't anticipated how the slow tests would compound in runtime. As written, things were taking ~1.5 hours and I've adjusted things so now only important tests are considered, adding about 2 minutes to the test runtime.
Does this seem reasonable?
2 minutes is still quite a bit. You could just use the naive version on a handful of small cases to verify things are lining up as expected, and defer doing tests on the full range of examples until the C version is in place?
Any slow tests should be marked with `@pytest.mark.slow`.
@jeromekelleher Agreed, I've cut a couple more tests to reach a minimal subset, and the runtime is now the same, at least in the number of minutes. I did a quick check of the runtimes on main and compared them to these, and everything seems to be running in the same amount of time, to within a minute at least. The slowest added test runs in ~0.2 seconds on my relatively fast computer.
Here are the times from the latest run on main:
So, it appears that the difference in runtime is sub-minute now. Is that reasonable?
If not, I can just mark as slow.
@mergifyio rebase
I've re-opened a PR to avoid some issues with build caching. Please see the original discussion on #2912.
Currently, this algorithm creates a matrix of LD, performing a pairwise comparison of all trees in the tree sequence.
This implementation does not yet support windows/positions, sample sets, or polarisation. The code produces results in units of branch length, which need to be multiplied by mu^2 or divided by the product of the total branch lengths of the two trees.
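As a rough illustration of the two rescalings mentioned above, here is a minimal sketch in plain Python. The function names, the list-of-lists matrix layout, and the numbers in the example are assumptions made for illustration only; they are not part of this PR's code.

```python
def scale_by_mutation_rate(ld, mu):
    """Multiply every entry of a branch-length LD matrix by mu**2."""
    return [[value * mu**2 for value in row] for row in ld]


def normalise_by_branch_length(ld, total_branch_length):
    """Divide entry (i, j) by the product of the total branch lengths
    of tree i and tree j."""
    return [
        [
            value / (total_branch_length[i] * total_branch_length[j])
            for j, value in enumerate(row)
        ]
        for i, row in enumerate(ld)
    ]


# Made-up 2x2 matrix for two trees, plus made-up total branch lengths.
example_ld = [[4.0, 2.0], [2.0, 9.0]]
example_lengths = [2.0, 3.0]

print(scale_by_mutation_rate(example_ld, mu=0.5))
print(normalise_by_branch_length(example_ld, example_lengths))
```

Which rescaling is appropriate depends on whether the result should be comparable to a mutation-based statistic or expressed relative to the trees' total branch lengths.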
This algorithm works by keeping a running sum of the statistic between two trees, updating each time we encounter a branch addition or removal. The tricky part is that we have to remove or add LD contributed by samples that already existed or that will remain under a given node after the addition/removal of branches.
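To make the running-sum idea concrete, here is a simplified single-tree analogue, not the two-locus algorithm in this PR. It keeps a running total of `branch_length * summary(samples_below)` and, on each edge insertion or removal, subtracts the stale contributions of the affected ancestors before adding the updated ones back. The class name `RunningBranchStat`, the summary function, and the example tree are all hypothetical.

```python
class RunningBranchStat:
    """Toy running sum of branch_length * summary(n_samples_below) over branches."""

    def __init__(self, num_nodes, samples, times, summary):
        self.parent = [-1] * num_nodes
        self.count = [0] * num_nodes  # number of samples at or below each node
        for s in samples:
            self.count[s] = 1
        self.times = times
        self.summary = summary
        self.total = 0.0

    def _branch_term(self, u):
        # Contribution of the branch above node u (zero if u has no parent).
        p = self.parent[u]
        if p == -1:
            return 0.0
        return (self.times[p] - self.times[u]) * self.summary(self.count[u])

    def _path_to_root(self, u):
        path = []
        while u != -1:
            path.append(u)
            u = self.parent[u]
        return path

    def insert_edge(self, parent, child):
        path = self._path_to_root(parent)
        for u in path:  # remove contributions that are about to change
            self.total -= self._branch_term(u)
        self.parent[child] = parent
        for u in path:
            self.count[u] += self.count[child]
        for u in path:  # add the updated contributions back
            self.total += self._branch_term(u)
        self.total += self._branch_term(child)

    def remove_edge(self, parent, child):
        path = self._path_to_root(parent)
        self.total -= self._branch_term(child)
        for u in path:
            self.total -= self._branch_term(u)
        self.parent[child] = -1
        for u in path:
            self.count[u] -= self.count[child]
        for u in path:
            self.total += self._branch_term(u)

    def recompute(self):
        # From-scratch value, used only to check the running total.
        return sum(self._branch_term(u) for u in range(len(self.parent)))


# Hypothetical example: 3 samples (0, 1, 2) and internal nodes 3, 4, 5.
times = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
stat = RunningBranchStat(6, [0, 1, 2], times, lambda n: n * (3 - n))
for parent, child in [(4, 3), (3, 0), (3, 1), (4, 2)]:
    stat.insert_edge(parent, child)
print(stat.total, stat.recompute())

# Move to the next tree: the root changes from node 4 to node 5.
for parent, child in [(4, 3), (4, 2)]:
    stat.remove_edge(parent, child)
for parent, child in [(5, 3), (5, 2)]:
    stat.insert_edge(parent, child)
print(stat.total, stat.recompute())
```

The two-tree case described above is harder because each update has to account for samples shared between nodes in both trees, but the subtract, update, add-back pattern is the same.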
We include a validation against the original formulation of this problem, by including an implementation that was described in McVean