Closed nspope closed 2 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 89.61%. Comparing base (
d1f81e2
) to head (f7078da
).
@jeromekelleher and @petrelharp this is ready for a look whenever you have time.
Shall I move the core algorithm into the C library in this PR or in a followup?
The remainder
method is very clever. However, it does involve a lot of adding & subtracting of close numbers: is floating point error liable to be an issue? If there are n
samples and a sequence length L
, then a node present for O(1) sequence length will have their coalescing pairs computed as n^2 * (L + O(1)) - n^2 * L
; I suppose this is only a problem if 1 / L
is machine epsilon? So, not a worry?
This looks great - I've made some comments; and perhaps it needs another test case for which some samples might be parents to other samples? (e.g., produced by the nonWF simulator in the test suite, I think?)
I suppose this is only a problem if 1 / L is machine epsilon
Right -- if you've got a node with span S
in a sequence of length L
, you'll lose bits if S / L < machine epsilon
. In practice it seems that this won't happen, and could be detected easily enough. What do you think, @jeromekelleher?
Seems pretty unlikely to me
We discussed changing the name, but I've forgotten to what? And, to be clear, i think Jerome's suggesting modifying this python algorithm, not adding a new one.
This is ready for another look ...
i think Jerome's suggesting modifying this python algorithm, not adding a new one.
But in a followup PR, right?
Any idea why codecov isn't working here @benjeffery? It's making the diff impossible to read here, which is worse than useless.
If you like you can drop this argument from the initial version and make an issue to track?
Thanks @jeromekelleher, it'd be great to sort this API out now as it's very close.
re: naming, @petrelharp brought up that the stats methods will eventually have a time_windows
argument, but that the behavior will be a bit different because pair_coalescence_counts
needs a (default) time discretisation option of "nodes"
as well as an option to pass time window breakpoints. So we could either:
time_windows
-- I think time_bins
is fine (definitely better than time_discretisation
)time_windows
and document that it accepts an additional string option that won't be accepted by other stats methodsEither is fine by me. Do you have a preference @petrelharp?
I think having this method accept an additional argument to time_windows
here would be totally fine. It would be more confusing to use a different name just to avoid that minor bit of inconsistency I think.
Any idea why codecov isn't working here @benjeffery? It's making the diff impossible to read here, which is worse than useless.
I'm seeing a "Commit YAML is invalid" error at codecov - checking it out.
@mergifyio rebase
rebase
@nspope I've rebased here to fix CI so you won't be able to push changes without a reset. Not sure how much of a git ninja you are so let me know if you need to push changes.
I think having this method accept an additional argument to time_windows here would be totally fine. It would be more confusing to use a different name just to avoid that minor bit of inconsistency I think.
Sounds good to me!
Thanks @benjeffery! Looks like codecov is failing lwt-tests and python-c-tests -- I don't think that's due to this PR?
Thanks @jeromekelleher -- I've modified so that time_windows
argument accepts either "nodes"
or a sorted array of breakpoints. Nodes that fall outside of time_windows
aren't counted in the output.
Darn, it looks like codecov is still mangling the diff, though this is rebased onto 9e1bf0 ? Sorry Ben, maybe rebasing / pushing on my end screwed something up?
So the comment in https://github.com/tskit-dev/tskit/pull/2915#issuecomment-2029229828 is correct - only one line is missing coverage. However, previous comments made by codecov are not removed :( I think the only way to clear those would be to open a new PR.
Hmm, now the post-test Codecov upload is erroring out:
[2024-04-10T04:01:34.167Z] ['verbose'] The error stack is: Error: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}
I'll go ahead and open a new PR ...
Hmm, now the post-test Codecov upload is erroring out:
[2024-04-10T04:01:34.167Z] ['verbose'] The error stack is: Error: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}
I'll go ahead and open a new PR ...
Yeah, we sometimes get that one. I spent a while investigating it and it seems to be transient on codecov's end.
@benjeffery it may be related to v3 codecov-actions deprecation, see comment. Happy to wait and see if it resolves itself though
Well-- bumping codecov-actions to v4 got rid of the upload errors, but the coverage reports don't seem to be making their way here.
@benjeffery Sorry for the hassle ... but any chance you could clone and force-push this to re-trigger codecov? To see if the issue might be related to this PR coming from a fork? It seems like the coverage report is working fine in #2924
@mergifyio rebase
rebase
Codecov issue now seems fixed here - it has removed all it's comments about uncovered lines.
Are we good to merge then @nspope?
Yes! Thanks for the fix @benjeffery
Core algorithm, tests and API that partially address #2904