Closed maxbla closed 4 years ago
A few drive-by notes: raw_distance
should only ever take sorted slices as inputs; that might be worth adding to the doc string in case it's not clear. I'm not sure NaN
is necessarily wrong there, but 1 is probably better since that matches intuition (the other alternative is to throw an error, but that seems like overkill and would change the method signature).
But yeah, I think the changes make the code a lot simpler to read and the test cases handle all the regressions/weird edge cases we were worried about in 4af8ad9986bdb58b8ee2b6c96c32b911df71ad07 so this should be safe to merge.
Great @bovee I was about to ping you since I wasn't sure whether it was meant to take sorted slices! How do you feel about returning a struct rather than a tuple of 4 values?
I think a struct (maybe called SetStastics
?) would be more readable, and I like the idea. The only issue is that raw_distance
is a pub fn
, meaning any change to it is a breaking change (so it would also need a semver increment). I don't know what other code at One Cocex depends on finch-rs
, and suspected a breaking change might, well, break it. It might make more sense to merge this PR as-is for now, and make multiple breaking changes all at once.
We rely on crates.io internally for this crate so it could be worth a 0.4 Have you seen other potential breaking changes?
We rely on crates.io internally for this crate so it could be worth a 0.4
Good to know. In that case, a breaking change with a version bump wouldn't break properly versioned reverse dependencies.
Have you seen other potential breaking changes?
As this PR currently stands, there are no breaking changes. I would like to close #48, which would could be rolled up into a 0.4 release. I would need help to do so though because I can't create repositories in the onecodex github orginization.
Is this ready for merge, or should I add a SetStastics
struct?
Sorry, I lost track of that issue.
I added a property test that
raw_distance(sequence1, sequence2, 0.) == raw_distance(sequence2, sequence1, 0.)
, and it failed at first. Then I refactored the raw_distance function to have broadly similar behavior (still passes all tests), but use more idiomatic Rust, and be shorter and simpler. In doing so, I managed to make it pass the commutativity test. This doesn't exactly fix a problem (see below) so I'll understand if it is not merged.On what case did the old raw_distance function fail?
raw_distance(&[2, 0], &[1], 0.) == (0.0, NaN, 0, 0)
andraw_distance(&[1], &[2, 0], 0.) == (0.0, NaN, 0, 0)
. SinceNan != Nan
this technically fails the commutativity test, although the practical use of that result is questionable. Also I thinkraw_distance
is supposed to take only sorted slices as inputs (but couldn't find any documentation saying so), so the commutativity proptest might be completely invalid.