sstadick / rust-lapper

Rust implementation of a fast, easy, interval tree library nim-lapper
https://docs.rs/rust-lapper
MIT License
53 stars 7 forks source link

Specify Range Inclusivity and Exclusivity #19

Open zaporter opened 1 year ago

zaporter commented 1 year ago

I have been using Lapper quite a bit and have realized that there is one area of the API that seems a bit confusing.

Consider

let lap: Lapper<usize,usize> = Lapper::new(vec![
    Interval{start:0, stop:10, val:0},
    Interval{start:20, stop:30, val:1},
]);
assert_eq!(lap.find(10, 20).count(),0);

The documentation for find() states that it Find all intervals that overlap start .. stop If I interpret that as [start,stop) then it is consistent with the definition of Interval:

/// Represent a range from [start, stop)
/// Inclusive start, exclusive of stop
pub struct Interval<I, T>

However, right now I feel like I am walking on eggshells every time I make a query because I have to ensure I am thinking about these inclusivity bounds. Something like the following would take a lot of load off the programmer.

let lap: Lapper<usize,usize> = Lapper::new(vec![
    Interval{start:Inclusive::from(0), stop:Inclusive::from(10), val:0},
    Interval{start:20, stop:30, val:1},
]);
assert_eq!(lap.find(Inclusive::from(10), 20).count(),1); // should just include the first one
assert_eq!(lap.find(10, Inclusive::from(20)).count(),0); // should include neither as second is not inclusive at the start
assert_eq!(lap.find(10, 21).count(),1); // should include second

/\ This is just a quick demo. I am not attached to anything like this and think it is a bit ugly. (I also want to make sure that anyone who upgrades their version is not surprised by new behavior)

You're a more experienced Rust programmer: do you know a more idiomatic way to do this? Are you at all interested in merging something that would allow for explicit inclusivity/exclusivity?

I will happily write all of the code and do all of the leg work to get this merged if we can agree on an API design.

zaporter commented 1 year ago

Edit: This would also enable a reasonable find_point(val:I)->Intervals that include val because find(k,k) doesn't always give obvious results depending on where k is in the interval

sstadick commented 1 year ago

Hi Zack!

I see what you mean, and I think I take it for granted that a "start inclusive, stop exclusive" range is front of mind for everyone since I'm coming at this from a genomics background where that is pretty common.

I would be interested in changing things to allow the user to define the inclusiveness of the range, but I'm not sure how to do that without a performance regression. I'd probably look to have an interval be Interval { range: R, val: T } where R implements https://doc.rust-lang.org/std/ops/trait.RangeBounds.html (or some other base Range trait that I'm maybe not finding right now).

RE your edit - is the main purpose of the range changes just to allow for searching for points specifically? I'd be much more inclined to add just find_point than to rework the inclusiveness. I think it would be worth a bit of effort to explore stabbing interval queries to see if we can be a bit more efficient than just the binary search. I'd be open to PRs for this doing either the obvious thing and using the internal iterators / binary search or something more complicated.

Sorry I've been slow to respond, I've had a lot going in IRL the last few weeks (and upcoming few weeks as well)!

On Sun, Sep 18, 2022 at 5:48 PM Zack @.***> wrote:

Edit: This would also enable a reasonable find_point(val:I)->Intervals that include val because find(k,k) doesn't always give obvious results depending on where k is in the interval

— Reply to this email directly, view it on GitHub https://github.com/sstadick/rust-lapper/issues/19#issuecomment-1250413667, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTGZHL3PTLJ4AM65BRIKM3V66S4PANCNFSM6AAAAAAQPIYPGM . You are receiving this because you are subscribed to this thread.Message ID: @.***>

zaporter commented 1 year ago

I'm glad to hear that you are open to this idea! Thank you for mentioning RangeBounds, I have not used them before. I will play around a bit with the repo and get back to you when I have the start of a reasonable implementation.

Re 'find_point' => No, this is not just for 'find_point'. If done properly, I hope that this might also allow for relaxation of the type bounds for I as well as a more general interface for interval trees.

I will experiment and get back to you (with benchmarks to assess performance differences).

I'm very thankful for your time. I hope all is well!

zaporter commented 3 months ago

I never ended up getting around to this -- the initial attempts were unsuccessful as they required some breaking changes. Passing up the gauntlet if someone else wants to implement this. Otherwise, may want to close as wont-do.