Tracking Issue for BTreeMap cursors

Amanieu commented 1 year ago

Feature gate: #![feature(btree_cursors)]

ACP: https://github.com/rust-lang/libs-team/issues/141

This is a tracking issue for the Cursor and CursorMut types for BTreeMap.

A Cursor is like an iterator, except that it can freely seek back-and-forth, and can safely mutate the tree during iteration. This is because the lifetime of its yielded references is tied to its own lifetime, instead of just the underlying tree. A cursor either points to an element in the tree, or to a "ghost" non-element that is logically located after the last element and before the first element.
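
To make the "ghost" position concrete, here is a minimal toy model over a sorted slice. It is only a sketch on stable Rust: `SliceCursor` and its methods are hypothetical names, not the std types, and it models only the navigation rules described above (ghost between the last and first element), not mutation.

```rust
/// Toy cursor over a sorted slice; `None` models the "ghost" position.
/// (Hypothetical type for illustration, not the std `Cursor`.)
struct SliceCursor<'a, T> {
    items: &'a [T],
    index: Option<usize>,
}

impl<'a, T> SliceCursor<'a, T> {
    /// Starts at the ghost position.
    fn new(items: &'a [T]) -> Self {
        SliceCursor { items, index: None }
    }

    /// Ghost -> first element; last element -> ghost.
    fn move_next(&mut self) {
        self.index = match self.index {
            None if self.items.is_empty() => None,
            None => Some(0),
            Some(i) if i + 1 < self.items.len() => Some(i + 1),
            Some(_) => None,
        };
    }

    /// Ghost -> last element; first element -> ghost.
    fn move_prev(&mut self) {
        self.index = match self.index {
            None => self.items.len().checked_sub(1),
            Some(0) => None,
            Some(i) => Some(i - 1),
        };
    }

    /// The returned reference borrows from the slice, not from the cursor.
    fn current(&self) -> Option<&'a T> {
        self.index.and_then(|i| self.items.get(i))
    }
}

fn main() {
    let data = [10, 20, 30];
    let mut cursor = SliceCursor::new(&data);
    assert_eq!(cursor.current(), None); // starts at the ghost
    cursor.move_prev();
    assert_eq!(cursor.current(), Some(&30)); // ghost -> last element
    cursor.move_next();
    assert_eq!(cursor.current(), None); // last element -> ghost
    cursor.move_next();
    assert_eq!(cursor.current(), Some(&10)); // ghost -> first element
}
```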

Public API

impl<K, V> BTreeMap<K, V> {
    fn lower_bound<Q>(&self, bound: Bound<&Q>) -> Cursor<'_, K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;
    fn lower_bound_mut<Q>(&mut self, bound: Bound<&Q>) -> CursorMut<'_, K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;
    fn upper_bound<Q>(&self, bound: Bound<&Q>) -> Cursor<'_, K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;
    fn upper_bound_mut<Q>(&mut self, bound: Bound<&Q>) -> CursorMut<'_, K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;
}

struct Cursor<'a, K: 'a, V: 'a>;

impl<'a, K, V> Cursor<'a, K, V> {
    fn move_next(&mut self);
    fn move_prev(&mut self);

    fn key(&self) -> Option<&'a K>;
    fn value(&self) -> Option<&'a V>;
    fn key_value(&self) -> Option<(&'a K, &'a V)>;

    fn peek_next(&self) -> Option<(&'a K, &'a V)>;
    fn peek_prev(&self) -> Option<(&'a K, &'a V)>;
}

struct CursorMut<'a, K: 'a, V: 'a>;

impl<'a, K, V> CursorMut<'a, K, V> {
    fn move_next(&mut self);
    fn move_prev(&mut self);

    fn key(&self) -> Option<&K>;
    fn value(&self) -> Option<&V>;
    fn value_mut(&mut self) -> Option<&mut V>;
    fn key_value(&self) -> Option<(&K, &V)>;
    fn key_value_mut(&mut self) -> Option<(&K, &mut V)>;

    unsafe fn key_mut_unchecked(&mut self) -> Option<&mut K>;

    fn peek_next(&self) -> Option<(&K, &V)>;
    fn peek_prev(&self) -> Option<(&K, &V)>;

    fn as_cursor(&self) -> Cursor<'_, K, V>;

    fn insert_after(&mut self, key: K, value: V);
    fn insert_before(&mut self, key: K, value: V);

    unsafe fn insert_after_unchecked(&mut self, key: K, value: V);
    unsafe fn insert_before_unchecked(&mut self, key: K, value: V);

    fn remove_current(&mut self) -> Option<(K, V)>;
    fn remove_current_and_move_back(&mut self) -> Option<(K, V)>;
}

Steps / History

[x] Implementation: #105641
[ ] Final comment period (FCP)
[ ] Stabilization PR

Unresolved Questions

None yet.

Bwallker commented 1 year ago

The key method on the Cursor and CursorMut types should be unsafe. If the key is interiorly mutable you could mutate it through the shared reference and break the invariants of BTreeMap. For example, if the key is of type Cell<u32>, I could call key.set and change the value from 1 to 1 billion, potentially breaking both the "Key must be unique" and "Key must be correctly ordered" invariants of BTreeMap.

parasyte commented 1 year ago

As I understand it, the BTreeMap invariants are not safety invariants. If the key is changed from underneath the map, it can lead to bugs. But crucially not to memory safety violations.

The standard library docs make this distinction clear:

It is a logic error for a key to be modified in such a way that the key’s ordering relative to any other key, as determined by the Ord trait, changes while it is in the map. This is normally only possible through Cell, RefCell, global state, I/O, or unsafe code. The behavior resulting from such a logic error is not specified, but will be encapsulated to the BTreeMap that observed the logic error and not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.
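
A small sketch of the scenario under discussion, using only safe, stable code. `overwrite_first_key` is a hypothetical helper. Because the resulting behaviour is explicitly unspecified, the example only prints what it observes and makes no claim about the output.

```rust
use std::cell::Cell;
use std::collections::BTreeMap;

/// Overwrites the smallest key through a shared reference and returns the
/// keys in iteration order afterwards. (Hypothetical helper.)
fn overwrite_first_key(map: &BTreeMap<Cell<u32>, &'static str>, new_key: u32) -> Vec<u32> {
    if let Some(key) = map.keys().next() {
        // Safe code, but a logic error: the key's ordering changes in place.
        key.set(new_key);
    }
    map.keys().map(Cell::get).collect()
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(Cell::new(1), "one");
    map.insert(Cell::new(2), "two");
    map.insert(Cell::new(3), "three");

    let keys = overwrite_first_key(&map, 1_000_000_000);
    // Results from here on are unspecified (but not undefined behaviour),
    // so they are only printed, not asserted.
    println!("keys after mutation: {:?}", keys);
    println!("lookup of 2: {:?}", map.get(&Cell::new(2)));
}
```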

Bwallker commented 1 year ago

But the key_mut_unchecked method on CursorMut is unsafe because you might break the invariants of BTreeMap.

parasyte commented 1 year ago

Unless there is a memory safety invariant with &mut K, I don't think it needs to be unsafe. A counterexample is BTreeMap::get_key_value(), which is safe to call, even if the key has interior mutability.

Amanieu commented 1 year ago

The invariant here is that if a function receives a BTreeMap<K, V> where K is a type with a "trusted" Ord implementation (such as usize) then it is guaranteed that the items in the BTreeMap will be in correctly sorted order.

Bwallker commented 1 year ago

You should clarify this in the docs. It's not really clear why key_mut_unchecked is unsafe reading the docs IMO

parasyte commented 1 year ago

FWIW, I agree that the docs are unclear. Copied directly from https://github.com/rust-lang/libs-team/issues/141:

    /// Returns a mutable reference to the of the element that the cursor is
    /// currently pointing to.
    ///
    /// This can be used to modify the key, but you must ensure that the
    /// `BTreeMap` invariants are maintained. Specifically:
    ///
    /// * The key must remain unique within the tree.
    /// * The key must remain in sorted order with regards to other elements in
    ///   the tree.
    unsafe fn key_mut_unchecked(&mut self) -> Option<&mut K>;

This does not explain what exactly is "unchecked". It appears to just repeat the invariants listed in the BTreeMap docs, which claim there is no possibility of UB by breaking them.
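
One way to read "unchecked" is that the method skips a neighbour comparison that a checked variant would perform. The following is a sketch of such a check over a strictly sorted slice of pairs. `try_replace_key` is a hypothetical helper, not part of the proposed API, and the thread does not confirm that this is the intended meaning.

```rust
/// Replaces the key at `index` in a strictly sorted slice, but only if the
/// new key stays strictly between its neighbours (unique and in order).
/// Returns the rejected key otherwise. (Hypothetical helper.)
fn try_replace_key<K: Ord, V>(items: &mut [(K, V)], index: usize, new_key: K) -> Result<(), K> {
    if index >= items.len() {
        return Err(new_key);
    }
    let above_prev = index == 0 || items[index - 1].0 < new_key;
    let below_next = index + 1 >= items.len() || new_key < items[index + 1].0;
    if above_prev && below_next {
        items[index].0 = new_key;
        Ok(())
    } else {
        Err(new_key)
    }
}

fn main() {
    let mut items = vec![(1, "a"), (5, "b"), (9, "c")];
    assert_eq!(try_replace_key(&mut items, 1, 7), Ok(())); // 1 < 7 < 9
    assert_eq!(try_replace_key(&mut items, 1, 9), Err(9)); // would duplicate 9
    assert_eq!(try_replace_key(&mut items, 1, 0), Err(0)); // would break ordering
    assert_eq!(items[1].0, 7);
}
```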

clarfonthey commented 1 year ago

So, I've been trying to create my own "range map" using this API and it feels… extremely clunky to me. Since I'm mostly working with the mutable API, I'll be talking about CursorMut specifically, but I suppose that some of these can apply to Cursor as well:

The handling of "ghost elements" is extremely clunky to me. From what I understand, these only can show up at the beginning and end of the map, since all entries into the API are via lower_bound and upper_bound which by their nature will "snap" to an existing, non-ghost element. Personally, what would make the most sense to me is to require the cursor to always point to a valid element and return an entry-like enum for the lower_bound and upper_bound methods. This way, you can still distinguish between whether you're at the edges of the map on the initial creation of the cursor while not having to handle this case for all other uses of the cursor.

In that line of reasoning, I think that move_next and move_prev would be best if they simply returned a boolean (or other sentinel value) stating whether the move was successful, simply staying at the last or first element in the map rather than potentially moving into a ghost state.

While I think that the guarantees for the key_mut_unchecked method are poorly documented, as others have mentioned, I appreciate the addition of it and personally use it myself. The only issue is having to handle the Option every time, as mentioned above.

Amanieu commented 1 year ago

There are several advantages in having a ghost element:

Insertion at a certain position "just works" after a lower_bound/upper_bound lookup, even if the tree is empty.
After removing the last element, the cursor must still point to something.
move_next/move_prev returning a bool is somewhat redundant given that .get() already returns an option.

With that said, I can see how it can be hard to work with. I'd be happy to try out a better API if you want to sketch one out.

jeffparsons commented 1 year ago

I assume this is a non-starter for performance reasons, but the most "natural" feeling interface for me would be one that can point to all real elements and all points between/next to them (including start/end). There would then be convenience methods for skipping over whatever you're not interested in so you can stay in "element space" or "gap space" if you only care about one or the other most of the time.

But again, I imagine this would make the API too slow because the compiler couldn't optimise away visiting all the gaps — does that sound right?

(Edit: if this doesn't make sense, I'd be happy to sketch what I imagine would be the core API, and what would be the optional convenience layers.)

(Edit2: or you could have separate cursor types for when you're pointing at an element vs a gap, and have the navigation functions consume self and convert between them where relevant. I think that would be on par for performance with the current design?)

clarfonthey commented 1 year ago

Honestly, the kind of modification I was thinking of was to leverage the existing Entry API for mutable cursors, and to simply take ownership of the cursor during certain operations that might cause the entry to become vacant, returning an entry instead of just modifying the occupied entry.

lf- commented 1 year ago

I have been implementing a TCP segment reordering buffer (where you're storing ranges and they wrap around u32::MAX).

So I have been looking into building a wrapping iterator over a BTreeMap, and it seems that implementing a wrapping mutating cursor based on this is surprisingly nontrivial: you'd have to ... somehow ... swap between a &mut BTreeMap (to create a new CursorMut when you wrap) and a CursorMut, and when you get rid of the CursorMut to replace it, put the &mut BTreeMap back. It is not obvious how to do this when hand-rolling a generator, as one does when writing these things.

I guess my feedback on this API is that it might need a function to move it back to the start or to a specified element.
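
For the read-only case, wrapping traversal can be approximated on stable Rust by chaining two range calls. This is only a sketch (`wrapping_iter` is a hypothetical helper) and it does not solve the mutable-cursor problem described above, which is exactly where swapping between &mut BTreeMap and CursorMut gets awkward.

```rust
use std::collections::BTreeMap;

/// Visits every entry once, starting at the first key >= `start` and then
/// wrapping around to the smallest key. (Hypothetical helper.)
fn wrapping_iter<'a, V>(
    map: &'a BTreeMap<u32, V>,
    start: u32,
) -> impl Iterator<Item = (&'a u32, &'a V)> {
    map.range(start..).chain(map.range(..start))
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(10u32, "a");
    map.insert(20, "b");
    map.insert(4_000_000_000, "c");

    let order: Vec<u32> = wrapping_iter(&map, 15).map(|(k, _)| *k).collect();
    assert_eq!(order, vec![20, 4_000_000_000, 10]);
}
```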

finnbear commented 1 year ago

Can we document the potential hazard that BTreeMap::upper_bound does the opposite of C++'s map::upper_bound?

In other words, when porting C++ to Rust,

map::lower_bound -> BTreeMap::lower_bound(Bound::Included(...))
map::upper_bound -> BTreeMap::lower_bound(Bound::Excluded(...))

FWIW, I'm strongly in favor of both APIs :rocket:

Amanieu commented 1 year ago

Can we document the potential hazard that BTreeMap::upper_bound does the opposite of C++'s map::upper_bound?

Are you sure about this? They should map the same way.

upper_bound returns the last element below the given bound, which should match what C++'s upper_bound does.

finnbear commented 1 year ago

Are you sure about this? They should map the same way.

I'm more confused than I am sure, but I still get the impression that there is a hazard when porting C++ to Rust.

C++ upper_bound docs say that it "Returns an iterator pointing to the first element in the container whose key is considered to go after k."

Rust BTreeMap::upper_bound docs say that it "Returns a Cursor pointing at the last element that is below the given bound."

So, here are the results on an example map:

k1 -> v1
k2 -> v2
k3 -> v3
k4 -> v4
k5 -> v5

// C++
.upper_bound("k3") = "v4" // notably the same as .lower_bound(Bound::Excluded("k3")) in Rust
.lower_bound("k3") = "v3"

// Rust
.upper_bound(Bound::Excluded("k3")) = "v2"
.upper_bound(Bound::Included("k3")) = "v3"
.lower_bound(Bound::Excluded("k3")) = "v4"
.lower_bound(Bound::Included("k3")) = "v3"

My code

Online C++

#include <iostream>
#include <map>

int main ()
{
  std::map<int,int> mymap;
  std::map<int,int>::iterator itlow,itup;

  mymap[1]=1;
  mymap[2]=2;
  mymap[3]=3;
  mymap[4]=4;
  mymap[5]=5;

  itup=mymap.upper_bound (3);
  std::cout << itup->first << ' ' << itup->second << '\n';

  itlow=mymap.lower_bound (3);
  std::cout << itlow->first << ' ' << itlow->second << '\n';

  return 0;
}

Rust Playground

#![feature(btree_cursors)]

fn main() {
    use std::collections::BTreeMap;
    use std::ops::Bound;

    let mut a = BTreeMap::new();
    a.insert("k1", "v1");
    a.insert("k2", "v2");
    a.insert("k3", "v3");
    a.insert("k4", "v4");
    a.insert("k5", "v5");
    let cursor = a.upper_bound(Bound::Excluded(&"k3"));
    println!("upper exc {:?}", cursor.key());
    let cursor = a.upper_bound(Bound::Included(&"k3"));
    println!("upper inc {:?}", cursor.key());
    let cursor = a.lower_bound(Bound::Excluded(&"k3"));
    println!("lower exc {:?}", cursor.key());
    let cursor = a.lower_bound(Bound::Included(&"k3"));
    println!("lower inc {:?}", cursor.key());
}
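
For comparison, the same four lookups can be expressed on stable Rust with BTreeMap::range, which sidesteps the naming question entirely. This is a sketch with integer keys; the helper names (`cpp_lower_bound`, `cpp_upper_bound`, `last_at_or_below`, `last_below`) are hypothetical and only label which C++ or cursor call each one corresponds to in the table above.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// C++ `map::lower_bound(k)`: first entry with key >= k.
fn cpp_lower_bound<V>(map: &BTreeMap<u32, V>, k: u32) -> Option<(&u32, &V)> {
    map.range(k..).next()
}

/// C++ `map::upper_bound(k)`: first entry with key > k.
fn cpp_upper_bound<V>(map: &BTreeMap<u32, V>, k: u32) -> Option<(&u32, &V)> {
    map.range((Bound::Excluded(k), Bound::<u32>::Unbounded)).next()
}

/// Last entry with key <= k (the `upper_bound(Bound::Included(k))` row).
fn last_at_or_below<V>(map: &BTreeMap<u32, V>, k: u32) -> Option<(&u32, &V)> {
    map.range(..=k).next_back()
}

/// Last entry with key < k (the `upper_bound(Bound::Excluded(k))` row).
fn last_below<V>(map: &BTreeMap<u32, V>, k: u32) -> Option<(&u32, &V)> {
    map.range(..k).next_back()
}

fn main() {
    let map: BTreeMap<u32, &str> =
        [(1, "v1"), (2, "v2"), (3, "v3"), (4, "v4"), (5, "v5")].into_iter().collect();

    assert_eq!(cpp_lower_bound(&map, 3), Some((&3, &"v3")));
    assert_eq!(cpp_upper_bound(&map, 3), Some((&4, &"v4")));
    assert_eq!(last_at_or_below(&map, 3), Some((&3, &"v3")));
    assert_eq!(last_below(&map, 3), Some((&2, &"v2")));
}
```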

@Amanieu thoughts?

Amanieu commented 1 year ago

I see, you are indeed correct.

Considering the feedback received so far, I am wondering if changing Cursors to represent a point between 2 elements rather than pointing at one element would be better:

This avoids the need for a ghost element, since even in an empty tree there exists a point which has no previous or next element.
lower_bound/upper_bound naturally end up pointing between 2 elements, which is less confusing.
Moving over a value with move_prev/move_next can return an Option<(&K, &mut V)> which results in a more natural Rust API.

Let me know if this sounds promising, and I'll try and sketch out a new API with this design.

finnbear commented 1 year ago

Let me know if this sounds promising

It's hard to say which is better without some short before & after code examples i.e. to do X, you do Y with the old API and Z with the new API (if you wanted, the examples could also show how to accomplish the same thing in C++ or other languages). For now, it sounds plausible but not necessarily better or worse.

Anyway, after realizing the confusion, the APIs were both intuitive and useful for the porting I was doing today and the only thing I'm asking for is for the docs to point out the porting hazard (alternatively, the function names could be changed to avoid matching those used in C++).

Amanieu commented 1 year ago

Here's an API sketch for a cursor that resides at a point between 2 elements. Elements are read by "passing over" them with a cursor.

impl<K, V> BTreeMap<K, V> {
    fn lower_bound<Q>(&self, bound: Bound<&Q>) -> Cursor<'_, K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;
    fn lower_bound_mut<Q>(&mut self, bound: Bound<&Q>) -> CursorMut<'_, K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;
    fn upper_bound<Q>(&self, bound: Bound<&Q>) -> Cursor<'_, K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;
    fn upper_bound_mut<Q>(&mut self, bound: Bound<&Q>) -> CursorMut<'_, K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;
}

struct Cursor<'a, K: 'a, V: 'a>;

impl<'a, K, V> Cursor<'a, K, V> {
    fn next(&mut self) -> Option<(&'a K, &'a V)>;
    fn prev(&mut self) -> Option<(&'a K, &'a V)>;

    fn peek_next(&self) -> Option<(&'a K, &'a V)>;
    fn peek_prev(&self) -> Option<(&'a K, &'a V)>;
}

struct CursorMut<'a, K: 'a, V: 'a>;

impl<'a, K, V> CursorMut<'a, K, V> {
    fn next(&mut self) -> Option<(&K, &mut V)>;
    fn prev(&mut self) -> Option<(&K, &mut V)>;

    unsafe fn next_with_mut_key(&mut self) -> Option<(&mut K, &mut V)>;
    unsafe fn prev_with_mut_key(&mut self) -> Option<(&mut K, &mut V)>;

    fn peek_next(&self) -> Option<(&K, &V)>;
    fn peek_prev(&self) -> Option<(&K, &V)>;

    fn peek_next_mut(&mut self) -> Option<(&K, &mut V)>;
    fn peek_prev_mut(&mut self) -> Option<(&K, &mut V)>;

    unsafe fn peek_next_with_mut_key(&mut self) -> Option<(&mut K, &mut V)>;
    unsafe fn peek_prev_with_mut_key(&mut self) -> Option<(&mut K, &mut V)>;

    fn as_cursor(&self) -> Cursor<'_, K, V>;

    // Inserts at current point, cursor is moved to before the inserted element.
    fn insert_after(&mut self, key: K, value: V);
    // Inserts at current point, cursor is moved to after the inserted element.
    fn insert_before(&mut self, key: K, value: V);

    unsafe fn insert_after_unchecked(&mut self, key: K, value: V);
    unsafe fn insert_before_unchecked(&mut self, key: K, value: V);

    fn remove_next(&mut self) -> Option<(K, V)>;
    fn remove_prev(&mut self) -> Option<(K, V)>;
}
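
To get a feel for how "passing over" elements reads in practice, here is a toy read-only model of the gap-based design over a sorted slice. It is a sketch only: `GapCursor` is a hypothetical type, the position is a plain index in 0..=len, and nothing here reflects how the BTreeMap implementation would actually store its position.

```rust
/// Toy cursor that sits in the gap just before `items[pos]`, with `pos` in
/// `0..=items.len()`. (Hypothetical type for illustration.)
struct GapCursor<'a, T> {
    items: &'a [T],
    pos: usize,
}

impl<'a, T> GapCursor<'a, T> {
    /// Starts in the gap before the first element.
    fn at_start(items: &'a [T]) -> Self {
        GapCursor { items, pos: 0 }
    }

    fn peek_next(&self) -> Option<&'a T> {
        self.items.get(self.pos)
    }

    fn peek_prev(&self) -> Option<&'a T> {
        self.pos.checked_sub(1).and_then(|i| self.items.get(i))
    }

    /// Steps over the next element and returns it; stays put at the end.
    fn next(&mut self) -> Option<&'a T> {
        let item = self.peek_next()?;
        self.pos += 1;
        Some(item)
    }

    /// Steps over the previous element and returns it; stays put at the start.
    fn prev(&mut self) -> Option<&'a T> {
        let item = self.peek_prev()?;
        self.pos -= 1;
        Some(item)
    }
}

fn main() {
    let data = ["a", "b", "c"];
    let mut cursor = GapCursor::at_start(&data);

    let mut seen = Vec::new();
    while let Some(item) = cursor.next() {
        seen.push(*item);
    }
    assert_eq!(seen, ["a", "b", "c"]);
    assert_eq!(cursor.peek_next(), None); // in the gap after the last element
    assert_eq!(cursor.prev(), Some(&"c")); // passing back over "c"
}
```

Note that no ghost state is needed here: even for an empty slice there is exactly one gap, with no previous or next element.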

clarfonthey commented 1 year ago

Conceptually, I think that I do like this API more, but I still think that the combinatorial explosion of options here could be more easily avoided by integrating properly with the Entry API. Namely, instead of having options for the various combinations of whether you're inserting, mutating the key, etc., you could just leverage the existing Entry API with the ability to do key mutation.

Essentially, the position of the cursor between the elements is still unchanged. But rather than making the cursor this all-powerful entity, we give it equal footing with other various elements.

To accomplish this, I propose a particular sleight of hand. We retain the meaning of a "gap" as the position in the map between elements. A vacant entry points to a particular gap, and an occupied entry points to a particular element and its adjacent gaps. When mutating a key for either kind of entry, you must ensure that the key still upholds the Ord invariant for the map at that entry's position. So, you can nudge it around within its entry fine, but you can't leave it.

By leaving the semantics of the key specifically in the entry and requiring the entry for mutation of elements, the semantics of all the various mutable methods become clearer. Here's what the conceptual API would look like:

type Position = /* we'll talk about this later */;

impl<'a, K, V> Cursor<'a, K, V> { // and `CursorMut`
    // looks around the cursor
    fn peek_next(&self) -> Option<(&K, &V)>;
    fn peek_prev(&self) -> Option<(&K, &V)>;

    // move the cursor
    fn next(&mut self) -> Position;
    fn prev(&mut self) -> Position;
}

impl<'a, K, V> CursorMut<'a, K, V> {
    // attempts to insert at this spot in the cursor
    fn insert(self, key: K) -> Result<VacantEntry<'a, K, V>, CursorMut<'a, K, V>>;
    unsafe fn insert_unchecked(self, key: K) -> VacantEntry<'a, K, V>;

    // mutates around the cursor
    fn enter_next(&mut self) -> Option<OccupiedEntry<'_, K, V>>;
    fn enter_prev(&mut self) -> Option<OccupiedEntry<'_, K, V>>;
}

impl<'a, K, V> OccupiedEntry<'a, K, V> { // and `VacantEntry` and `Entry` too
    // requires upholding invariants mentioned
    unsafe fn key_mut(&mut self) -> &mut K;
}

impl<'a, K, V> VacantEntry<'a, K, V> {
    // if we want the ability to convert back into a cursor
    fn seek_elsewhere(self) -> (K, CursorMut<'a, K, V>);
    fn insert_and_seek_after(self, value: V) -> CursorMut<'a, K, V>;
    fn insert_and_seek_before(self, value: V) -> CursorMut<'a, K, V>;
}

Part of the reason why I think VacantEntry should have mutable access to the key is because it aligns well with the idea of entries being positions in the map associated with owned keys, which may or may not be inserted.

Note the bit I said about Position. I don't think that it's right to return a mutable value when mutating the cursor, mostly because we'd want all mutation to go through entries instead. I personally think it makes the most sense to return Option<&K> just because inspecting the key is really what you care most about, although since we're going to be providing the key anyway, we might as well also provide the value.

That's the bit I'm fuzziest on. But the rest is my basic idea on how to use entries to solve the complexity and the semantics. We don't need any weird insert_after or insert_before because we require converting a cursor into an entry, not simply borrowing it. I list some potential methods we could add to VacantEntry to allow converting back, but those weren't super thought out. My thought process is that the last step someone will probably want to take for the entry API is inserting, since you need to poke around to ensure you're satisfying invariants before you actually do the insert anyway.

clarfonthey commented 1 year ago

A side note on the naming of upper_bound and lower_bound conflicting with C++: why not just abandon that scheme altogether, then? We could go with something like seek_before and seek_after which would be (IMHO) unambiguously clear and is different enough from the status quo to avoid passive mistakes.

clarfonthey commented 1 year ago

While I'm not familiar enough with the cursor code to refactor everything to only point between elements, I did submit #112896 which adds the key_mut API to the entry types. One point I bring up in the PR which I think is worth mentioning here is that this kind of feature might deserve separation from this actual cursor API into its own tracking issue, since IMHO the cursor API and key mutation are concerns that should be separated. Would love to know what the folks following this thread think about this, and also would appreciate any additional feedback on the API I've proposed which integrates more with the entry API.

SvizelPritula commented 1 year ago

I'd like to also comment on the method names, if I can.

I've participated in several programming competitions in C++, where binary searches and searches on binary trees are common. Every time I've had to perform one, I had to stare at the documentation for several minutes, struggling to figure out what lower_bound and upper_bound do. (Granted, this is partially a fault of the docs.) I always wished C++ had gone with four separate methods: find_last_less, find_first_greater_or_equal, find_last_less_or_equal and find_first_greater.

Rust's current upper_bound and lower_bound methods are significantly easier to understand and remember, but I also feel it would be easier to understand if they were called seek_last (or maybe find_last?) and seek_first.

professor-flux commented 1 year ago

I believe for most use cases of lower_bound and upper_bound, the range method is sufficient. It's essential that the method names clearly indicate the additional feature of having cursors, which the range method does not provide. One possible suggestion for naming the new methods could be cursor_at and cursor_at_mut, which would also align well with the existing LinkedList API.

I think methods like insert_after and insert_before should not panic and should be replaced with something like insert_with_hint, which would also match C++'s API for map and tree. The cursors feature should also be extended to BTreeSet.

brurucy commented 1 year ago

Will there be a Cursor trait a-la Iterator?

clarfonthey commented 1 year ago

Will there be a Cursor trait a-la Iterator?

Considering how much cursors vary between collections, that seems incredibly unlikely. Once you get to the point of wanting a cursor, you are probably deep enough in the specifics of the data structure you're using that it would be effectively useless to generalise that code.

I can see some desire to standardise cursors between slices, Vec, and VecDeque, perhaps, but that would be another specific cursor type that just happens to be shared between three uses.

brurucy commented 1 year ago

Considering how much cursors vary between collections

In spite of cursors varying across different collections, their core essence remains the same, i.e. merely a non-consuming iterator with a ghost element.

clarfonthey commented 1 year ago

In spite of cursors varying across different collections, their core essence remains the same, i.e. merely a non-consuming iterator with a ghost element.

I think this isn't a good argument because it conflates similarity with generality.

Rust is rather conservative about which of all the similar things should be generalised. For example, the Clone trait is useful to generalise because cloning is a common operation done on many things, and without a general trait, it would be much more difficult to automatically derive cloning for new types.

However, many types also offer MIN and MAX associated constants. These are similar but not generalised because it's unclear what APIs could benefit from bounding themselves based on a trait that offers these constants. Consider also that they could be associated methods, which might be better for types which consume a large amount of space -- consider a 4096-bit integer type used for RSA keys, where each constant occupies 512 bytes, so a binary that uses both could automatically grow by 1 KiB. And what if you wanted to separate the MIN and the MAX -- should we have one trait, or two?

These types of questions are why Rust often tries to keep APIs that have the same function similar (they're all associated constants called MIN and MAX) so that downstream crates can create their own traits and implement them for libstd types using macros. However, because the actual design of such general functionality is very nuanced and not nearly as useful as in other cases (like Clone), there aren't any traits offered.
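
As a sketch of what that looks like downstream: because the std integer types all spell the constants the same way, one small macro can implement a crate-local trait for many of them. `Bounded`, `impl_bounded!` and `span` are hypothetical names invented for this example; std itself offers no such trait.

```rust
/// Hypothetical downstream trait; std offers no equivalent.
trait Bounded: Sized {
    const MIN: Self;
    const MAX: Self;
}

/// Implements `Bounded` by forwarding to the inherent `MIN`/`MAX` constants,
/// which works only because every listed type names them identically.
macro_rules! impl_bounded {
    ($($t:ty),* $(,)?) => {
        $(
            impl Bounded for $t {
                const MIN: Self = <$t>::MIN;
                const MAX: Self = <$t>::MAX;
            }
        )*
    };
}

impl_bounded!(u8, i16, u32, i64);

/// Generic code can now bound itself on the downstream trait.
fn span<T: Bounded>() -> (T, T) {
    (T::MIN, T::MAX)
}

fn main() {
    assert_eq!(span::<u8>(), (0u8, 255u8));
    assert_eq!(span::<i16>(), (i16::MIN, i16::MAX));
}
```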

Something like cursors, I imagine, should be the same. I think that changing the design of these cursors based upon how a hypothetical downstream crate might generalise them is worth it, but I wouldn't say that libstd should offer its own trait without substantial evidence that it would be useful. And like I said, the fact that consumers of cursors are often already using features specific to their own data structures is probably a sign that these are simply similar constructs, and we shouldn't offer a general version.

zacknewman commented 1 year ago

I believe the documentation for BTreeMap::lower_bound and BTreeMap::lower_bound_mut should state that the cursor that is returned is pointing to the first/least/smallest element that is greater-than-or-equal to the given bound. Similarly, the upper_bound and upper_bound_mut docs should state that the cursor that is returned is pointing to the last/greatest/largest element that is less-than-or-equal to the given bound.

Mathematically, lower_bound returns the supremum and upper_bound returns the infimum. This does not contradict the variants of Bound either. When Bound::Included(t) is passed to lower_bound or upper_bound, the description is obvious (i.e., we are calculating min BTreeMap ∩ [t, ∞) and max BTreeMap ∩ (-∞, t] respectively). When Bound::Excluded(t) is passed to both, the "equal" portion is not contradicted since t is excluded from the set of values that can be returned (i.e., we are calculating min BTreeMap ∩ (t, ∞) and max BTreeMap ∩ (-∞, t) respectively). When Bound::Unbounded is passed, then that fits the behavior I described parenthetically with t taking the "values" of -∞ and ∞ respectively.

In case it's not clear, min BTreeMap ∩ [t, ∞) means "the minimum value from the set which is equal to the intersection of the set of values in BTreeMap and the set of all instances of T greater-than-or-equal to t when said intersection is not empty otherwise the 'ghost' non-element".
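
The same set expressions can be written down directly with the stable range method, taking the Bound as a whole, which shows how all three variants fit one description. This is a sketch under the reading given above: `min_above` and `max_below` are hypothetical helpers, and `None` stands in for the "ghost" non-element.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// min(map ∩ [bound, ∞)): smallest key that satisfies the lower bound.
fn min_above<V>(map: &BTreeMap<i32, V>, bound: Bound<i32>) -> Option<i32> {
    map.range((bound, Bound::<i32>::Unbounded)).next().map(|(k, _)| *k)
}

/// max(map ∩ (-∞, bound]): largest key that satisfies the upper bound.
fn max_below<V>(map: &BTreeMap<i32, V>, bound: Bound<i32>) -> Option<i32> {
    map.range((Bound::<i32>::Unbounded, bound)).next_back().map(|(k, _)| *k)
}

fn main() {
    let map: BTreeMap<i32, ()> = [1, 3, 5].into_iter().map(|k| (k, ())).collect();

    assert_eq!(min_above(&map, Bound::Included(3)), Some(3)); // min of [3, ∞)
    assert_eq!(min_above(&map, Bound::Excluded(3)), Some(5)); // min of (3, ∞)
    assert_eq!(min_above(&map, Bound::Unbounded), Some(1)); // t = -∞

    assert_eq!(max_below(&map, Bound::Included(3)), Some(3)); // max of (-∞, 3]
    assert_eq!(max_below(&map, Bound::Excluded(3)), Some(1)); // max of (-∞, 3)
    assert_eq!(max_below(&map, Bound::Unbounded), Some(5)); // t = ∞

    assert_eq!(min_above(&map, Bound::Excluded(5)), None); // empty intersection
}
```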

Further clarification

In the description I think "given bound" means the given instance of Bound (i.e., the parameter bound) not the value that is contained within the passed Bound variant if there even is one (i.e., the Bound variant is not Unbounded). This makes the documentation more obvious since the word "bound" aligns with both the parameter and type Bound as well as handles all three possible variants correctly. When "given bound" is meant to mean the value that is contained within the Bound variant, then that doesn't even make sense when Unbounded is passed since there is no such value. Additionally when Bound::Included(t) is passed to lower_bound or upper_bound, the documentation is plain wrong since when BTreeMap contains t, the cursor will be pointing at t which of course is equal to t (per the requirement that Eq is reflexive) and not strictly greater than or strictly less than t as the current documentation states.

John-Toohey commented 1 year ago

Regarding this API, will it also be supported on BTreeSet at some stage?

Amanieu commented 1 year ago

I have plans to refactor the BTreeMap cursor API, but probably won't have time in September. Extending it to cover BTreeSet is definitely planned.

momvart commented 1 year ago

I'm curious about the difference between CursorMut and Cursor regarding the lifetimes of reading APIs. For example, the references of key_value in Cursor have the lifetime of 'a but in CursorMut they have the lifetime of self.

zacknewman commented 1 year ago

I'm curious about the difference between CursorMut and Cursor regarding the lifetimes of reading APIs. For example, the references of key_value in Cursor have the lifetime of 'a but in CursorMut they have the lifetime of self.

CursorMut can't return a reference with lifetime 'a since that would violate Rust's safety guarantees. Cursor could return a reference with the same lifetime as &self, but that would be more restrictive since lifetime 'a will last at least as long as &self. Returning a reference with a lifetime that is at least as large as &self allows for code to be compilable in more situations especially due to subtyping of lifetimes which allows longer lifetimes to shorten when they need to.

Specifically, the following code won't compile since Rust would not be able to enforce the exclusive nature of an exclusive reference:

struct CursorMutSimple<'a, K: 'a, V: 'a> {
    map: &'a mut Vec<(K, V)>,
}
impl<'a, K: 'a, V: 'a> CursorMutSimple<'a, K, V> {
    // Below code won't compile when uncommented.
    //    fn first_value(&self) -> Option<&'a (K, V)> {
    //        self.map.get(0)
    //    }
}

If Cursor didn't return a reference with lifetime 'a, then code similar to below would no longer be possible:

struct CursorSimple<'a, K: 'a, V: 'a> {
    map: &'a Vec<(K, V)>,
}
impl<'a, K: 'a, V: 'a> CursorSimple<'a, K, V> {
    fn first_value(&self) -> Option<&(K, V)> {
        self.map.get(0)
    }
}
// Below code won't compile when uncommented since the lifetime
// is not tied to the data within.
//fn foo<'a, K: 'a, V: 'a>(val: CursorSimple<'a, K, V>) -> Option<&'a (K, V)> {
//    val.first_value()
//}
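
For contrast, here is the variant that does compile: the same simplified shared cursor, but with the accessor returning &'a as the real Cursor does. It is a sketch built on the simplified types above; `first_of` is a hypothetical stand-in for the commented-out `foo`.

```rust
struct CursorSimple<'a, K, V> {
    map: &'a Vec<(K, V)>,
}

impl<'a, K, V> CursorSimple<'a, K, V> {
    // The result carries lifetime 'a, so it is not tied to the `&self` borrow.
    fn first_value(&self) -> Option<&'a (K, V)> {
        self.map.first()
    }
}

// Compiles: the returned reference borrows from the underlying data, so it
// may outlive the cursor, which is consumed and dropped here.
fn first_of<'a, K, V>(cursor: CursorSimple<'a, K, V>) -> Option<&'a (K, V)> {
    cursor.first_value()
}

fn main() {
    let data = vec![(1, "one"), (2, "two")];
    let first = first_of(CursorSimple { map: &data });
    assert_eq!(first, Some(&(1, "one")));
}
```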

Alice Ryhl is quite brilliant, and I can't recommend enough reading some of her posts. Here is one of her many informative posts that is applicable here since she goes over what would happen if &'short &'long mut T were allowed to flatten to &'long T like &'short &'long T is allowed to.

momvart commented 1 year ago

Thanks for the explanation @zacknewman. I hadn't noted the difference between the lifetime of indirect mutable and immutable references.

Alice Ryhl is quite brilliant, and I can't recommend enough reading some of her posts. Here is one of her many informative posts that is applicable here

Such a great recommendation.

marcospb19 commented 1 year ago

EDIT: I'll hide this for now :thinking:, I doubt it's a useful comment.

I'm not saying this is a better idea, but I'll lay it down anyways.

If dealing with ghosts is too clunky, it's possible to guarantee that a Cursor will always point to a valid element, and return an Option at its construction.

impl<K, V> BTreeMap<K, V> {
    fn lower_bound<Q>(&self, bound: Bound<&Q>) -> Option<Cursor<'_, K, V>>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;

    fn upper_bound<Q>(&self, bound: Bound<&Q>) -> Option<Cursor<'_, K, V>>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;
}

struct Cursor<'a, K: 'a, V: 'a>;

impl<'a, K, V> Cursor<'a, K, V> {
    /// If `None`, cursor doesn't move.
    fn try_move_next(&mut self) -> Option<(&'a K, &'a V)>;
    /// If `None`, cursor doesn't move.
    fn try_move_prev(&mut self) -> Option<(&'a K, &'a V)>;

    fn key(&self) -> &'a K;
    fn value(&self) -> &'a V;
    fn key_value(&self) -> (&'a K, &'a V);

    fn peek_next(&self) -> Option<(&'a K, &'a V)>;
    fn peek_prev(&self) -> Option<(&'a K, &'a V)>;
}

This removes Options from one place just to add them in another.

marcospb19 commented 1 year ago

Now, speaking more seriously, about this:

Considering the feedback received so far, I am wondering if changing Cursors to represent a point between 2 elements rather than pointing at one element would be better:

...

Moving over a value with move_prev/move_next can return an Option<(&K, &mut V)> which results in a more natural Rust API.

Regardless of the cursor pointing at a single element, or at the "in-between" of two elements:

    fn move_next(&mut self);
    fn move_prev(&mut self);

These should definitely return Option<(&K, &V)> in both cases, so that we can shove it in a while let:

while let Some((key, value)) = cursor.move_next() {

move_next/move_prev returning a bool is somewhat redundant given that .get() already returns an option.

I don't like the bool idea, but I'd strongly disagree that this being redundant is a reason not to have it, given its benefits, unless there is a strong downside (possibly related to the implementation details, but I don't think there is any).
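To make the `while let` point concrete, here is a minimal sketch of a toy gap cursor over a sorted slice whose `move_next`/`move_prev` return an `Option`. `GapCursor`, `at_start`, and `collect_forward` are hypothetical names for illustration only; this is not the `BTreeMap` cursor implementation.

```rust
/// Toy read-only cursor that sits in a gap between elements of a slice.
/// `pos` is the number of elements that lie before the gap.
struct GapCursor<'a, T> {
    items: &'a [T],
    pos: usize,
}

impl<'a, T> GapCursor<'a, T> {
    /// Creates a cursor in the gap before the first element.
    fn at_start(items: &'a [T]) -> Self {
        GapCursor { items, pos: 0 }
    }

    /// Steps over the next element and returns it.
    /// Returns `None` (and does not move) at the end.
    fn move_next(&mut self) -> Option<&'a T> {
        let item = self.items.get(self.pos)?;
        self.pos += 1;
        Some(item)
    }

    /// Steps over the previous element and returns it.
    /// Returns `None` (and does not move) at the start.
    fn move_prev(&mut self) -> Option<&'a T> {
        let idx = self.pos.checked_sub(1)?;
        self.pos = idx;
        self.items.get(idx)
    }
}

/// Walks the whole slice with `while let`, cloning each element.
fn collect_forward<T: Clone>(items: &[T]) -> Vec<T> {
    let mut cursor = GapCursor::at_start(items);
    let mut out = Vec::new();
    while let Some(item) = cursor.move_next() {
        out.push(item.clone());
    }
    out
}
```

Because the return value already says whether the cursor moved, the loop needs no separate `bool` or `.get()` call.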

clarfonthey commented 1 year ago

I'm honestly on board with the idea of keeping the cursor in gaps since it solves the issue of inserting, which always happens in a gap. See my proposal here and #112896 which tracks adding a key_mut method to the Entry API.

marcospb19 commented 1 year ago

EDIT: this was hidden by accident.

@clarfonthey

I'm honestly on board with the idea of keeping the cursor in gaps since it solves the issue of inserting, which always happens in a gap. See my proposal https://github.com/rust-lang/rust/issues/107540#issuecomment-1590341835 and https://github.com/rust-lang/rust/pull/112896 which tracks adding a key_mut method to the Entry API.

About the insertion part, the gaps solution offers a tradeoff:

You go from try_insert_before + try_insert_after to simply try_insert.
But you also go from remove to remove_before + remove_after.

But what happens after .remove() is called? Does the cursor point to the next element or the previous?

On the other hand, what happens to a gap cursor after .insert() is called? Left or right? Well, neither! xD.
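A rough model of that tradeoff, using a sorted `Vec` instead of a tree. `GapCursorMut`, `try_insert`, `remove_before`, and `remove_after` are hypothetical names taken from the wording above, not an actual API. Where the cursor lands after an insert is not specified above; this sketch arbitrarily leaves it after the new element.

```rust
/// Toy mutable cursor sitting in a gap of a sorted, deduplicated Vec.
/// `pos` is the number of elements that lie before the gap.
struct GapCursorMut<'a, T> {
    items: &'a mut Vec<T>,
    pos: usize,
}

impl<'a, T: Ord> GapCursorMut<'a, T> {
    /// Creates a cursor in the gap at `pos` (clamped to the Vec length).
    fn new(items: &'a mut Vec<T>, pos: usize) -> Self {
        let pos = pos.min(items.len());
        GapCursorMut { items, pos }
    }

    /// Inserts `value` into the gap if that keeps the Vec sorted.
    /// On success the cursor ends up after the new element
    /// (an assumption made for this sketch). On failure the value
    /// is handed back and nothing changes.
    fn try_insert(&mut self, value: T) -> Result<(), T> {
        let after_prev = self.pos == 0 || self.items[self.pos - 1] < value;
        let before_next =
            self.pos == self.items.len() || value < self.items[self.pos];
        if after_prev && before_next {
            self.items.insert(self.pos, value);
            self.pos += 1;
            Ok(())
        } else {
            Err(value)
        }
    }

    /// Removes the element just before the gap, if any.
    fn remove_before(&mut self) -> Option<T> {
        let idx = self.pos.checked_sub(1)?;
        self.pos = idx;
        Some(self.items.remove(idx))
    }

    /// Removes the element just after the gap, if any.
    fn remove_after(&mut self) -> Option<T> {
        if self.pos < self.items.len() {
            Some(self.items.remove(self.pos))
        } else {
            None
        }
    }
}
```

With a gap cursor, removal never has to decide which neighbor to point at afterwards: the two neighboring gaps simply merge into one.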

About your proposal:

fn insert(self, key: K) -> Result<VacantEntry<'a, K, V>, CursorMut<'a, K, V>>;

To ME (personal opinion ahead), inserting in a tree with a cursor will always be confusing, because the cursor position might have nothing to do with where the key belongs. The position is only an optimization "tip" to make insertion faster, and it might be a false tip.

I'd like cursors to go in, but if we don't agree on insertion questions, I don't think it should block it.

Insertion probably deserves its own proposal/tracking issue. If you really want to optimize it, I'd try something outside of the cursors:

impl<K, V> BTreeMap<K, V> {
    // For each element, reuse the last element's position as a "tip" for the location
    // of the `.next()` element, both for ascending and descending sequences.
    //
    // If the tip is false, fall back to the usual insert, and try again for the next element.
    fn insert_multiple(&mut self, items: impl IntoIterator<...>);
}
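As a rough illustration of the "tip" idea, here is a sketch on a sorted, deduplicated `Vec` rather than on a B-tree. `insert_with_hint` and this `insert_multiple` are hypothetical free functions, and only the ascending case is handled. A real implementation inside `BTreeMap` would look quite different.

```rust
/// Inserts `value` into a sorted, deduplicated Vec, trying `hint` first.
/// Returns the index where the value now lives (or already lived).
fn insert_with_hint<T: Ord>(items: &mut Vec<T>, hint: usize, value: T) -> usize {
    let hint = hint.min(items.len());
    // The hint is right if `value` fits strictly between its neighbors.
    let fits = (hint == 0 || items[hint - 1] < value)
        && (hint == items.len() || value < items[hint]);
    let idx = if fits {
        hint
    } else {
        // False tip: fall back to an ordinary binary search.
        match items.binary_search(&value) {
            Ok(found) => return found, // already present, nothing to insert
            Err(slot) => slot,
        }
    };
    items.insert(idx, value);
    idx
}

/// Inserts every value, using the previous insertion point as the next hint.
fn insert_multiple<T: Ord>(items: &mut Vec<T>, values: impl IntoIterator<Item = T>) {
    let mut hint = items.len();
    for value in values {
        // For ascending input, the next value most likely belongs
        // right after the one that was just inserted.
        hint = insert_with_hint(items, hint, value) + 1;
    }
}
```

A wrong hint only costs two comparisons before the fallback search, so unsorted input still ends up correctly placed.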

If your concern is not optimizing it, but getting a cursor out of an .insert(), we could get VacantEntry::cursor[_mut] instead.

About the gap approach, (after two whole hours of deep thought) I'm convinced it's great.

But with it, the <= semantics of C++'s lower_bound are broken. To avoid confusion, the names should definitely change.

My suggestions are cursor_before and cursor_after to get a cursor before or after the given element (there would be no == case anymore). :+1: Pretty straightforward.
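A small sketch of what "before" and "after" could mean for gap positions, modeled on a sorted slice where a gap is identified by the number of elements to its left. `gap_before` and `gap_after` are hypothetical helpers; only `slice::partition_point` is a real standard library call.

```rust
/// Index of the gap just before `key`: every element to the left is < key.
fn gap_before<T: Ord>(sorted: &[T], key: &T) -> usize {
    sorted.partition_point(|x| x < key)
}

/// Index of the gap just after `key`: every element to the left is <= key.
fn gap_after<T: Ord>(sorted: &[T], key: &T) -> usize {
    sorted.partition_point(|x| x <= key)
}
```

When `key` is absent, both functions return the same gap. When it is present, the two gaps sit on either side of it, so no separate == case is needed.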

clarfonthey commented 1 year ago

I get that insertion and removal are similar operations, but it's worth considering the fact that this API should cover cases not already covered. The entry API covers the parts of the map that do have elements, and in the formulation I described, the cursor covers everything else. This is also why I think that, instead of using custom insertion/removal routines, we should just provide entries instead, since that way we're not just duplicating functionality.

Consider the original case for this API to begin with: creating maps and sets whose keys are ranges, not discrete elements. If you want to insert a range, it's very unlikely that your range lands exactly on an element, so you want to search through the map efficiently and figure out what overlaps. In these cases, you're never "on" an element since the ranges will likely be sorted by their starting index, and instead you're going to have to move between gaps where the actual range lives.
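For context, a sketch of the overlap search on stable Rust using `BTreeMap::range`. It assumes a hypothetical layout where the key is a range's start and the value is its exclusive end, and that the stored ranges are non-empty and pairwise disjoint. `overlapping` is an illustrative name.

```rust
use std::collections::BTreeMap;

/// Returns the stored ranges `(start, end)` that overlap the query
/// `start..end`, in ascending order.
///
/// Assumes (for this sketch) that stored ranges are non-empty and disjoint,
/// so their ends are sorted in the same order as their starts.
fn overlapping(map: &BTreeMap<u32, u32>, start: u32, end: u32) -> Vec<(u32, u32)> {
    let mut hits: Vec<(u32, u32)> = map
        .range(..end) // only ranges that start before the query ends
        .rev() // walk backwards from the query's end
        .take_while(|&(_, &e)| e > start) // stop once a range ends too early
        .map(|(&s, &e)| (s, e))
        .collect();
    hits.reverse();
    hits
}
```

This only reads the map. Splitting or merging the overlapping ranges in place is the part that `range` alone does not cover.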

And again, if you want to go with a system where you can move between occupied entries in the map, why not just augment the entry API? Offer the option to move between occupied entries directly on an entry. The reason you can't do this for gaps is that while entries have exact keys, gaps only have keys you provide, which is why the insertion method I proposed actually involves adding a key to make a vacant entry. The reason for adding a fallible API is really for folks who prefer safe code and can panic if their key is wrong instead of invoking UB.

mina86 commented 1 year ago

qq, is there currently no non-nightly way to get the next/previous entry? I don’t care about any of the rest of Cursor features. Just need to find the smallest (greatest) key greater (smaller) than given one.

Amanieu commented 1 year ago

You can use btree.range((Bound::Excluded(&5), Bound::Unbounded)).next().

westonpace commented 1 year ago

You can use btree.range((Bound::Excluded(&5), Bound::Unbounded)).next().

This works well to retrieve the smallest value larger than the query (e.g. lower_bound).

Is there any workaround to retrieve the largest value smaller than the query (e.g. upper_bound)?

clarfonthey commented 1 year ago

Is there any workaround to retrieve the largest value smaller than the query (e.g. upper_bound)?

The iterator implements DoubleEndedIterator, so you can just replace next with next_back to get the upper part.

kesyog commented 12 months ago

Correct me if I'm wrong, but you'd also want to flip the bounds for upper_bound:

i.e.

lower_bound(5) ≈ btree.range(5..).next()
upper_bound(5) ≈ btree.range(..=5).next_back()

(for inclusive bounds)
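Putting the stable workarounds from the last few comments together, a minimal sketch with `i32` keys. The helper names are made up for illustration; only `BTreeMap::range`, `next`, and `next_back` come from the standard library.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Smallest entry whose key is >= `key`.
fn first_at_or_after<V>(map: &BTreeMap<i32, V>, key: i32) -> Option<(&i32, &V)> {
    map.range(key..).next()
}

/// Smallest entry whose key is strictly greater than `key`.
fn first_after<V>(map: &BTreeMap<i32, V>, key: i32) -> Option<(&i32, &V)> {
    map.range((Bound::Excluded(key), Bound::Unbounded)).next()
}

/// Largest entry whose key is <= `key`.
fn last_at_or_before<V>(map: &BTreeMap<i32, V>, key: i32) -> Option<(&i32, &V)> {
    map.range(..=key).next_back()
}

/// Largest entry whose key is strictly smaller than `key`.
fn last_before<V>(map: &BTreeMap<i32, V>, key: i32) -> Option<(&i32, &V)> {
    map.range(..key).next_back()
}
```

Note that the bounds flip along with the direction: searching upward takes a range starting at the key and calls `next`, and searching downward takes a range ending at the key and calls `next_back`.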

terrorfisch commented 11 months ago

Is there a reason why Q is not ?Sized in lower_bound and friends?

Amanieu commented 11 months ago

I refactored the cursor API to point to gaps instead of elements in #118208.

Amanieu commented 11 months ago

Is there a reason why Q is not ?Sized in lower_bound and friends?

This was an oversight, and is addressed in #118208.

RustyYato commented 10 months ago

Will there be a method to "reborrow" the mutable cursor? That way it's possible to peek ahead with the reborrow, or to pass the cursor to some other function by value (like converting it to an Entry).

impl<'a, K, V> CursorMut<'a, K, V> {
    fn reborrow(&mut self) -> CursorMut<'_, K, V> { ... }
}

elichai commented 10 months ago

I stumbled upon this API for the following use case: I have a network interface that does version exchanging and resolves to the lowest common version. The user can register "minimum version handlers" meaning that if the user registers a handler for versions 1,4,7 and we resolve to version 3 it will take the handler for version 1. If version 7+ comes in, it will take the handler for version 7. That way the user can register new handlers only when there's backward incompatibility.

This looks something like this:

type Handler<RET> = Box<dyn FnOnce(Driver) -> Box<dyn Future<Output = Result<RET, Error>>>>;
type Handlers<RET> = BTreeMap<Version, Handler<RET>>;

let common_version = ...;
let handler = handlers.upper_bound_mut(Bound::Included(&common_version)).remove_current();

EDIT: Right now it seems like the alternative requires Clone for the key:

// Largest registered version that is <= common_version, hence next_back().
let key = handlers.range(..=common_version).next_back()?.0.clone();
let handler = handlers.remove(&key)?;
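A self-contained version of that stable alternative, as a sketch. It assumes `Version` can be modeled as a plain `u32` (an assumption, the real type is not shown above), which is `Copy`, so the key can be copied out instead of cloned. `take_handler` is a hypothetical helper name.

```rust
use std::collections::BTreeMap;

/// Removes and returns the handler registered for the largest version
/// that is <= `common_version`, if there is one.
fn take_handler<H>(handlers: &mut BTreeMap<u32, H>, common_version: u32) -> Option<H> {
    // Copy the key out first so the shared borrow from `range` ends
    // before the map is mutated by `remove`.
    let key = *handlers.range(..=common_version).next_back()?.0;
    handlers.remove(&key)
}
```

This costs two tree lookups (one for `range`, one for `remove`), which is the overhead a cursor-based removal would avoid.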

Amanieu commented 10 months ago

Will there be a method to "reborrow" the mutable cursor? That way it's possible to peek ahead with the reborrow, or to pass the cursor to some other function by value (like converting it to an Entry).

That doesn't work: the reborrowed cursor might mutate the tree in a way that makes the current position of the original cursor invalid (for example if the node that it is pointing to is deleted).

jeffparsons commented 10 months ago

Will there be a method to "reborrow" the mutable cursor? That way it's possible to peek ahead with the reborrow, or to pass the cursor to some other function by value (like converting it to an Entry).

That doesn't work: the reborrowed cursor might mutate the tree in a way that makes the current position of the original cursor invalid (for example if the node that it is pointing to is deleted).

One possible variation, not sure if still useful:

Spin off a special non-mutating "seeking" cursor that borrows the normal CursorMut, and has a method to relocate that parent cursor to its current location.
