Open Amanieu opened 1 year ago
The key method on the Cursor and CursorMut types should be unsafe. If the key is interiorly mutable, you could mutate it through the shared reference and break the invariants of BTreeMap. For example, if the key is of type Cell<u32>, I could call key.set and change the value from 1 to 1 billion, potentially breaking both the "Key must be unique" and "Key must be correctly ordered" invariants of BTreeMap.
As I understand it, the BTreeMap invariants are not safety invariants. If the key is changed from underneath the map, it can lead to bugs. But crucially not to memory safety violations.
The standard library docs make this distinction clear:
It is a logic error for a key to be modified in such a way that the key’s ordering relative to any other key, as determined by the Ord trait, changes while it is in the map. This is normally only possible through Cell, RefCell, global state, I/O, or unsafe code. The behavior resulting from such a logic error is not specified, but will be encapsulated to the BTreeMap that observed the logic error and not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.
But the key_mut_unchecked method on CursorMut is unsafe because you might break the invariants of BTreeMap
Unless there is a memory safety invariant with &mut K, I don't think it needs to be unsafe. A counterexample is BTreeMap::get_key_value(), which is safe to call, even if the key has interior mutability.
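To make the distinction concrete, here is a minimal sketch on stable Rust (no cursors involved) that changes a key's ordering through the shared reference handed out by get_key_value. The function name keys_after_mutation is only for illustration, and the exact iteration order after the mutation is what I'd expect from the current implementation for a small map, not something the docs promise.

```rust
use std::cell::Cell;
use std::collections::BTreeMap;

/// Mutates a key in place through a shared reference and returns the keys
/// in the order the map iterates them afterwards.
fn keys_after_mutation() -> Vec<u32> {
    let mut map: BTreeMap<Cell<u32>, &str> = BTreeMap::new();
    map.insert(Cell::new(1), "a");
    map.insert(Cell::new(2), "b");
    map.insert(Cell::new(3), "c");

    // `get_key_value` is a safe method, yet it returns `&K`.
    if let Some((key, _)) = map.get_key_value(&Cell::new(1)) {
        // Logic error: the key's ordering changes while it is in the map.
        key.set(1_000_000_000);
    }

    map.keys().map(Cell::get).collect()
}

fn main() {
    let keys = keys_after_mutation();
    // The map still holds three entries, but they are no longer sorted.
    let sorted = keys.windows(2).all(|pair| pair[0] < pair[1]);
    println!("keys = {:?}, sorted = {}", keys, sorted);
}
```

Everything here is safe code, so whatever the map does next is a logic error contained to that map, which is the point being made about key_mut_unchecked.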
The invariant here is that if a function receives a BTreeMap<K, V> where K is a type with a "trusted" Ord implementation (such as usize) then it is guaranteed that the items in the BTreeMap will be in correctly sorted order.
You should clarify this in the docs. It's not really clear why key_mut_unchecked is unsafe reading the docs IMO
FWIW, I agree that the docs are unclear. Copied directly from https://github.com/rust-lang/libs-team/issues/141:
/// Returns a mutable reference to the of the element that the cursor is
/// currently pointing to.
///
/// This can be used to modify the key, but you must ensure that the
/// `BTreeMap` invariants are maintained. Specifically:
///
/// * The key must remain unique within the tree.
/// * The key must remain in sorted order with regards to other elements in
/// the tree.
unsafe fn key_mut_unchecked(&mut self) -> Option<&mut K>;
This does not explain what exactly is "unchecked". It appears to just repeat the invariants listed in the BTreeMap docs, which claim there is no possibility of UB by breaking them.
So, I've been trying to create my own "range map" using this API and it feels… extremely clunky to me. Since I'm mostly working with the mutable API, I'll be talking about CursorMut specifically, but I suppose that some of these can apply to Cursor as well:
The handling of "ghost elements" is extremely clunky to me. From what I understand, these only can show up at the beginning and end of the map, since all entries into the API are via lower_bound and upper_bound which by their nature will "snap" to an existing, non-element. Personally, what would make the most sense to me is to require the cursor to always point to a valid element and return an entry-like enum for the lower_bound and upper_bound methods. This way, you can still distinguish between whether you're at the edges of the map on the initial creation of the cursor while not having to handle this case for all other uses of the cursor.
In that line of reasoning, I think that move_next and move_prev would be best if they simply returned a boolean (or other sentinel value) stating whether the move was successful, simply staying at the last or first element in the map rather than potentially moving into a ghost state.
While I think that the guarantees for the key_mut_unchecked method are poorly documented as others have mentioned, I appreciate the addition of it and personally use it myself. The only issue is having to handle the Option every time like mentioned.
There are several advantages in having a ghost element:
- lower_bound/upper_bound lookup, even if the tree is empty.
- move_next/move_prev returning a bool is somewhat redundant given that .get() already returns an option.
With that said, I can see how it can be hard to work with. I'd be happy to try out a better API if you want to sketch one out.
I assume this is a non-starter for performance reasons, but the most "natural" feeling interface for me would be one that can point to all real elements and all points between/next to them (including start/end). There would then be convenience methods for skipping over whatever you're not interested in so you can stay in "element space" or "gap space" if you only care about one or the other most of the time.
But again, I imagine this would make the API too slow because the compiler couldn't optimise away visiting all the gaps — does that sound right?
(Edit: if this doesn't make sense, I'd be happy to sketch what I imagine would be the core API, and what would be the optional convenience layers.)
(Edit2: or you could have separate cursor types for when you're pointing at an element vs a gap, and have the navigation functions consume self and convert between them where relevant. I think that would be on par for performance with the current design?)
Honestly, the kind of modification I was thinking of was to leverage the existing Entry API for mutable cursors, and to simply take ownership of the cursor during certain operations that might cause the entry to become vacant, returning an entry instead of just modifying the occupied entry.
I have been implementing a TCP segment reordering buffer (where you're storing ranges and they wrap around u32::MAX).
So I have been looking into building a wrapping iterator over a BTreeMap and it seems that implementing a wrapping mutating cursor based on this is surprisingly nontrivial: you'd have to ... somehow ... swap between a &mut BTreeMap (to create a new CursorMut when you wrap) and a CursorMut and when you get rid of the CursorMut to replace it, put the &mut BTreeMap back. This is kind of nonobvious how to do when hand rolling a generator as one does while writing these things.
I guess my feedback on this API is that it might need a function to move it back to the start or to a specified element.
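For the read-only half of this, a wrapping traversal can be sketched on stable by chaining two range calls. wrapping_iter is a made-up helper name, and this does not solve the mutable case described above, since the two halves would each need their own mutable borrow of the map at the same time.

```rust
use std::collections::BTreeMap;

/// Iterates from `start` up to the largest key, then wraps around to the
/// smallest key and continues up to (but excluding) `start`.
fn wrapping_iter<'a, V>(
    map: &'a BTreeMap<u32, V>,
    start: u32,
) -> impl Iterator<Item = (&'a u32, &'a V)> {
    map.range(start..).chain(map.range(..start))
}

fn main() {
    let mut segments = BTreeMap::new();
    segments.insert(10u32, "after wrap");
    segments.insert(20, "after wrap");
    segments.insert(4_000_000_000, "before wrap");

    // Start just below the wrap point of the u32 sequence space.
    for (seq, label) in wrapping_iter(&segments, 3_000_000_000) {
        println!("{} {}", seq, label);
    }
}
```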
Can we document the potential hazard that BTreeMap::upper_bound does the opposite of C++'s map::upper_bound?
In other words, when porting C++ to Rust,
map::lower_bound -> BTreeMap::lower_bound(Bound::Included(...))
map::upper_bound -> BTreeMap::lower_bound(Bound::Excluded(...))
FWIW, I'm strongly in favor of both APIs :rocket:
Can we document the potential hazard that BTreeMap::upper_bound does the opposite of C++'s map::upper_bound?
Are you sure about this? They should map the same way. upper_bound returns the last element below the given bound, which should match what C++'s upper_bound does.
Are you sure about this? They should map the same way.
I'm more confused than I am sure, but I still get the impression that there is a hazard when porting C++ to Rust.
C++ upper_bound docs say that it "Returns an iterator pointing to the first element in the container whose key is considered to go after k."
Rust BTreeMap::upper_bound docs say that it "Returns a Cursor pointing at the last element that is below the given bound."
So, here are the results on an example map:
k1 -> v1
k2 -> v2
k3 -> v3
k4 -> v4
k5 -> v5
// C++
.upper_bound("k3") = "v4" // notably the same as .lower_bound(Bound::Excluded("k3")) in Rust
.lower_bound("k3") = "v3"
// Rust
.upper_bound(Bound::Excluded("k3")) = "v2"
.upper_bound(Bound::Included("k3")) = "v3"
.lower_bound(Bound::Excluded("k3")) = "v4"
.lower_bound(Bound::Included("k3")) = "v3"
#![feature(btree_cursors)]

fn main() {
    use std::collections::BTreeMap;
    use std::ops::Bound;

    let mut a = BTreeMap::new();
    a.insert("k1", "v1");
    a.insert("k2", "v2");
    a.insert("k3", "v3");
    a.insert("k4", "v4");
    a.insert("k5", "v5");

    let cursor = a.upper_bound(Bound::Excluded(&"k3"));
    println!("upper exc {:?}", cursor.key());
    let cursor = a.upper_bound(Bound::Included(&"k3"));
    println!("upper inc {:?}", cursor.key());
    let cursor = a.lower_bound(Bound::Excluded(&"k3"));
    println!("lower exc {:?}", cursor.key());
    let cursor = a.lower_bound(Bound::Included(&"k3"));
    println!("lower inc {:?}", cursor.key());
}
@Amanieu thoughts?
I see, you are indeed correct.
Considering the feedback received so far, I am wondering if changing Cursors to represent a point between 2 elements rather than pointing at one element would be better:
- lower_bound/upper_bound naturally end up pointing between 2 elements, which is less confusing.
- Moving over a value with move_prev/move_next can return an Option<(&K, &mut V)> which results in a more natural Rust API.
Let me know if this sounds promising, and I'll try and sketch out a new API with this design.
Let me know if this sounds promising
It's hard to say which is better without some short before & after code examples, i.e. to do X, you do Y with the old API and Z with the new API (if you wanted, the examples could also show how to accomplish the same thing in C++ or other languages). For now, it sounds plausible but not necessarily better or worse.
Anyway, after realizing the confusion, the APIs were both intuitive and useful for the porting I was doing today, and the only thing I'm asking for is for the docs to point out the porting hazard (alternatively, the function names could be changed to avoid matching those used in C++).
Here's an API sketch for a cursor that resides at a point between 2 elements. Elements are read by "passing over" them with a cursor.
impl<K, V> BTreeMap<K, V> {
fn lower_bound<Q>(&self, bound: Bound<&Q>) -> Cursor<'_, K, V>
where
K: Borrow<Q> + Ord,
Q: Ord;
fn lower_bound_mut<Q>(&mut self, bound: Bound<&Q>) -> CursorMut<'_, K, V>
where
K: Borrow<Q> + Ord,
Q: Ord;
fn upper_bound<Q>(&self, bound: Bound<&Q>) -> Cursor<'_, K, V>
where
K: Borrow<Q> + Ord,
Q: Ord;
fn upper_bound_mut<Q>(&mut self, bound: Bound<&Q>) -> CursorMut<'_, K, V>
where
K: Borrow<Q> + Ord,
Q: Ord;
}
struct Cursor<'a, K: 'a, V: 'a>;
impl<'a, K, V> Cursor<'a, K, V> {
fn next(&mut self) -> Option<(&'a K, &'a V)>;
fn prev(&mut self) -> Option<(&'a K, &'a V)>;
fn peek_next(&self) -> Option<(&'a K, &'a V)>;
fn peek_prev(&self) -> Option<(&'a K, &'a V)>;
}
struct CursorMut<'a, K: 'a, V: 'a>;
impl<'a, K, V> CursorMut<'a, K, V> {
fn next(&mut self) -> Option<(&K, &mut V)>;
fn prev(&mut self) -> Option<(&K, &mut V)>;
unsafe fn next_with_mut_key(&mut self) -> Option<(&mut K, &mut V)>;
unsafe fn prev_with_mut_key(&mut self) -> Option<(&mut K, &mut V)>;
fn peek_next(&self) -> Option<(&K, &V)>;
fn peek_prev(&self) -> Option<(&K, &V)>;
fn peek_next_mut(&mut self) -> Option<(&K, &mut V)>;
fn peek_prev_mut(&mut self) -> Option<(&K, &mut V)>;
unsafe fn peek_next_with_mut_key(&mut self) -> Option<(&mut K, &mut V)>;
unsafe fn peek_prev_with_mut_key(&mut self) -> Option<(&mut K, &mut V)>;
fn as_cursor(&self) -> Cursor<'_, K, V>;
// Inserts at current point, cursor is moved to before the inserted element.
fn insert_after(&mut self, key: K, value: V);
// Inserts at current point, cursor is moved to after the inserted element.
fn insert_before(&mut self, key: K, value: V);
unsafe fn insert_after_unchecked(&mut self, key: K, value: V);
unsafe fn insert_before_unchecked(&mut self, key: K, value: V);
fn remove_next(&mut self) -> Option<(K, V)>;
fn remove_prev(&mut self) -> Option<(K, V)>;
}
Conceptually, I think that I do like this API more, but I still think that the combinatorial explosion of options here could be more easily avoided by integrating properly with the Entry API. Namely, instead of having options for the various combinations of whether you're inserting, mutating the key, etc., you could just leverage the existing Entry API with the ability to do key mutation.
Essentially, the position of the cursor between the elements is still unchanged. But rather than making the cursor this all-powerful entity, we give it equal footing with other various elements.
To accomplish this, I propose a particular sleight of hand. We retain the meaning of a "gap" as the position in the map between elements. A vacant entry points to a particular gap, and an occupied entry points to a particular element and its adjacent gaps. When mutating a key for either kind of entry, you must ensure that the entry upholds the Ord invariant for the map. So, you can nudge it around within its entry fine, but you can't leave it.
By leaving the semantics of the key specifically in the entry and requiring the entry for mutation of elements, the semantics of all the various mutable methods becomes more clear. Here's what the conceptual API would look like:
type Position = /* we'll talk about this later */;
impl<'a, K, V> Cursor<'a, K, V> { // and `CursorMut`
// looks around the cursor
fn peek_next(&self) -> Option<(&K, &V)>;
fn peek_prev(&self) -> Option<(&K, &V)>;
// move the cursor
fn next(&mut self) -> Position;
fn prev(&mut self) -> Position;
}
impl<'a, K, V> CursorMut<'a, K, V> {
// attempts to insert at this spot in the cursor
fn insert(self, key: K) -> Result<VacantEntry<'a, K, V>, CursorMut<'a, K, V>>;
unsafe fn insert_unchecked(self, key: K) -> VacantEntry<'a, K, V>;
// mutates around the cursor
fn enter_next(&mut self) -> Option<OccupiedEntry<'_, K, V>>;
fn enter_prev(&mut self) -> Option<OccupiedEntry<'_, K, V>>;
}
impl<'a, K, V> OccupiedEntry<'a, K, V> { // and `VacantEntry` and `Entry` too
// requires upholding invariants mentioned
unsafe fn key_mut(&mut self) -> &mut K;
}
impl<'a, K, V> VacantEntry<'a, K, V> {
// if we want the ability to convert back into a cursor
fn seek_elsewhere(self) -> (K, CursorMut<'a, K, V>);
fn insert_and_seek_after(self, value: V) -> CursorMut<'a, K, V>;
fn insert_and_seek_before(self, value: V) -> CursorMut<'a, K, V>;
}
Part of the reason why I think VacantEntry should have mutable access to the key is because it aligns well with the idea of entries being positions in the map associated with owned keys, which may or may not be inserted.
Note the bit I said about Position. I don't think that it's right to return a mutable value when mutating the cursor, mostly because we'd want all mutation to go through entries instead. I personally think it makes the most sense to return Option<&K> just because inspecting the key is really what you care most about, although since we're going to be providing the key anyway, we might as well also provide the value.
That's the bit I'm fuzziest on. But the rest is my basic idea on how to use entries to solve the complexity and the semantics. We don't need any weird insert_after or insert_before because we require converting a cursor into an entry, not simply borrowing it. I list some potential methods we could add to VacantEntry to allow converting back, but those weren't super thought out. My thought process is that the last step someone will probably want to take for the entry API is inserting, since you need to poke around to ensure you're satisfying invariants before you actually do the insert anyway.
A side note on the naming of upper_bound and lower_bound conflicting with C++: why not just abandon that scheme altogether, then? We could go with something like seek_before and seek_after which would be (IMHO) unambiguously clear and is different enough from the status quo to avoid passive mistakes.
While I'm not familiar enough with the cursor code to refactor everything to only point between elements, I did submit #112896 which adds the key_mut API to the entry types. One point I bring up in the PR which I think is worth mentioning here is that this kind of feature might deserve separation from this actual cursor API into its own tracking issue, since IMHO the cursor API and key mutation are concerns that should be separated. Would love to know what the folks following this thread think about this, and also would appreciate any additional feedback on the API I've proposed which integrates more with the entry API.
I'd like to also comment on the method names, if I can.
I've participated in several programming competitions in C++, where binary searches and searches on binary trees are common. Every time I've had to perform one, I had to stare at the documentation for several minutes, struggling to figure out what lower_bound and upper_bound do. (Granted, this is partially a fault of the docs.) I always wished C++ had gone with four separate methods: find_last_less, find_first_greater_or_equal, find_last_less_or_equal and find_first_greater.
Rust's current upper_bound and lower_bound methods are significantly easier to understand and remember, but I also feel it would be easier to understand if they were called seek_last (or maybe find_last?) and seek_first.
I believe for most use cases of lower_bound and upper_bound, the range method is sufficient. It's essential that the method names clearly indicate the additional feature of having cursors, which the range method does not provide. One possible suggestion for naming the new methods could be cursor_at and cursor_at_mut, which would also align well with the existing LinkedList API.
I think methods like insert_after and insert_before should not panic and should be replaced with something like insert_with_hint, which would also match C++'s API for map and tree.
The cursors feature should also be extended to BTreeSet.
Will there be a Cursor trait a-la Iterator?
Will there be a Cursor trait a-la Iterator?
Considering how much cursors vary between collections, that seems incredibly unlikely. Once you get to the point of wanting a cursor, you are probably deep enough in the specifics of the data structure you're using that it would be effectively useless to generalise that code.
I can see some desire to standardise cursors between slices, Vec, and VecDeque, perhaps, but that would be another specific cursor type that just happens to be shared between three uses.
Considering how much cursors vary between collections
In spite of cursors varying across different collections, their core essence remains the same, i.e. merely a non-consuming iterator with a ghost element.
In spite of cursors varying across different collections, their core essence remains the same, i.e. merely a non-consuming iterator with a ghost element.
I think this isn't a good argument because it conflates similarity with generality.
Rust is rather conservative about which of all the similar things should be made general. For example, the Clone trait is useful to generalise because cloning is a common operation done on many things, and without a general trait, it would be much more difficult to automatically derive cloning for new types.
However, many types also offer MIN and MAX associated constants. These are similar but not generalised because it's unclear what APIs could benefit from bounding themselves based on a trait that offers these constants. Consider also that they could be associated methods, which might be better for types which consume a large amount of space; consider a 4096-bit integer type used for RSA keys, which could automatically add 1 KiB (two 512-byte constants) to a binary that uses both. And what if you wanted to separate the MIN and the MAX: should we have one trait, or two?
These types of questions are why Rust often tries to keep APIs that have the same function similar (they're all associated constants called MIN and MAX) so that downstream crates can create their own traits and implement them for libstd types using macros. However, because the actual design of such general functionality is very nuanced and not nearly as useful as in other cases (like Clone), there aren't any traits offered.
Something like cursors, I imagine, should be the same. I think that changing the design of these cursors based upon how a hypothetical downstream crate might generalise them is worth it, but I wouldn't say that libstd should offer its own trait without substantial evidence that it would be useful. And like I said, the fact that consumers of cursors are often already using features specific to their own data structures is probably a sign that these are simply similar constructs, and we shouldn't offer a general version.
I believe the documentation for BTreeMap::lower_bound and BTreeMap::lower_bound_mut should state that the cursor that is returned is pointing to the first/least/smallest element that is greater-than-or-equal to the given bound. Similarly, the upper_bound and upper_bound_mut docs should state that the cursor that is returned is pointing to the last/greatest/largest element that is less-than-or-equal to the given bound.
Mathematically, lower_bound returns the supremum and upper_bound returns the infimum. This does not contradict the variants of Bound either. When Bound::Included(t) is passed to lower_bound or upper_bound, the description is obvious (i.e., we are calculating min BTreeMap ∩ [t, ∞) and max BTreeMap ∩ (-∞, t] respectively). When Bound::Excluded(t) is passed to both, the "equal" portion is not contradicted since t is excluded from the set of values that can be returned (i.e., we are calculating min BTreeMap ∩ (t, ∞) and max BTreeMap ∩ (-∞, t) respectively). When Bound::Unbounded is passed, then that fits the behavior I described parenthetically with t taking the "values" of -∞ and ∞ respectively.
In case it's not clear, min BTreeMap ∩ [t, ∞) means "the minimum value from the set which is equal to the intersection of the set of values in BTreeMap and the set of all instances of T greater-than-or-equal to t when said intersection is not empty, otherwise the 'ghost' non-element".
In the description I think "given bound" means the given instance of Bound (i.e., the parameter bound) not the value that is contained within the passed Bound variant if there even is one (i.e., the Bound variant is not Unbounded). This makes the documentation more obvious since the word "bound" aligns with both the parameter and type Bound as well as handles all three possible variants correctly. When "given bound" is meant to mean the value that is contained within the Bound variant, then that doesn't even make sense when Unbounded is passed since there is no such value. Additionally when Bound::Included(t) is passed to lower_bound or upper_bound, the documentation is plain wrong since when BTreeMap contains t, the cursor will be pointing at t which of course is equal to t (per the requirement that Eq is reflexive) and not strictly greater than or strictly less than t as the current documentation states.
Regarding this API, will it also be supported on BTreeSet at some stage?
I have plans to refactor the BTreeMap cursor API, but probably won't have time in September. Extending it to cover BTreeSet is definitely planned.
I'm curious about the difference between CursorMut and Cursor regarding the lifetimes of reading APIs. For example, the references of key_value in Cursor have the lifetime of 'a but in CursorMut they have the lifetime of self.
I'm curious about the difference between CursorMut and Cursor regarding the lifetimes of reading APIs. For example, the references of key_value in Cursor have the lifetime of 'a but in CursorMut they have the lifetime of self.
CursorMut can't return a reference with lifetime 'a since that would violate Rust's safety guarantees. Cursor could return a reference with the same lifetime as &self, but that would be more restrictive since lifetime 'a will last at least as long as &self. Returning a reference with a lifetime that is at least as large as &self allows for code to be compilable in more situations, especially due to subtyping of lifetimes which allows longer lifetimes to shorten when they need to.
Specifically, the following code won't compile since Rust would not be able to enforce the exclusive nature of an exclusive reference:
struct CursorMutSimple<'a, K: 'a, V: 'a> {
map: &'a mut Vec<(K, V)>,
}
impl<'a, K: 'a, V: 'a> CursorMutSimple<'a, K, V> {
// Below code won't compile when uncommented.
// fn first_value(&self) -> Option<&'a (K, V)> {
// self.map.get(0)
// }
}
If Cursor didn't return a reference with lifetime 'a, then code similar to below would no longer be possible:
struct CursorSimple<'a, K: 'a, V: 'a> {
map: &'a Vec<(K, V)>,
}
impl<'a, K: 'a, V: 'a> CursorSimple<'a, K, V> {
fn first_value(&self) -> Option<&(K, V)> {
self.map.get(0)
}
}
// Below code won't compile when uncommented since the lifetime
// is not tied to the data within.
//fn foo<'a, K: 'a, V: 'a>(val: CursorSimple<'a, K, V>) -> Option<&'a (K, V)> {
// val.first_value()
//}
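For contrast, here is a small sketch of the variants that do compile, reusing the simplified Vec-backed structs from above (so this is not the real BTreeMap cursor, and first_of is a made-up helper). The shared cursor can hand out references with the full lifetime 'a, so they outlive the cursor itself, while the mutable cursor ties its result to the borrow of self.

```rust
struct CursorSimple<'a, K, V> {
    map: &'a Vec<(K, V)>,
}

impl<'a, K, V> CursorSimple<'a, K, V> {
    // Shared references are `Copy`, so the result can carry the full `'a`.
    fn first_value(&self) -> Option<&'a (K, V)> {
        self.map.first()
    }
}

struct CursorMutSimple<'a, K, V> {
    map: &'a mut Vec<(K, V)>,
}

impl<'a, K, V> CursorMutSimple<'a, K, V> {
    // The result is tied to the borrow of `self`, not to `'a`.
    fn first_value(&self) -> Option<&(K, V)> {
        self.map.first()
    }
}

// Compiles only because `CursorSimple::first_value` returns `&'a (K, V)`:
// the cursor is consumed here, yet the reference stays valid.
fn first_of<'a, K, V>(cursor: CursorSimple<'a, K, V>) -> Option<&'a (K, V)> {
    cursor.first_value()
}

fn main() {
    let mut data = vec![(1, "one"), (2, "two")];

    let first = first_of(CursorSimple { map: &data });
    println!("{:?}", first);

    let cursor_mut = CursorMutSimple { map: &mut data };
    println!("{:?}", cursor_mut.first_value());
}
```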
Alice Ryhl is quite brilliant, and I can't recommend enough reading some of her posts. Here is one of her many informative posts that is applicable here since she goes over what would happen if &'short &'long mut T were allowed to flatten to &'long T like &'short &'long T is allowed to.
Thanks for the explanation @zacknewman. I hadn't noted the difference between the lifetime of indirect mutable and immutable references.
Alice Ryhl is quite brilliant, and I can't recommend enough reading some of her posts. Here is one of her many informative posts that is applicable here
Such a great recommendation.
EDIT: I'll hide this for now :thinking:, I doubt it's a useful comment.
I'm not saying this is a better idea, but I'll lay it down anyways.
If dealing with ghosts is too clunky, it's possible to guarantee that a Cursor will always point to a valid element, and return an Option at its construction.
impl<K, V> BTreeMap<K, V> {
fn lower_bound<Q>(&self, bound: Bound<&Q>) -> Option<Cursor<'_, K, V>>
where
K: Borrow<Q> + Ord,
Q: Ord;
fn upper_bound<Q>(&self, bound: Bound<&Q>) -> Option<Cursor<'_, K, V>>
where
K: Borrow<Q> + Ord,
Q: Ord;
}
struct Cursor<'a, K: 'a, V: 'a>;
impl<'a, K, V> Cursor<'a, K, V> {
/// If `None`, cursor doesn't move.
fn try_move_next(&mut self) -> Option<(&'a K, &'a V)>;
/// If `None`, cursor doesn't move.
fn try_move_prev(&mut self) -> Option<(&'a K, &'a V)>;
fn key(&self) -> &'a K;
fn value(&self) -> &'a V;
fn key_value(&self) -> (&'a K, &'a V);
fn peek_next(&self) -> Option<(&'a K, &'a V)>;
fn peek_prev(&self) -> Option<(&'a K, &'a V)>;
}
This removes Options from one place just to add them at another.
Now, speaking more seriously, about this:
Considering the feedback received so far, I am wondering if changing Cursors to represent a point between 2 elements rather than pointing at one element would be better:
- ...
- Moving over a value with move_prev/move_next can return an Option<(&K, &mut V)> which results in a more natural Rust API.
Regardless of the cursor pointing at a single element, or at the "in-between" of two elements:
fn move_next(&mut self);
fn move_prev(&mut self);
These should definitely return Option<(&K, &V)> in both cases, so that we can shove it in a while let:
while let Some((key, value)) = cursor.move_next() {
- move_next/move_prev returning a bool is somewhat redundant given that .get() already returns an option.
I don't like the bool idea, but I'd strongly disagree that this being redundant is a reason not to have it, given its benefits, unless there is a strong downside (possibly related to the implementation details, but I don't think there is any).
I'm honestly on board with the idea of keeping the cursor in gaps since it solves the issue of inserting, which always happens in a gap. See my proposal here and #112896 which tracks adding a key_mut method to the Entry API.
EDIT: this was hidden by accident.
@clarfonthey
I'm honestly on board with the idea of keeping the cursor in gaps since it solves the issue of inserting, which always happens in a gap. See my proposal https://github.com/rust-lang/rust/issues/107540#issuecomment-1590341835 and https://github.com/rust-lang/rust/pull/112896 which tracks adding a key_mut method to the Entry API.
About the insertion part, the gaps solution offers a tradeoff:
- try_insert_before + try_insert_after to simply try_insert.
- remove to remove_before + remove_after.
But what happens after .remove() is called? Does the cursor point to the next element or the previous?
On the other hand, what happens to a gap cursor after .insert() is called? Left or right? Well, neither! xD.
About your proposal:
fn insert(self, key: K) -> Result<VacantEntry<'a, K, V>, CursorMut<'a, K, V>>;
To ME (personal opinion ahead), inserting in a tree with a cursor will always be confusing because the cursor position might have nothing to do with it, it's an optimization "tip" to make it faster, but it might be a false tip.
I'd like cursors to go in, but if we don't agree on insertion questions, I don't think it should block it.
Insertion probably deserves its own proposal/tracking issue. If you really want to optimize it, I'd try something outside of the cursors:
impl BTreeMap {
// For each element in the tree, reuse the last element's position as a "tip" for the location
// of the `.next()` element, both for ascending and descending sequences.
//
// If the tip is false, fall back to the usual insert, and try again for the next element.
fn insert_multiple(&mut self, impl IntoIterator<...>);
}
If your concern is not optimizing it, but getting a cursor out of an .insert(), we could get VacantEntry::cursor[_mut] instead.
About the gap approach, (after two whole hours of deep thought) I'm convinced it's great.
But with it, the <= semantics of C++'s lower_bound are broken; to avoid confusion, the names should definitely change.
My suggestions are cursor_before and cursor_after to get a cursor before or after the given element (there would be no == case anymore). :+1: Pretty straightforward.
I get that insertion and removal are similar operations, but it's worth considering the fact that this API should cover cases not already covered. The entry API covers the parts of the map that do have elements, and in the formulation I described, the cursor covers everything else. This is also why I think that, instead of using custom insertion/removal routines, we should just provide entries instead, since that way we're not just duplicating functionality.
Consider the original case for this API to begin with: creating maps and sets whose keys are ranges, not discrete elements. If you want to insert a range, it's very unlikely that your range lands exactly on an element, so you want to search through the map efficiently and figure out what overlaps. In these cases, you're never "on" an element since the ranges will likely be sorted by their starting index, and instead you're going to have to move between gaps where the actual range lives.
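As a rough illustration of that overlap search on stable (using range rather than cursors), here is a sketch under some assumptions of mine: ranges are half-open [start, end), stored as start -> end in a BTreeMap<u32, u32>, and the stored ranges never overlap each other. The function name overlapping is hypothetical.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Returns every stored range `[s, e)` that overlaps the query `[start, end)`.
/// Assumes stored ranges are half-open and mutually non-overlapping.
fn overlapping(map: &BTreeMap<u32, u32>, start: u32, end: u32) -> Vec<(u32, u32)> {
    let mut found = Vec::new();
    if start >= end {
        // Empty query; also avoids asking `range` for an invalid interval.
        return found;
    }

    // The only range starting at or before `start` that can overlap is the
    // closest one to the left, and only if it extends past `start`.
    if let Some((&s, &e)) = map.range(..=start).next_back() {
        if e > start {
            found.push((s, e));
        }
    }

    // Every range that starts strictly inside the query overlaps it.
    for (&s, &e) in map.range((Bound::Excluded(start), Bound::Excluded(end))) {
        found.push((s, e));
    }
    found
}

fn main() {
    let mut ranges = BTreeMap::new();
    ranges.insert(0u32, 5u32);
    ranges.insert(10, 20);
    ranges.insert(30, 40);

    println!("{:?}", overlapping(&ranges, 3, 12));
    println!("{:?}", overlapping(&ranges, 5, 10));
}
```

Note that the query key itself is almost never present in the map, which is why the interesting positions here are the gaps around it rather than an element.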
And again, if you want to go with a system where you can move between occupied entries in the map, why not just augment the entry API? Offer the option to move between occupied entries directly on an entry. The reason why you can't do this for gaps is because while entries have exact keys, gaps only have keys you provide, which is why the insertion method I proposed actually involves adding a key to make a vacant entry. The reason for adding a fallible API is really for folks who prefer safe code and can panic if their key is wrong instead of invoking UB.
qq, is there currently no non-nightly way to get the next/previous entry? I don’t care about any of the rest of Cursor features. Just need to find the smallest (greatest) key greater (smaller) than given one.
You can use btree.range((Bound::Excluded(&5), Bound::Unbounded)).next().
You can use btree.range((Bound::Excluded(&5), Bound::Unbounded)).next().
This works well to retrieve the smallest value larger than the query (e.g. lower_bound).
Is there any workaround to retrieve the largest value smaller than the query (e.g. upper_bound)?
Is there any workaround to retrieve the largest value smaller than the query (e.g. upper_bound)?
The iterator implements DoubleEndedIterator, so you can just replace next with next_back to get the upper part.
Correct me if I'm wrong, but you'd also want to flip the bounds for upper_bound, i.e.
lower_bound(5) ≈ btree.range(5..).next()
upper_bound(5) ≈ btree.range(..=5).next_back()
(for inclusive bounds)
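Putting the range-based workarounds together, the four lookups can be sketched on stable roughly like this. The helper names are made up for illustration (they borrow from the find_first_greater style naming suggested earlier in the thread) and are not part of any std API; keys are u32 just to keep the example short.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Smallest entry with key >= `key` (what C++ `map::lower_bound` finds).
fn first_at_or_above<V>(map: &BTreeMap<u32, V>, key: u32) -> Option<(&u32, &V)> {
    map.range(key..).next()
}

/// Smallest entry with key > `key` (what C++ `map::upper_bound` finds).
fn first_above<V>(map: &BTreeMap<u32, V>, key: u32) -> Option<(&u32, &V)> {
    map.range((Bound::Excluded(key), Bound::Unbounded)).next()
}

/// Largest entry with key <= `key`.
fn last_at_or_below<V>(map: &BTreeMap<u32, V>, key: u32) -> Option<(&u32, &V)> {
    map.range(..=key).next_back()
}

/// Largest entry with key < `key`.
fn last_below<V>(map: &BTreeMap<u32, V>, key: u32) -> Option<(&u32, &V)> {
    map.range(..key).next_back()
}

fn main() {
    let map: BTreeMap<u32, &str> =
        vec![(1, "v1"), (2, "v2"), (3, "v3"), (4, "v4"), (5, "v5")]
            .into_iter()
            .collect();

    println!("first >= 3: {:?}", first_at_or_above(&map, 3));
    println!("first >  3: {:?}", first_above(&map, 3));
    println!("last  <= 3: {:?}", last_at_or_below(&map, 3));
    println!("last  <  3: {:?}", last_below(&map, 3));
}
```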
Is there a reason why Q is not ?Sized in lower_bound and friends?
I refactored the cursor API to point to gaps instead of elements in #118208.
Is there a reason why Q is not ?Sized in lower_bound and friends?
This was an oversight, and is addressed in #118208.
Will there be a method to "reborrow" the mutable cursor? That way it's possible to peek ahead with the reborrow, or to pass the cursor to some other function by value (like converting it to an Entry).
impl<'a, K, V> CursorMut<'a, K, V> {
fn reborrow(&mut self) -> CursorMut<'_, K, V> { ... }
}
I stumbled upon this API for the following use case:
I have a network interface that does version exchanging and resolves to the lowest common version.
The user can register "minimum version handlers" meaning that if the user registers a handler for versions 1,4,7 and we resolve to version 3 it will take the handler for version 1. If version 7+ comes in, it will take the handler for version 7.
That way the user can register new handlers only when there's backward incompatibility.
This looks something like this:
type Handler<RET> = Box<dyn FnOnce(Driver) -> Box<dyn Future<Output = Result<RET, Error>>>>;
type Handlers<RET> = BTreeMap<Version, Handler<RET>>;
let common_version = ...;
let handler = handlers.upper_bound_mut(Bound::Included(&version)).remove_current();
EDIT:
Right now it seems like the alternative requires Clone for the key:
let handler_key = handlers.range(..=common_version).next_back()?.0.clone();
let handle = handlers.remove(&handler_key)?;
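Spelled out as something runnable on stable, the workaround might look like the sketch below. It assumes the version is a plain u32 (so copying the key is free) and uses a generic H in place of the boxed handler type; take_handler is a hypothetical helper name. It picks the handler with the greatest registered version that is not above the resolved one, matching the 1,4,7 example.

```rust
use std::collections::BTreeMap;

/// Removes and returns the handler registered for the greatest version
/// that is <= `common_version`, if there is one.
fn take_handler<H>(handlers: &mut BTreeMap<u32, H>, common_version: u32) -> Option<H> {
    // Copy the key out so the shared borrow ends before `remove`.
    let (&key, _) = handlers.range(..=common_version).next_back()?;
    handlers.remove(&key)
}

fn main() {
    let mut handlers = BTreeMap::new();
    handlers.insert(1u32, "handler for v1+");
    handlers.insert(4, "handler for v4+");
    handlers.insert(7, "handler for v7+");

    println!("{:?}", take_handler(&mut handlers, 3));
    println!("{:?}", take_handler(&mut handlers, 9));
}
```

This does two lookups (one for range, one for remove), which is the cost a mutable cursor would avoid.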
Will there be a method to "reborrow" the mutable cursor? That way it's possible to peek ahead with the reborrow, or to pass the cursor to some other function by value (like converting it to an Entry).
That doesn't work: the reborrowed cursor might mutate the tree in a way that makes the current position of the original cursor invalid (for example if the node that it is pointing to is deleted).
Will there be a method to "reborrow" the mutable cursor? That way it's possible to peek ahead with the reborrow, or to pass the cursor to some other function by value (like converting it to an Entry).
That doesn't work: the reborrowed cursor might mutate the tree in a way that makes the current position of the original cursor invalid (for example if the node that it is pointing to is deleted).
One possible variation, not sure if still useful:
Spin off a special non-mutating "seeking" cursor that borrows the normal CursorMut, and has a method to relocate that parent cursor to its current location.
Feature gate: #![feature(btree_cursors)]
ACP: https://github.com/rust-lang/libs-team/issues/141
This is a tracking issue for the Cursor and CursorMut types for BTreeMap.
A Cursor is like an iterator, except that it can freely seek back-and-forth, and can safely mutate the tree during iteration. This is because the lifetime of its yielded references is tied to its own lifetime, instead of just the underlying tree. A cursor either points to an element in the tree, or to a "ghost" non-element that is logically located after the last element and before the first element.
Public API
Steps / History
Unresolved Questions