rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0
5.8k stars 1.55k forks source link

pre-RFC: Add a way to insert into a collection, then return a reference to the inserted item in one operation. #3343

Open cyqsimon opened 1 year ago

cyqsimon commented 1 year ago

I ran across this issue today when I'm attempting to write a function like this:

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Player {
    // fields omitted
}
struct PlayerList {
    // other fields omitted
    players: HashSet<Player>,
}
impl PlayerList {
    /// Create a new player and insert it into the player list.
    /// Then return a reference to this new player.
    pub fn new_player(
        &mut self,
        // other fields omitted
    ) -> &Player {
        let new = Player {};
        self.players.insert(new);
        todo!("return a reference to the new player")
    }
}

To my surprise, there is no easy way to do this with HashSet, or for that matter, with any collection. I ended up having to write this:

    pub fn new_player(&mut self) -> &Player {
        let new = Player {};
        self.players.insert(new.clone());
        self.players.get(&new).unwrap()
    }

This is obviously bad for a multitude of reasons:


So I asked on the Rust-lang discord guild here, and Yand.rs gave this solution:

use ::hashbrown::HashMap;

fn create_new(s: &mut HashMap<String, ()>) -> &str {
    let new = String::from("…");
    s.raw_entry_mut().from_key(&new).or_insert(new, ()).0
}

... but also had this to say:

Sadly it requires raw_entry_mut(), which is either unstable, or requires hashbrown. And also only manual maps have these, sets don't for some reason


My proposition

Add a family of associated functions that provide this capability to all collections:

impl<T> Vec<T> {
    pub fn insert_get(&mut self, index: usize, value: T) -> &T;
    pub fn push_get(&mut self, value: T) -> &T;
}
impl<T> VecDeque<T> {
    pub fn insert_get(&mut self, index: usize, value: T) -> &T;
    // might be unnecessary, see questions
    pub fn push_front_get(&mut self, value: T) -> &T;
    // might be unnecessary, see questions
    pub fn push_back_get(&mut self, value: T) -> &T;
}
impl<T> LinkedList<T> {
    // might be unnecessary, see questions
    pub fn push_front_get(&mut self, value: T) -> &T;
    // might be unnecessary, see questions
    pub fn push_back_get(&mut self, value: T) -> &T;
}
impl<K, V> HashMap<K, V> {
    pub fn insert_get(&mut self, key: K, value: V) -> (&V, Option<V>);
    // See https://github.com/rust-lang/rust/issues/82766#issuecomment-1301845589
    pub fn insert_vacant_get(&mut self, key: K, value: V) -> (&V, Option<V>);
}
impl<K, V> BTreeMap<K, V> {
    pub fn insert_get(&mut self, key: K, value: V) -> (&V, Option<V>);
    // See https://github.com/rust-lang/rust/issues/82766#issuecomment-1301845589
    pub fn insert_vacant_get(&mut self, key: K, value: V) -> (&V, Option<V>);
}
impl<T> HashSet<T> {
    pub fn replace_get(&mut self, value: T) -> (&T, Option<T>);
}
impl<T> BTreeSet<T> {
    pub fn replace_get(&mut self, value: T) -> (&T, Option<T>);
}
impl<T> BinaryHeap<T> {
    pub fn push_get(&mut self, value: T) -> &T;
}

Potential questions

programmerjake commented 1 year ago

for the types with mutable members (so not sets) imho the returned reference should be a mutable reference, the caller can easily convert to a shared reference if they desire but not the other way around.

SOF3 commented 1 year ago

What is here that cannot already be done with the safe entry API (although it's missing for sets)?

let mut set: HashMap<String, ()>;
set.entry(String::from("abc")).insert_entry(()).get()
cyqsimon commented 1 year ago

for the types with mutable members (so not sets) imho the returned reference should be a mutable reference, the caller can easily convert to a shared reference if they desire but not the other way around.

I certainly see the "why not?" appeal here, and TBH I did consider this while writing the post. Ultimately I think it's a tradeoff between consistency and functionality. It might be surprising for some users to find that:

Ultimately I decided to go for the consistency and just return an immutable reference regardless. We can always add *_get_mut variants in the future if necessary, at least that's my thinking.

But yeah admittedly I am pretty torn on what to do here. I would like to know the various opinions of the broader community.

cyqsimon commented 1 year ago

What is here that cannot already be done with the safe entry API (although it's missing for sets)?

I'm entirely in agreement here that in the specific case of *Map types (and in the future hopefully, *Set types as well; in fact if I'm not mistaken, it has already landed in hashbrown, but hasn't propagated to the standard library yet), these methods will offer little except for ergonomic improvements over the more general entry API. However for other collection types that don't have an entry API (or even the concept of it), they will be somewhat more tangibly useful IMO.

So again it comes down to a tradeoff with consistency I suppose. It would be nice if this family of methods is universal across all collections; but I also fully admit that this will come with the downside of duplication of functionality. Honestly, I don't know which path forward is better either. After all, that's why this is a pre-RFC, sorry ¯\_(ツ)_/¯

tbu- commented 1 year ago

The proposed functions are unfortunately not all that practical because they cause the underlying collection to be borrowed mutably, meaning that one cannot call any function or get any other reference out of them.

let new = vec.push_get(123);
// this errors:
println!("{}", vec[1], new);
cyqsimon commented 1 year ago

The proposed functions are unfortunately not all that practical because they cause the underlying collection to be borrowed mutably, meaning that one cannot call any function or get any other reference out of them.

Yikes. That's a very good point (unfortunately) that I overlooked. I just assumed that since we're only retaining an immutable reference, the borrow checker would see that the collection is no longer being borrowed mutably.

Speaking of which, I'm very curious whether there is actually a safety reason behind this behaviour, or is it just a technical limitation: why doesn't the borrow checker see that the retained reference has been "downgraded"? It seems to me like all the information the borrow checker would need to deduce this is available in the function signature.

SOF3 commented 1 year ago

Speaking of which, I'm very curious whether there is actually a safety reason behind this behaviour, or is it just a technical limitation: why doesn't the borrow checker see that the retained reference has been "downgraded"? It seems to me like all the information the borrow checker would need to deduce this is available in the function signature.

A simple counterexample:

fn demut(x: &mut AtomicU32) -> &u32 {
    &*x.get_mut()
}

let mut x = AtomicU32::new(1);
let y = demut(&mut x);
// y is now an immutable &u32 that equals &1
x.store(2, SeqCst);
// is *y now 1 or 2?
golddranks commented 1 year ago

@cyqsimon As @SOF3 's example shows, the borrow checker can't see that because that would be essentially an API guarantee, not an implementation detail. Besides, there's the "design rule" that the borrow checker can't depend on information from inside of function bodies, as those are considered implementation details too. There has been some ideas of having a "downgrading annotation" (e.g. https://internals.rust-lang.org/t/relaxing-the-borrow-checker-for-fn-mut-self-t/3256/21), but no concrete proposals, and I doubt the time is ripe for such a proposal at the moment.