rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.26k stars 12.71k forks source link

add find_or_insert method to the Map trait #5568

Closed thestinger closed 10 years ago

graydon commented 11 years ago

Agreed. Also several of the other ones that appear on hashmap: find_or_insert_with, insert_or_update_with, reserve_at_least, consume.

thestinger commented 11 years ago

@graydon: I think consume will get replaced with an external iterator, moving the container into the iterator on creation

davidhalperin commented 11 years ago

I'm working on this. I'm trying to add default methods where possible, swap can be implemented in terms of insert_or_update_with for instance. Unfortunately, find_or_insert_with and insert_or_update_with can't quite be implemented in terms of each other. The options for this as far as I can tell are:

Any preferences between those or other suggestions?

thestinger commented 11 years ago

I don't think there should be a default method implementation if it's going to do two searches but I'm fine with both of the other options.

nikomatsakis commented 11 years ago

Perhaps we can just add "mangle" to the map trait and impl the others as default methods based on that?

davidhalperin commented 11 years ago

First, sorry for saying I was going to do this and sitting on it for so long. I keep meaning to make time to try to help out with rust but getting busy with other things. I still plan to try to get this done.

How about changing find_or_insert_with to return (&'a K, &'a mut V) instead of just &'a mut V? It seems like there should be some sort of insert operation where you remember what the key was afterwards, anyway. If we did that, it would mean you could implement insert_or_update_with like so:

fn insert_or_update_with<'a>(&'a mut self, k: K, v: V,
                             f: &fn(&K, &mut V)) -> &'a mut V {
    let cell = Cell::new(v);
    let (k, v) = self.find_or_insert_with(k, |_| cell.take());
    if !cell.is_empty() {
        f(k, v);
    }
    v
}

That seems nicer than mangle to me. I think mangle is a little clunky and poorly named, FWIW. Up to you guys though.

nikomatsakis commented 11 years ago

Interesting thought! I agree that mangle is clunky and awkwardly named (despite being the one who suggested it).

I would also consider just modifying the callback for insert_or_update_with not to take the key. I personally have never once needed it, and I see no code in our repository that does either (though I guess that, in the general case of affine keys, it can be useful).

TeXitoi commented 11 years ago

@davidhalperin : are you still working on that? I am interested in working on this issue. Maybe we can work together?

davidhalperin commented 11 years ago

@TeXitoi sorry for not getting back to you. I have Monday off and I think I should have a good chunk of time to work on this this weekend. By the end of the day on Monday I'll either finish this and put together a pull request or pass what I've done so far on to you to finish up. Sound good?

TeXitoi commented 11 years ago

@davidascher Perfect :)

davidhalperin commented 10 years ago

I got very close to finishing it this weekend. The one thing I couldn't figure out is how to write insert_or_update_with for TreeMap. You can't just make a recursive call and then call skew and split like the current insert does because this insert returns an internal pointer, so the current node stays mutably borrowed and you can't mutably borrow again it to balance the tree. If anyone has any ideas about how to get around that, I should be able to finish this up shortly.

TeXitoi commented 10 years ago

IMHO, it's not possible to do that with safe code without doing an insert and then a find (i.e. with 2 searches). Then, skew and split move the children, so the reference should be valid. The only way I can see to do that with a search is with unsafe code:

fn insert_or_update_with<'a, K: TotalOrd, V>(
    node: &'a mut Option<~TreeNode<K, V>>,
    key: K, value: V,
    f: &fn(&K, &mut V)) -> &'a mut V {
    match *node {
      Some(ref mut save) => {
        match key.cmp(&save.key) {
          Less => {
            let res: *mut V = unsafe {
              transmute(insert_or_update_with(&mut save.left, key, value, f))
            };
            skew(save);
            split(save);
            unsafe{transmute(res)}
          }
          Greater => {
            let res: *mut V = unsafe {
              transmute(insert_or_update_with(&mut save.right, key, value, f))
            };
            skew(save);
            split(save);
            unsafe{transmute(res)}
          }
          Equal => {
            save.key = key;
            f(&save.key, &mut save.value);
            &mut save.value
          }
        }
      }
      None => {
       *node = Some(~TreeNode::new(key, value));
        &mut node.get_mut_ref().value
      }
    }    
}

It compiles, but I didn't try running it and that's the first time I write unsafe code. And that's quite sad to have TreeMap using unsafe code.

I hope I'm wrong and there is another solution.

davidhalperin commented 10 years ago

@TeXitoi I reached the same conclusion and was going to ask for a sanity check. I don't think doing 2 searches would help even if we wanted to sacrifice performance to write safe code. The key gets moved into the map and if you tried to keep a reference while you rebalanced the key, you'd run into exactly the same issue. My implementation uses transmute_mut_region instead of transmute to be a little more specific about how it's being unsafe but is basically the same and it seems to work. Thanks for the help! I will finish this up today.

nikomatsakis commented 10 years ago

That does indeed look hard to type check. It wasn't obvious to me from glancing at the code that it was safe amidst all the unique pointer juggling and so on. @davidhalperin I think doing an insert and then a search would work, if you wanted to keep the code safe? not sure why it matters that key/value are moved into the map.

davidhalperin commented 10 years ago

@nikomatsakis Let's say the helper find_or_insert_with returned &'a K instead of &'a mut V. The actual method in the impl would call the helper then call find_mut. That's how we'd do the insert then search, or am I missing something? We can't safely implement a helper find_or_insert_with that returns &'a K for exactly the same reason we can't write one that returns &'a mut V. The left branch would look like:

Left => {
    let k = find_or_insert_with(&mut save.left, key, f);
    skew(save); // We can't borrow save again here because we're holding onto k
    split(save);
    k
}

Is there a different way we could avoid do the insert then search strategy that would avoid that problem?

nikomatsakis commented 10 years ago

@davidhalperin Ah, NOW I think I see your point. Yes, without the ability to clone the key, it does leave us in a bit of a pickle, doesn't it? You can't do the find after the insert because the key is moved; you can't return a pointer to the key because it would prolong the loan on the map (and rightly so).

If were insistent on avoiding unsafe code, I imagine we could have some insert-like helper that takes an optional vector in which the path to the node can be encoded:

enum Path { Left, Right };

fn find_or_insert_with(key: K, value: V, ...) -> &'mut V {
    let mut path = ~[];
    self.insert_helper(key, value, &mut path);
    self.find_from_path(path)
}

fn insert_helper(key: K, value: V, ..., path: &mut ~[Path]) {
    ...
    if should insert on left left {
        path.push(Left);
        self.left.insert_helper(key, value, ...);
    } else { ... }
}

fn find_from_path(&mut self, path: &[Path]) -> &'a mut V {
    if path.is_empty() { return &mut self.value; }
    match path[0] {
        Left => self.left.find_from_path(path.slice_from(1)),
        Right => ...    
    }
}

Obviously this is a bit over the top, though out of idle curiosity I do wonder if you could encode the path in a u64 and a uint -- probably not, depends I guess on the balancing guarantees. :)

pnkfelix commented 10 years ago

cc me

aochagavia commented 10 years ago

Is this solved? If not, I would like to work on it.

davidhalperin commented 10 years ago

@aochagavia My pull request got closed with the comment that the map trait should be thought out more, so I don't think this is going to be done as described in this issue (if it is, I should be able to dust of my changes pretty easily). If you want to work on something around this you should probably talk to the rust devs and see what they actually want.

brson commented 10 years ago

Nominating for removal from milestone. The world won't end if this method isn't defined in 1.0.

pnkfelix commented 10 years ago

Removing from milestone.

(The process of reviewing the std lib and assigning stability attributes shouid eventually address this problem, perhaps by marking the existing Map as unstable.)

gereeter commented 10 years ago

Should this be closed now that rust-lang/rfcs#216 has been accepted?

alexcrichton commented 10 years ago

This isn't quite a dupe of that RFC because it doesn't mention the Map trait specifically (which this issue is referring to), but I think this is best covered by https://github.com/rust-lang/rfcs/pull/235 now, so I'm going to close this in favor of that. Note that https://github.com/rust-lang/rfcs/pull/235 recommends removing all collections traits, which would make this issue moot.

Gankra commented 10 years ago

At very least, find_or_insert is deprecated, now. So adding it to Map is uh... dubious?