indexmap-rs / indexmap

A hash table with consistent order and fast iteration; access items by key or sequence index
https://docs.rs/indexmap/
Other
1.77k stars 154 forks source link

Feature request: `IndexMap::replace_key` #362

Open czy-29 opened 1 week ago

czy-29 commented 1 week ago

The idea is simple: to replace an existing key with a new one and preserve its index position. If I want to implement this feature externally, there seems to be no O(1) solution. It seems that only internal implementation can achieve this level of time complexity.

cuviper commented 1 week ago

You can do it with MutableKeys:

use indexmap::map::MutableKeys;
map.get_full_mut2(&k).map(|(_i, key, _value)| *key = k);

Or with the raw entry extension: https://docs.rs/indexmap/latest/indexmap/map/raw_entry_v1/trait.RawEntryApiV1.html

use indexmap::map::RawEntryApiV1;
map.raw_entry_mut_v1().from_key(&k).and_modify(|key, _value| *key = k);

For a more direct API, I would prefer to see this proposed and ultimately stabilized on the standard HashMap first, probably citing HashSet::replace as the basis.

czy-29 commented 1 week ago

You can do it with MutableKeys:

use indexmap::map::MutableKeys;
map.get_full_mut2(&k).map(|(_i, key, _value)| *key = k);

Or with the raw entry extension: https://docs.rs/indexmap/latest/indexmap/map/raw_entry_v1/trait.RawEntryApiV1.html

use indexmap::map::RawEntryApiV1;
map.raw_entry_mut_v1().from_key(&k).and_modify(|key, _value| *key = k);

For a more direct API, I would prefer to see this proposed and ultimately stabilized on the standard HashMap first, probably citing HashSet::replace as the basis.

If the values of the new key and the old key are different (which also means that different hashes will be generated), can both approaches still be used normally?

cuviper commented 1 week ago

It would be an error (but still memory-safe) to overwrite a key with a different Hash/Eq. That's the main reason why these extra methods are opt-in trait imports, and both approaches are equivalent in this regard.

But as long as you encapsulate your own replace_key implementation to always use the same lookup key for replacement, like I showed with k above, then there should be no problem.

czy-29 commented 1 week ago

It would be an error (but still memory-safe) to overwrite a key with a different Hash/Eq. That's the main reason why these extra methods are opt-in trait imports, and both approaches are equivalent in this regard.

But as long as you encapsulate your own replace_key implementation to always use the same lookup key for replacement, like I showed with k above, then there should be no problem.

I still don't quite understand. During this process, I saw that generics K and Q can be of different types. Are you saying that only when K and Q are of different types, and their logic for generating Hash/Eq is different, and the user uses other methods for indexing besides the method I encapsulated, can errors occur only when all these conditions are met, and there are no problems in other situations? Including when K and Q are different values of the same type, and the user has also used other methods for indexing outside, will the result still be correct in this case?

cuviper commented 1 week ago

Those opt-in mutability traits can change keys improperly, even aside from K/Q relationships. e.g. you could get_full_mut2 with one key and write back an unrelated key. This is also possible to break if the key type has its own interior mutability, like fields with Cell or Atomic types. You could hide those traits from your encapsulation, but I don't know what you could do about interior mutability, unless you only work with known key types that don't have that.

As for those generics, in the standard library, K and Q have a Borrow relationship, which documents:

Further, when providing implementations for additional traits, it needs to be considered whether they should behave identically to those of the underlying type as a consequence of acting as a representation of that underlying type. Generic code typically uses Borrow<T> when it relies on the identical behavior of these additional trait implementations. These traits will likely appear as additional trait bounds.

In particular Eq, Ord and Hash must be equivalent for borrowed and owned values: x.borrow() == y.borrow() should give the same result as x == y.

In indexmap we've generalized that a little more with the Equivalent trait, but the principle is the same.

czy-29 commented 1 week ago

Those opt-in mutability traits can change keys improperly, even aside from K/Q relationships. e.g. you could get_full_mut2 with one key and write back an unrelated key. This is also possible to break if the key type has its own interior mutability, like fields with Cell or Atomic types. You could hide those traits from your encapsulation, but I don't know what you could do about interior mutability, unless you only work with known key types that don't have that.

As for those generics, in the standard library, K and Q have a Borrow relationship, which documents:

Further, when providing implementations for additional traits, it needs to be considered whether they should behave identically to those of the underlying type as a consequence of acting as a representation of that underlying type. Generic code typically uses Borrow<T> when it relies on the identical behavior of these additional trait implementations. These traits will likely appear as additional trait bounds. In particular Eq, Ord and Hash must be equivalent for borrowed and owned values: x.borrow() == y.borrow() should give the same result as x == y.

In indexmap we've generalized that a little more with the Equivalent trait, but the principle is the same.

Ah, I understand now! Indeed, the correctness of keys in memory cannot be fully guaranteed due to the presence of interior mutability. I just need to ensure the correctness of the behavior of the method I encapsulate myself. Thank you for your answer!

czy-29 commented 1 week ago

But I think we can keep this issue open. When the HashMap in the standard library finally has a method similar to replace_key, we can go back to this issue and see if we can add this method to our IndexMap at that time.

czy-29 commented 1 week ago

You can do it with MutableKeys:

use indexmap::map::MutableKeys;
map.get_full_mut2(&k).map(|(_i, key, _value)| *key = k);

Or with the raw entry extension: https://docs.rs/indexmap/latest/indexmap/map/raw_entry_v1/trait.RawEntryApiV1.html

use indexmap::map::RawEntryApiV1;
map.raw_entry_mut_v1().from_key(&k).and_modify(|key, _value| *key = k);

For a more direct API, I would prefer to see this proposed and ultimately stabilized on the standard HashMap first, probably citing HashSet::replace as the basis.

Um... I feel that there are still significant issues with this approach... Consider the following code:

use indexmap::{map::MutableKeys, Equivalent, IndexMap};
use std::{
    borrow::Borrow,
    hash::{BuildHasher, Hash},
};
use thiserror::Error;

/// Error returned by `MapExt::replace_key`.
#[derive(Error, Debug, Copy, Clone, Eq, PartialEq, Hash)]
pub enum ReplaceKeyErr {
    #[error("replace_key: the old key does not exist")]
    /// The old key does not exist.
    OldKeyNotExist,
    #[error("replace_key: the new key is already occupied")]
    /// The new key is already occupied.
    NewKeyOccupied,
}

/// Some general extensions to `Maps` (such as
/// [`HashMap`](https://doc.rust-lang.org/stable/std/collections/struct.HashMap.html),
/// [`BTreeMap`](https://doc.rust-lang.org/stable/std/collections/struct.BTreeMap.html),
/// [`IndexMap`](https://docs.rs/indexmap/latest/indexmap/map/struct.IndexMap.html)).
pub trait MapExt<K, Q: ?Sized = K> {
    /// Replace an existing key with a new (non-existing) one.
    ///
    /// If k1 does not exist, return `Err(ReplaceKeyErr::OldKeyNotExist)`.
    ///
    /// Otherwise, if k2 and k1 are equal, do nothing and return `Ok(())`.
    ///
    /// Otherwise, if k2 also exists, return `Err(ReplaceKeyErr::NewKeyOccupied)`.
    ///
    /// Otherwise, return `Ok(())` after the replacement is completed.
    fn replace_key(&mut self, k1: &Q, k2: K) -> Result<(), ReplaceKeyErr>
    where
        K: Borrow<Q>;
}

impl<K, Q, V, S> MapExt<K, Q> for IndexMap<K, V, S>
where
    K: Borrow<Q>,
    Q: ?Sized + Hash + Equivalent<K>,
    S: BuildHasher,
{
    fn replace_key(&mut self, k1: &Q, k2: K) -> Result<(), ReplaceKeyErr> {
        if !self.contains_key(k1) {
            return Err(ReplaceKeyErr::OldKeyNotExist);
        }

        if k1.equivalent(&k2) {
            return Ok(());
        }

        if self.contains_key(k2.borrow()) {
            return Err(ReplaceKeyErr::NewKeyOccupied);
        }

        // See: https://github.com/indexmap-rs/indexmap/issues/362
        *self
            .get_full_mut2(k1)
            .expect("this should be unreachable")
            .1 = k2;
        Ok(())
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn replace_key_indexmap() {
        let mut map = indexmap::indexmap! {
            "k1".to_string() => 123,
            "k2".to_string() => 456
        };

        assert_eq!(map.replace_key("k1", "k3".to_string()), Ok(()));
        assert_eq!(map["k3"], 123);
    }
}

The test failed with the following output:

failures:

---- collections::tests::replace_key_indexmap stdout ----
thread 'collections::tests::replace_key_indexmap' panicked at src\collections.rs:242:23:
IndexMap: key not found
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

failures:
    collections::tests::replace_key_indexmap

The code on line 242 is:

assert_eq!(map["k3"], 123);

Is this approach unable to achieve the purpose of this test? Is there a better way to achieve it?

cuviper commented 1 week ago

I misunderstood that you do want to change the actual Hash+Eq identity of the key. That's not what HashSet::replace or IndexSet::replace do, nor what I would personally expect from a corresponding Map::replace_key. Of course, I was the one who tried to make that connection, not you.

In that case, you should not use the mutability traits, but an actual new insertion. You can do that and get it into the right place like this:

    fn replace_key(&mut self, k1: &Q, k2: K) -> Result<(), ReplaceKeyErr> {
        let Some(i) = self.get_index_of(k1) else {
            return Err(ReplaceKeyErr::OldKeyNotExist);
        };

        if k1.equivalent(&k2) {
            return Ok(());
        }

        if self.contains_key(&k2) {
            return Err(ReplaceKeyErr::NewKeyOccupied);
        }

        // Note, this temporarily displaces the last entry into `i`,
        // but we'll swap it back after we insert the new key.
        let Some((_, v)) = self.swap_remove_index(i) else {
            return Err(ReplaceKeyErr::OldKeyNotExist);
        };

        let (j, _) = self.insert_full(k2, v);
        self.swap_indices(i, j);
        Ok(())
    }

I don't think we expose anything that would let you do this without the swaps, but at least it's still O(1).

czy-29 commented 1 week ago

I misunderstood that you do want to change the actual Hash+Eq identity of the key. That's not what HashSet::replace or IndexSet::replace do, nor what I would personally expect from a corresponding Map::replace_key. Of course, I was the one who tried to make that connection, not you.

In that case, you should not use the mutability traits, but an actual new insertion. You can do that and get it into the right place like this:

    fn replace_key(&mut self, k1: &Q, k2: K) -> Result<(), ReplaceKeyErr> {
        let Some(i) = self.get_index_of(k1) else {
            return Err(ReplaceKeyErr::OldKeyNotExist);
        };

        if k1.equivalent(&k2) {
            return Ok(());
        }

        if self.contains_key(&k2) {
            return Err(ReplaceKeyErr::NewKeyOccupied);
        }

        // Note, this temporarily displaces the last entry into `i`,
        // but we'll swap it back after we insert the new key.
        let Some((_, v)) = self.swap_remove_index(i) else {
            return Err(ReplaceKeyErr::OldKeyNotExist);
        };

        let (j, _) = self.insert_full(k2, v);
        self.swap_indices(i, j);
        Ok(())
    }

I don't think we expose anything that would let you do this without the swaps, but at least it's still O(1).

Thanks! All tests have passed! 😊