rust-lang / hashbrown

Rust port of Google's SwissTable hash map
https://rust-lang.github.io/hashbrown
Apache License 2.0
2.38k stars 275 forks source link

Add an assertion to `HashSet::get_or_insert_with` #518

Open cuviper opened 5 months ago

cuviper commented 5 months ago

Since the Rust libs-api team considers it problematic for HashSet<T> to be unreliable with well-behaved T (not user-controlled), we should not allow mismatches to be inserted through get_or_insert_with.

cuviper commented 5 months ago

I didn't notice #400 before, but I think that's doing a lot more than is really needed for the immediate problem. For instance, I don't think it's important to compare the hashes, because that calculation is not under user control here. And value equality already implies hash equality, especially for known-good non-user types.

JustForFun88 commented 5 months ago

I didn't notice #400 before, but I think that's doing a lot more than is really needed for the immediate problem. For instance, I don't think it's important to compare the hashes, because that calculation is not under user control here. And value equality already implies hash equality, especially for known-good non-user types.

I can correct the code if necessary.

JustForFun88 commented 5 months ago

I didn't notice #400 before, but I think that's doing a lot more than is really needed for the immediate problem. For instance, I don't think it's important to compare the hashes, because that calculation is not under user control here. And value equality already implies hash equality, especially for known-good non-user types.

I removed unnecessary code. Now the comparison is carried out by values. I don’t think this will affect performance. I use the from_key_hashed_nocheck method, so the hash is calculated only once, where the old version of the function calculated the hash twice (separately inside from_key and inside or_insert_with methods)

Amanieu commented 5 months ago

For some background, this came up while I was trying to remove RawEntry from hashbrown (by moving it under an off-by-default cargo feature). The only internal uses of the raw entry API were from the HashSet::get_or_insert* methods. I can rewrite get_or_insert and get_or_insert_owned to use Entry and EntryRef (with some minor changes). But for get_or_insert_with I would have to fall back to the RawTable API, which I would rather avoid: HashSet should only expose operations that can also be performed on a normal HashMap.

JustForFun88 commented 5 months ago

I can rewrite get_or_insert and get_or_insert_owned to use Entry and EntryRef (with some minor changes). But for get_or_insert_with I would have to fall back to the RawTable API, which I would rather avoid: HashSet should only expose operations that can also be performed on a normal HashMap.

Well, get_or_insert can be rewritten to Entry by slightly changing OccupiedEntry::key, namely by adding the lifetime 'a to the key K (pub fn key(&self) -> &'a K).

But get_or_insert_owned is not so easy to solve. First, you need to use EntryRef, not Entry. Secondly, all existing VacantEntryRef methods work with K: From<&Q>, and not with Q: ToOwned<Owned = K>.

You need to either rewrite get_or_insert_owned, or change the signatures of VacantEntryRef to ToOwned, or add a new method for VacantEntryRef (ala from_owned_with).

cuviper commented 5 months ago

Well, get_or_insert can be rewritten to Entry by slightly changing OccupiedEntry::key, namely by adding the lifetime 'a to the key K (pub fn key(&self) -> &'a K).

That would be unsound, because a user could follow that with a remove call, making that reference dangling. That's why the raw entry has fn into_key(self) -> &'a mut K, consuming the entry so it can no longer be used.

We could add a similar OccupiedEntry::into_key(self) -> &'a K though.

JustForFun88 commented 5 months ago

That would be unsound, because a user could follow that with a remove call, making that reference dangling. That's why the raw entry has fn into_key(self) -> &'a mut K, consuming the entry so it can no longer be used.

That's right, I forgot about the remove. Well, in any case, we will have to expand the existing HashMap API, if we want remove RawEntry