rust-lang / hashbrown

Rust port of Google's SwissTable hash map
https://rust-lang.github.io/hashbrown
Apache License 2.0

segfault at HashMap.get #552

Open houstar opened 2 months ago

houstar commented 2 months ago

Rust toolchain 1.60.0, glibc 2.32-1.4, GNU build, hashbrown 0.12.0.

```
(gdb) bt
#0  0x00007fd34a0415f2 in ?? ()
#1  0x00005608e605eb09 in core::intrinsics::copy_nonoverlapping (src=0x7fd54409a4df <error: Cannot access memory at address 0x7fd54409a4df>, dst=0x7fd349d341e0 <__GI__IO_vfscanf+4336> "\245\236$\022\242\"\027>\001\000", count=16)
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/intrinsics.rs:2104
#2  0x00005608e5dcb229 in core::core_arch::x86::sse2::_mm_loadu_si128 (mem_addr=0x7fd54409a4df) at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/../../stdarch/crates/core_arch/src/x86/sse2.rs:1196
#3  0x00005608e5e0545a in hashbrown::raw::sse2::Group::load (ptr=0x7fd54409a4df <error: Cannot access memory at address 0x7fd54409a4df>) at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/raw/sse2.rs:50
#4  0x00005608e5dbf191 in hashbrown::raw::RawTableInner<A>::find_inner (self=0x7fd3440b07c8, hash=6691458850544120079, eq=...) at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/raw/mod.rs:1180
#5  0x00005608e5dbda14 in hashbrown::raw::RawTable<T,A>::find (self=0x7fd3440b07c8, hash=6691458850544120079, eq=...) at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/raw/mod.rs:822
#6  0x00005608e5dbdfee in hashbrown::raw::RawTable<T,A>::get (self=0x7fd3440b07c8, hash=6691458850544120079, eq=...) at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/raw/mod.rs:837
#7  0x00005608e5dd1d5e in hashbrown::map::HashMap<K,V,S,A>::get_inner (self=0x7fd3440b07b8, k="blkio") at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/map.rs:1103
#8  0x00005608e5dd1b09 in hashbrown::map::HashMap<K,V,S,A>::get (self=0x7fd3440b07b8, k="blkio") at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/map.rs:1055
#9  0x00005608e5df9059 in std::collections::hash::map::HashMap<K,V,S>::get (self=0x7fd3440b07b8, k="blkio") at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/collections/hash/map.rs:829
```

cuviper commented 2 months ago

Is it reproducible with the current Rust 1.81.0 toolchain? That will also be using the latest hashbrown v0.14.5.

houstar commented 2 months ago

OK, let me try it.

houstar commented 2 months ago

It is NOT reproducible on Rust 1.81.0. Was there a fix for this? If so, what was the cause?

cuviper commented 2 months ago

I don't know what in particular may have fixed it. If you share your code and reproduction steps, I may be able to investigate, but generally speaking only the current version will receive any support.

You could also try cargo-bisect-rustc to look for the exact toolchain change where the problem went away.

houstar commented 2 months ago

Is the reference to the value related?

I check hashmap.contains_key(k) before calling hashmap.get to avoid the segfault.

When I glanced at the contains_key source code, it looks the same as hashmap.get, except that it doesn't have the reference to the value.


```rust
pub fn contains_key<Q: ?Sized>(&self, k: &Q) -> bool
where
    K: Borrow<Q>,
    Q: Hash + Eq,
{
    self.get_inner(k).is_some()
}
```
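
A minimal sketch of the workaround described above, written against std's HashMap with String keys and u32 values. The helper name `guarded_get` and the key/value types are assumptions for illustration only. Both calls perform the same lookup, so in a correctly compiled program the guard should not change the result.

```rust
use std::collections::HashMap;

// Hypothetical helper mirroring the workaround: check `contains_key`
// first, and only call `get` when the key is reported as present.
fn guarded_get<'a>(map: &'a HashMap<String, u32>, key: &str) -> Option<&'a u32> {
    if map.contains_key(key) {
        map.get(key)
    } else {
        None
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert(String::from("blkio"), 1u32);

    // The guarded lookup agrees with a plain `get` for present and absent keys.
    assert_eq!(guarded_get(&map, "blkio"), map.get("blkio"));
    assert_eq!(guarded_get(&map, "cpu"), None);
}
```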

The hashmap get in Rust 1.81.0 also doesn't have the reference to the value:

```rust
pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
where
    Q: Hash + Equivalent<K>,
{
    // Avoid `Option::map` because it bloats LLVM IR.
    match self.get_inner(k) {
        Some((_, v)) => Some(v), // <<<------
        None => None,
    }
}
```

The hashmap get in Rust 1.60.0:

```rust
pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
where
    K: Borrow<Q>,
    Q: Hash + Eq,
{
    // Avoid `Option::map` because it bloats LLVM IR.
    match self.get_inner(k) {
        Some(&(_, ref v)) => Some(v), // <<<----
        None => None,
    }
}
```

cuviper commented 2 months ago

> I check hashmap.contains_key(k) before calling hashmap.get to avoid the segfault.

That sounds like it may be a miscompilation by rustc/LLVM. Is there a reason you can't use the newer version?

> Is the reference to the value related?

The references should be identical. Since get_inner returns Option<&(K, V)>, the older Some(&(_, ref v)) pattern was destructuring the outer reference and binding a new &V using the ref keyword. The newer Some((_, v)) pattern accomplishes the same thing implicitly via match ergonomics, where the unmatched & leads to a default ref binding mode within.

Here's a demo that the MIR is the same for either style of pattern: https://rust.godbolt.org/z/hMzjnccja
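
The equivalence of the two patterns can also be checked at runtime with a small standalone sketch. The helpers `value_explicit` and `value_ergonomic` are hypothetical names for illustration, not hashbrown functions; they only mimic the shape of the match in get, taking an Option<&(K, V)> as get_inner returns.

```rust
// Older style: destructure the outer reference explicitly and bind `v`
// by reference with the `ref` keyword.
fn value_explicit<'a, K, V>(entry: Option<&'a (K, V)>) -> Option<&'a V> {
    match entry {
        Some(&(_, ref v)) => Some(v),
        None => None,
    }
}

// Newer style: match ergonomics. The tuple pattern is matched against a
// reference, so `v` is bound by reference implicitly.
fn value_ergonomic<'a, K, V>(entry: Option<&'a (K, V)>) -> Option<&'a V> {
    match entry {
        Some((_, v)) => Some(v),
        None => None,
    }
}

fn main() {
    let pair = (String::from("blkio"), 42u32);

    let a = value_explicit(Some(&pair)).unwrap();
    let b = value_ergonomic(Some(&pair)).unwrap();

    // Both patterns yield a reference to the same field of the same tuple.
    assert!(std::ptr::eq(a, b));
    assert!(std::ptr::eq(a, &pair.1));
}
```

This only shows that the two source-level patterns produce the same reference; it says nothing about what a particular compiler version generates for them.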

houstar commented 2 months ago

> That sounds like it may be a miscompilation by rustc/LLVM. Is there a reason you can't use the newer version?

We are in the process of upgrading to 1.81.0. To push the upgrade through, we need to find the root cause, test it, and confirm it is production ready.

houstar commented 2 months ago

I'm following the guide below to build the Rust compiler so that I can bisect rustc with a custom-built HashMap, but the build failed.

https://rustc-dev-guide.rust-lang.org/building/how-to-build-and-run.html

cuviper commented 2 months ago

cargo-bisect-rustc should be able to use nightly builds to get you most of the way. If it is a codegen problem, I suspect you'll find a big change like "upgrade LLVM to a new version" that fixes it, but it could be something more targeted.