Closed josephrocca closed 5 years ago
It seems that FxHash
is generating particularly poor hashes. If you make hashbrown use the libstd default hasher (SipHash) then it generates comparable results:
let mut m = HashMap::with_hasher(std::collections::hash_map::RandomState::new());
Ah okay, I just wasn't sure whether there was something strange going on, or whether this is expected/not-surprising. I figured that since SiphHash is DOS-resistant, and since I don't need that security for my use case, then I could go with hashbrown and get a bit of a speedup. I'll just stick with the built-in HashMap unless I'm dealing with numbers. Thanks for your help!
I'd like to keep this open, since we really need to fix the hasher.
Ah, great! I'm too new to to rust (and hashing algorithms) to help out myself at this point, but I'm looking forward to testing out the improvements! :+1:
Fxhash is know to generate poor hashes often. I'd consider switching to Seahash. It's what I did I'm my crate ccl.
Fixed by #97
For anyone who's arrived here from google searches trying to find a fast deterministic hash function for hashmaps, I've learned a few things:
Hasher
is deterministic. For example, HashMap::<_, _, BuildHasherDefault<DefaultHasher>>::default()
is deterministic.seahash
is actually not faster than the stdlib.~/Desktop> rust-script -w hyperfine foo.rs # stdlib
Benchmark 1: /home/asaveau/.cache/rust-script/binaries/release/foo_d757ec98400a8a2dbe591e4e
Time (mean ± σ): 224.0 ms ± 7.7 ms [User: 224.2 ms, System: 0.3 ms]
Range (min … max): 216.4 ms … 244.8 ms 12 runs
~/Desktop> nano foo.rs
12sec ~/Desktop> rust-script -w hyperfine foo.rs # seahash
Benchmark 1: /home/asaveau/.cache/rust-script/binaries/release/foo_d757ec98400a8a2dbe591e4e
Time (mean ± σ): 276.5 ms ± 13.9 ms [User: 276.1 ms, System: 0.0 ms]
Range (min … max): 265.9 ms … 311.3 ms 10 runs
With the following benchmarking code (fixed from the original issue to not use memory allocation or depend on the system in any way, it's purely compute bound):
#!/usr/bin/env rust-script
//! Dependencies can be specified in the script file itself as follows:
//!
//! ```cargo
//! [dependencies]
//! itoa = "1"
//! rand = "0.8"
//! rand_xoshiro = "0.6"
//! seahash = "4"
//! ```
use rand::Rng;
use rand::SeedableRng;
use rand_xoshiro::Xoshiro256PlusPlus;
use std::time::Instant;
use std::hash::*;
fn main() {
let iters = 10000000;
// let mut hasher = seahash::SeaHasher::new();
let mut hasher = DefaultHasher::new();
let mut rng = Xoshiro256PlusPlus::seed_from_u64(42);
let mut buf = itoa::Buffer::new();
let now = Instant::now();
for _ in 0..iters {
let s = buf.format(rng.gen_range(0..2u32.pow(31)));
hasher.write(s.as_bytes());
}
println!("{:?} {}", now.elapsed(), hasher.finish());
}
Hi, I'm very new to rust and just playing around with hash maps and I found that using the code below, the
std::collections::HashMap
is significantly faster than hashbrown. I'm wondering if someone could explain this? Is hashbrown only meant to be faster for certain types of data? Or am I doing something wrong?Here's my laptop's output for
hashbrown::HashMap
:And here's my laptop's output for
std::collections::HashMap
: