rust-lang / hashbrown

Rust port of Google's SwissTable hash map
https://rust-lang.github.io/hashbrown
Apache License 2.0
2.43k stars 285 forks source link

Slower than `std::collections::HashMap` when keys and values are strings? #88

Closed josephrocca closed 5 years ago

josephrocca commented 5 years ago

Hi, I'm very new to rust and just playing around with hash maps and I found that using the code below, the std::collections::HashMap is significantly faster than hashbrown. I'm wondering if someone could explain this? Is hashbrown only meant to be faster for certain types of data? Or am I doing something wrong?

use hashbrown::HashMap;
//use std::collections::HashMap;
use rand::Rng;
use std::time::Instant;
use numtoa::NumToA;

fn main() {
    let mut m = HashMap::new();
    let mut rng = rand::thread_rng();
    let now = Instant::now();
    let mut i = 0;
    let mut last_time = now.elapsed().as_millis();
    while i < 10000000 {
      let mut key_buf = [0u8; 10];
      let mut value_buf = [0u8; 10];
      let key = rng.gen_range(0, 2u32.pow(31)).numtoa_str(16, &mut key_buf);
      let value = rng.gen_range(0, 2u32.pow(31)).numtoa_str(16, &mut value_buf);
      m.insert(key.to_owned(), value.to_owned());
      i += 1;
      if i%1000000 == 0 {
        let now_time = now.elapsed().as_millis();
        println!("{} {}", i, now_time-last_time);
        last_time = now_time;
      }
    }

}

Here's my laptop's output for hashbrown::HashMap:

1000000 943
2000000 1749
3000000 1388
4000000 2739
5000000 1433
6000000 1811
7000000 2548
8000000 3885
9000000 1603
10000000 1852

And here's my laptop's output for std::collections::HashMap:

1000000 709
2000000 738
3000000 513
4000000 1062
5000000 527
6000000 573
7000000 677
8000000 1479
9000000 550
10000000 569
Amanieu commented 5 years ago

It seems that FxHash is generating particularly poor hashes. If you make hashbrown use the libstd default hasher (SipHash) then it generates comparable results:

let mut m = HashMap::with_hasher(std::collections::hash_map::RandomState::new());
josephrocca commented 5 years ago

Ah okay, I just wasn't sure whether there was something strange going on, or whether this is expected/not-surprising. I figured that since SiphHash is DOS-resistant, and since I don't need that security for my use case, then I could go with hashbrown and get a bit of a speedup. I'll just stick with the built-in HashMap unless I'm dealing with numbers. Thanks for your help!

Amanieu commented 5 years ago

I'd like to keep this open, since we really need to fix the hasher.

josephrocca commented 5 years ago

Ah, great! I'm too new to to rust (and hashing algorithms) to help out myself at this point, but I'm looking forward to testing out the improvements! :+1:

xacrimon commented 5 years ago

Fxhash is know to generate poor hashes often. I'd consider switching to Seahash. It's what I did I'm my crate ccl.

Amanieu commented 5 years ago

Fixed by #97

SUPERCILEX commented 11 months ago

For anyone who's arrived here from google searches trying to find a fast deterministic hash function for hashmaps, I've learned a few things:

~/Desktop> rust-script -w hyperfine foo.rs     # stdlib
Benchmark 1: /home/asaveau/.cache/rust-script/binaries/release/foo_d757ec98400a8a2dbe591e4e
  Time (mean ± σ):     224.0 ms ±   7.7 ms    [User: 224.2 ms, System: 0.3 ms]
  Range (min … max):   216.4 ms … 244.8 ms    12 runs

~/Desktop> nano foo.rs
12sec ~/Desktop> rust-script -w hyperfine foo.rs     # seahash
Benchmark 1: /home/asaveau/.cache/rust-script/binaries/release/foo_d757ec98400a8a2dbe591e4e
  Time (mean ± σ):     276.5 ms ±  13.9 ms    [User: 276.1 ms, System: 0.0 ms]
  Range (min … max):   265.9 ms … 311.3 ms    10 runs

With the following benchmarking code (fixed from the original issue to not use memory allocation or depend on the system in any way, it's purely compute bound):

#!/usr/bin/env rust-script
//! Dependencies can be specified in the script file itself as follows:
//!
//! ```cargo
//! [dependencies]
//! itoa = "1"
//! rand = "0.8"
//! rand_xoshiro = "0.6"
//! seahash = "4"
//! ```

use rand::Rng;
use rand::SeedableRng;
use rand_xoshiro::Xoshiro256PlusPlus;

use std::time::Instant;
use std::hash::*;

fn main() {
    let iters = 10000000;

    // let mut hasher = seahash::SeaHasher::new();
    let mut hasher = DefaultHasher::new();

    let mut rng = Xoshiro256PlusPlus::seed_from_u64(42);
    let mut buf = itoa::Buffer::new();

    let now = Instant::now();
    for _ in 0..iters {
        let s = buf.format(rng.gen_range(0..2u32.pow(31)));
        hasher.write(s.as_bytes());
    }
    println!("{:?} {}", now.elapsed(), hasher.finish());
}