ogxd / gxhash

The fastest hashing algorithm 📈
https://docs.rs/gxhash
MIT License

write(&[1u8]) and write_u8(1) yield different hashes #70

Closed ambiso closed 3 weeks ago

ambiso commented 4 weeks ago

Hello,

This is mostly an API question: should these two invocations yield the same hash? I would expect them to, as they do for the DefaultHasher (see here).

use gxhash::*;
use std::hash::Hasher;

fn main() {
    let mut h = GxHasher::with_seed(0);
    h.write_u8(1);
    dbg!(h.finish()); // 7327909443358324775

    let mut h = GxHasher::with_seed(0);
    h.write(&[1u8]);
    dbg!(h.finish()); // 8871263162142091921
}

Kind regards, ambiso

ogxd commented 3 weeks ago

Hello,

This is expected. For write_u8 (and some other primitives), we can skip the full construction because we know in advance that the input fits entirely in 128 bits, which is the width of the pipe algorithm. This further improves performance in these situations.
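To illustrate the mechanism in general terms, here is a minimal sketch using only the standard library. `ToyHasher` and its mixing steps are hypothetical and are not gxhash's algorithm. The one fact relied on is that `std::hash::Hasher::write_u8` has a default implementation that forwards to `write(&[i])`, so the two calls only agree when a hasher does not override it with a specialized path.

```rust
use std::hash::Hasher;

/// Toy FNV-1a style hasher, for illustration only. NOT gxhash's algorithm.
struct ToyHasher {
    state: u64,
}

impl ToyHasher {
    fn new() -> Self {
        // FNV-1a 64-bit offset basis.
        ToyHasher { state: 0xcbf29ce484222325 }
    }
}

impl Hasher for ToyHasher {
    // General path: mix every byte of an arbitrary-length input.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.state ^= u64::from(b);
            self.state = self.state.wrapping_mul(0x100000001b3);
        }
    }

    // Hypothetical fast path: the input size is known, so a different,
    // cheaper step is used. Without this override, the trait's default
    // implementation would forward to `self.write(&[i])`.
    fn write_u8(&mut self, i: u8) {
        self.state = self.state.rotate_left(5) ^ u64::from(i);
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Hashes a single byte through the specialized `write_u8` path.
fn hash_with_write_u8(value: u8) -> u64 {
    let mut h = ToyHasher::new();
    h.write_u8(value);
    h.finish()
}

/// Hashes a byte slice through the general `write` path.
fn hash_with_write(bytes: &[u8]) -> u64 {
    let mut h = ToyHasher::new();
    h.write(bytes);
    h.finish()
}

fn main() {
    let a = hash_with_write_u8(1);
    let b = hash_with_write(&[1u8]);
    println!("write_u8(1) = {a}, write(&[1]) = {b}, equal = {}", a == b);
}
```

Both paths are deterministic on their own; they simply are not required to agree with each other once `write_u8` is overridden.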

Producing the same hashes for [1u8] and 1u8 hasn't been considered with regard to the (claimed) stability, but I understand it can be ambiguous. They're two different types, although they have the same representation in memory. Maybe it deserves a clarification somewhere in the docs.

As you can see, ahash (and maybe other hashers) does the same:

use std::hash::{BuildHasher, Hasher};

fn main() {
    dbg!(check(std::hash::RandomState::default())); // true
    dbg!(check(ahash::RandomState::with_seed(42))); // false
    dbg!(check(gxhash::GxBuildHasher::default())); // false
}

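/// Returns true if `write_u8(1)` and `write(&[1u8])` produce the same hash
/// for two fresh hashers built from the same `BuildHasher`.
fn check<BH>(build_hasher: BH) -> bool
where
    BH: BuildHasher,
{
    // Hash the value through the primitive-specific method.
    let mut h1 = build_hasher.build_hasher();
    h1.write_u8(1);

    // Hash the same byte through the generic byte-slice method.
    let mut h2 = build_hasher.build_hasher();
    h2.write(&[1u8]);

    h1.finish() == h2.finish()
}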
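If identical results for `write_u8(1)` and `write(&[1u8])` are needed, one possible workaround is a thin adapter that forwards only `write` and `finish`. This is a sketch under stated assumptions, not something gxhash or ahash provides: `BytesOnly` and `Specialized` are hypothetical names, and `Specialized` is a toy stand-in for any hasher with a dedicated `write_u8` path. The adapter relies on the standard library's default `write_*` methods, which all route through `write`.

```rust
use std::hash::Hasher;

/// Toy hasher with a specialized `write_u8`, standing in for any hasher
/// that takes a fast path for primitives. Hypothetical, for illustration.
struct Specialized(u64);

impl Hasher for Specialized {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.wrapping_mul(31).wrapping_add(u64::from(b));
        }
    }

    // Fast path that intentionally differs from the byte-wise path.
    fn write_u8(&mut self, i: u8) {
        self.0 ^= u64::from(i) << 32;
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical adapter: only `write` and `finish` are forwarded, so every
/// `write_*` call falls back to the trait default, which calls `write`.
struct BytesOnly<H: Hasher>(H);

impl<H: Hasher> Hasher for BytesOnly<H> {
    fn write(&mut self, bytes: &[u8]) {
        self.0.write(bytes);
    }

    fn finish(&self) -> u64 {
        self.0.finish()
    }
}

/// Same comparison as the `check` function above, but taking a closure
/// that creates a fresh hasher each time.
fn check<H: Hasher>(make: impl Fn() -> H) -> bool {
    let mut h1 = make();
    h1.write_u8(1);

    let mut h2 = make();
    h2.write(&[1u8]);

    h1.finish() == h2.finish()
}

fn main() {
    println!("specialized: {}", check(|| Specialized(0)));
    println!("wrapped:     {}", check(|| BytesOnly(Specialized(0))));
}
```

The trade-off is that the wrapped hasher gives up the primitive fast path, so this only makes sense where consistency between the two calls matters more than the performance gain described earlier in the thread.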