ogxd / gxhash

The fastest hashing algorithm 📈
https://docs.rs/gxhash
MIT License
798 stars 27 forks

Why gxhash::gxhash128 result not same as gxhash::GxHasher #94

Closed xpurer closed 2 weeks ago

xpurer commented 3 months ago


    use std::hash::Hasher;

    // Empty byte array; the element type must be annotated to compile.
    let bin: [u8; 0] = [];

    dbg!(gxhash::gxhash128(&bin, 0));
    let mut hasher = gxhash::GxHasher::with_seed(0);
    hasher.write(&bin);
    dbg!(hasher.finish_u128());

[xhash/tests/main.rs:14:3] gxhash::gxhash128(&bin, 0) = 302767221070957831171542222971961600063
[xhash/tests/main.rs:17:3] hasher.finish_u128() = 78505093061913940866771591142410949548

ogxd commented 3 months ago

The Hasher trait defines methods for hashing specific types. This allows the implementation to skip some of the tricks needed when the input size is unknown. For this reason, hashing 4 u32 values through a Hasher returns a different hash than calling gxhash128 directly on those same 4 u32 represented as 16 u8. The rationale is that Hasher (mostly used for things like HashMap or HashSet) and gxhash128 are used in two different scenarios. Both are still independently stable.

Now you may ask why Hasher::write(&mut self, bytes: &[u8]) differs from gxhash128, since in both cases the input can be of any size. The reason is simply the special handling for known types explained above: we either want both to work exactly the same (at the cost of performance) or not at all. We chose the latter.
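To illustrate the point above, here is a toy Hasher sketch (hypothetical, not gxhash's actual code) where the typed write_u32 path takes a shortcut that is deliberately not byte-equivalent to the generic write path, so the same logical input hashes differently depending on which methods you call:

```rust
use std::hash::Hasher;

/// Toy hasher for illustration only: FNV-1a over bytes, with a
/// specialized `write_u32` that mixes the whole word in one step
/// instead of byte-by-byte, so the two paths diverge.
struct ToyHasher(u64);

impl ToyHasher {
    fn new() -> Self {
        ToyHasher(0xcbf29ce484222325)
    }
}

impl Hasher for ToyHasher {
    fn finish(&self) -> u64 {
        self.0
    }

    // Generic path: byte-by-byte FNV-1a.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0 ^ b as u64).wrapping_mul(0x100000001b3);
        }
    }

    // Specialized path: one multiply per u32 — cheaper, but NOT
    // equivalent to `self.write(&n.to_ne_bytes())`.
    fn write_u32(&mut self, n: u32) {
        self.0 = (self.0 ^ n as u64).wrapping_mul(0x100000001b3);
    }
}

fn hash_as_u32s(vals: &[u32]) -> u64 {
    let mut h = ToyHasher::new();
    for &v in vals {
        h.write_u32(v);
    }
    h.finish()
}

fn hash_as_bytes(vals: &[u32]) -> u64 {
    let mut h = ToyHasher::new();
    for &v in vals {
        h.write(&v.to_ne_bytes());
    }
    h.finish()
}

fn main() {
    let vals = [1u32, 2, 3, 4];
    // Same logical input, different results: the typed path skips
    // the per-byte mixing.
    assert_ne!(hash_as_u32s(&vals), hash_as_bytes(&vals));
}
```

Both paths are individually deterministic, just not interchangeable — which mirrors the "independently stable" guarantee described above.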

This is open for discussion

xpurer commented 3 months ago

Can you provide a Hasher whose write(&mut self, bytes: &[u8]) produces the same result as gxhash128? For example, a GxStreamHasher. If I want to calculate the hash of a large file, it is more convenient to compute it in batches with a streaming API, but I would like its output to be consistent with gxhash128.
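One possible sketch of such a streaming wrapper (the name StreamHasher and the whole design are hypothetical, not part of gxhash): buffer the incoming chunks and run the one-shot hash at finish, which trivially guarantees the output matches hashing the whole input at once. std's DefaultHasher stands in for gxhash::gxhash128 here so the sketch is self-contained; a real implementation would process fixed-size blocks incrementally instead of buffering everything in memory.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical streaming hasher: accumulates bytes, then hashes
/// them in one shot at the end, so chunked input is guaranteed to
/// produce the same result as the one-shot function.
struct StreamHasher {
    buf: Vec<u8>,
}

impl StreamHasher {
    fn new() -> Self {
        StreamHasher { buf: Vec::new() }
    }

    fn write(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }

    fn finish(&self) -> u64 {
        // One-shot hash over everything written so far.
        one_shot(&self.buf)
    }
}

/// Stand-in for `gxhash::gxhash128` in this sketch.
fn one_shot(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(bytes);
    h.finish()
}

fn main() {
    // Feeding the data in chunks yields the same hash as one call.
    let data = b"some large file contents";
    let mut s = StreamHasher::new();
    s.write(&data[..10]);
    s.write(&data[10..]);
    assert_eq!(s.finish(), one_shot(data));
}
```

The buffering approach trades memory for simplicity; matching a one-shot SIMD hash block-for-block without buffering would require exposing the internal block-processing state.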

ogxd commented 2 weeks ago

This should be fairly easy to do. Unfortunately, I don't have much spare time to do it myself (kids, you know). I might consider it if there is more demand for it.