Open BraveLandLin opened 1 year ago
Hi @BraveLandLin, have you run this test against other implementations? I am curious to know the results.
I certainly did. For example, I tested the implementation on https://en.wikipedia.org/wiki/MurmurHash, and they all produced similar results. Therefore, the deviation may not be due to your implementation; it could be inherent to the algorithm itself. I'm just curious as to why the algorithm exhibits such a significant deviation.
What OS are you using (uname -a, or Windows version)?
Linux hzscn008 5.10.27-051027-generic #202103310028 SMP Thu Apr 1 02:16:48 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
What programming language are you using (C/C++/Go/Rust)?
C++
g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
What did you expect to see and what you saw instead?
I've tested your murmurhash.c, which implements the Murmur Hash algorithm. My testing approach is to convert IPv4 addresses to 32-bit integers, hash each one, and take the result modulo the number of buckets to assign it to a bucket. I then compare each bucket's count with the mean count and report the deviations. Here's my test code (save it as test_murmur.cpp):
To compile:
To run:
Since the Murmur algorithm has passed chi-squared and avalanche tests, I initially assumed that the counts of each bucket would be almost the same, resulting in a deviation close to zero. However, as the total number of buckets increases, the deviation becomes significantly larger. For instance, when the number of buckets is set to 512, the deviation is as follows:
Positive deviation: 18.9744%
Negative deviation: -21.0256%
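For readers who want to try the procedure described above, here is a minimal sketch of it. It is not the original test_murmur.cpp. Several things are assumptions made only for illustration: the MurmurHash3 32-bit finalizer (fmix32) stands in for the repository's murmurhash.c, the sample set is every address in 10.0.0.0/16, and the helper names ipv4_to_u32, bucket_counts, and deviation_percent are hypothetical. "Deviation" is taken to mean the largest and smallest bucket counts relative to the mean count, in percent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in hash: the MurmurHash3 32-bit finalizer (fmix32).
// ASSUMPTION: the original test called the repository's murmurhash.c instead.
std::uint32_t fmix32(std::uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// Pack the four octets of a dotted-quad IPv4 address into one 32-bit integer.
std::uint32_t ipv4_to_u32(std::uint8_t a, std::uint8_t b,
                          std::uint8_t c, std::uint8_t d) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8) | std::uint32_t(d);
}

// Hash every address in 10.0.0.0/16 (a hypothetical sample set, 65536 keys)
// and count how many land in each bucket after the modulo step.
std::vector<std::uint64_t> bucket_counts(std::size_t num_buckets) {
    assert(num_buckets > 0);
    std::vector<std::uint64_t> counts(num_buckets, 0);
    for (int c = 0; c < 256; ++c) {
        for (int d = 0; d < 256; ++d) {
            const std::uint32_t key =
                ipv4_to_u32(10, 0, std::uint8_t(c), std::uint8_t(d));
            ++counts[fmix32(key) % num_buckets];
        }
    }
    return counts;
}

// Return {largest positive, largest negative} deviation of any bucket count
// from the mean bucket count, both expressed in percent of the mean.
std::pair<double, double> deviation_percent(
        const std::vector<std::uint64_t>& counts) {
    assert(!counts.empty());
    double total = 0.0;
    for (std::uint64_t n : counts) total += double(n);
    const double mean = total / double(counts.size());
    assert(mean > 0.0);
    const auto mm = std::minmax_element(counts.begin(), counts.end());
    return { (double(*mm.second) - mean) / mean * 100.0,
             (double(*mm.first) - mean) / mean * 100.0 };
}
```

To use it, call deviation_percent(bucket_counts(512)) from main and print the two members of the returned pair. The numbers will depend on the key set and the hash used, so they should not be expected to match the figures quoted above.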
Do you have any insights into my test results? I don't believe the deviation is acceptable. What do you think? Thanks in advance.