It's not a real issue , but a question or strange behaviour I can't explain. 🔢 Sorry ;-/
I have used ported SpookyV2 .3 hash32 bit version library to java and can confirm the results in your tests:
for FooXXXXBar, FooBarXXXX, XXXXFooBar with 14776336 keys:
I have about 1x number of collisions as expected,
so roughly around 25389 +/- (so about (0.99x - 1.00x) as in your tests) .
But If I am doing the same test using standard and very very simple java hash code for strings
I have for all 3 cases 0 collisions. Simply no collisions at all!!!
and java hash code calculated in very simple way:
s[0]31^(n-1) + s[1]31^(n-2) + ... + s[n-1]
int hashCode(const std::string& value) {
int h = 0;
if (h == 0 && !value.empty()) {
for (char c : value) {
h = 31 * h + c;
}
}
return h;
}
int hashCode(const char *value) {
int h = 0;
int length = strlen(value);
if (h == 0 && length > 0) {
for (int i = 0; i < length; i++) {
h = 31 * h + value[i];
}
}
return h;
}
for example for
"FooBar0001" is "6FA1E30E" as hex
why such very simple implementation for a string value beats any over complex implementation in terms of collisions?
Many years ago I tried to find an hash code algorithm that would beat this simple implementation in java for text strings in term of collisions and tested about > 20 hash codes implementations, but I didn't find any better alternative ...
It's not a real issue , but a question or strange behaviour I can't explain. 🔢 Sorry ;-/
I have used ported SpookyV2 .3 hash32 bit version library to java and can confirm the results in your tests: for FooXXXXBar, FooBarXXXX, XXXXFooBar with 14776336 keys: I have about 1x number of collisions as expected, so roughly around 25389 +/- (so about (0.99x - 1.00x) as in your tests) . But If I am doing the same test using standard and very very simple java hash code for strings I have for all 3 cases 0 collisions. Simply no collisions at all!!! and java hash code calculated in very simple way: s[0]31^(n-1) + s[1]31^(n-2) + ... + s[n-1]
for example for "FooBar0001" is "6FA1E30E" as hex why such very simple implementation for a string value beats any over complex implementation in terms of collisions? Many years ago I tried to find an hash code algorithm that would beat this simple implementation in java for text strings in term of collisions and tested about > 20 hash codes implementations, but I didn't find any better alternative ...