rurban / smhasher

Hash function quality and speed tests
https://rurban.github.io/smhasher/
Other
1.86k stars 179 forks source link

FIX: The 8- and 12-bit collision tests are indeed not useful #236

Closed fwojcik closed 2 years ago

fwojcik commented 2 years ago

Example from FNV2 (doc/FNV2.txt):

Keyset 'Sparse' - 32-bit keys with up to 7 bits set - 4514873 keys
Testing collisions ( 64-bit) - Expected    0.0, actual      0 (0.00x)
Testing collisions (high 32-bit) - Expected       2373.0, actual   1021 (0.43x)
Testing collisions (high 25-38 bits) - Worst is 28 bits: 49753/37968 (1.31x)
Testing collisions (high 12-bit) - Expected    4510777.0, actual 4510777 (1.00x)
Testing collisions (high  8-bit) - Expected    4514617.0, actual 4514617 (1.00x)
Testing collisions (low  32-bit) - Expected       2373.0, actual   2123 (0.89x)
Testing collisions (low  25-38 bits) - Worst is 30 bits: 14209/9492 (1.50x)
Testing collisions (low  12-bit) - Expected    4510777.0, actual 4510777 (1.00x)
Testing collisions (low   8-bit) - Expected    4514617.0, actual 4514617 (1.00x)
Testing distribution - Worst bias is the 19-bit window at bit 21 - 96.879% !!!!!

Keyset 'Sparse' - 40-bit keys with up to 6 bits set - 4598479 keys
Testing collisions ( 64-bit) - Expected    0.0, actual      0 (0.00x)
Testing collisions (high 32-bit) - Expected       2461.7, actual   4943 (2.01x) (2482) !!!!!
Testing collisions (high 25-38 bits) - Worst is 32 bits: 4943/2461 (2.01x) !!!!!
Testing collisions (high 12-bit) - Expected    4594383.0, actual 4594383 (1.00x)
Testing collisions (high  8-bit) - Expected    4598223.0, actual 4598223 (1.00x)
Testing collisions (low  32-bit) - Expected       2461.7, actual   1991 (0.81x)
Testing collisions (low  25-38 bits) - Worst is 34 bits: 868/615 (1.41x)
Testing collisions (low  12-bit) - Expected    4594383.0, actual 4594383 (1.00x)
Testing collisions (low   8-bit) - Expected    4598223.0, actual 4598223 (1.00x)
Testing distribution - Worst bias is the 19-bit window at bit 21 - 92.385% !!!!!

Keyset 'Sparse' - 48-bit keys with up to 6 bits set - 14196869 keys
Testing collisions ( 64-bit) - Expected    0.0, actual      0 (0.00x)
Testing collisions (high 32-bit) - Expected      23463.6, actual  34202 (1.46x) (10739)
Testing collisions (high 27-42 bits) - Worst is 41 bits: 267/45 (5.83x) !!!!!
Testing collisions (high 12-bit) - Expected   14192773.0, actual 14192773 (1.00x)
Testing collisions (high  8-bit) - Expected   14196613.0, actual 14196613 (1.00x)
Testing collisions (low  32-bit) - Expected      23463.6, actual  25137 (1.07x) (1674)
Testing collisions (low  27-42 bits) - Worst is 39 bits: 413/183 (2.25x) !!!!!
Testing collisions (low  12-bit) - Expected   14192773.0, actual 14192773 (1.00x)
Testing collisions (low   8-bit) - Expected   14196613.0, actual 14196613 (1.00x)
Testing distribution - Worst bias is the 20-bit window at bit 20 - 77.736% !!!!!

4514873 keys - 2^12 bins == 4510777 "collisions" 4514873 keys - 2^8 bins == 4514617 "collisions"

4598479 keys - 2^12 bins == 4594383 "collisions" 4598479 keys - 2^8 bins == 4598223 "collisions"

14196869 keys - 2^12 bins == 14192773 "collisions" 14196869 keys - 2^8 bins == 14196613 "collisions"

Most hashes have this exact result:

$ egrep 'actual 14192773' doc/* | wc -l
350
rurban commented 2 years ago

Already in the branch fwojcik