rurban / smhasher

Hash function quality and speed tests
https://rurban.github.io/smhasher/
Other
1.81k stars 176 forks source link

Idea for a new test: how "wide" can a hash go #120

Open dumblob opened 4 years ago

dumblob commented 4 years ago

I've stolen the term "wide" from the http://www.reedbeta.com/blog/quick-and-easy-gpu-random-numbers-in-d3d11/ and I quote:

Incidentally, I was confused for a long time about the distinction between PRNGs and hash functions—they seem to do the same thing, i.e. create an “unpredictable” but deterministic output by jumbling up input bits. It wasn’t until I started playing around with GPU PRNGs that I realized the difference: PRNGs are designed for going deep, and hashes for going wide.

Could we construct a test showing how "wide" the hash goes? Such test could also be named "how prone the hash is to create correlation patterns".

It's probably not that much important for hash quality in general, but maybe it might find some important use in the future once it'll be measured :wink:.

peteroupc commented 4 years ago

The article you linked to compares two PRNGs:

Both kinds of PRNGs are linear, so they tend to produce correlated sequences of numbers across nearby seeds. Nonlinear PRNGs as well as nonlinear hash functions are less likely to do so. Generally, two PRNGs (e.g., a PRNG initialized with one seed and the same PRNG initialized with a nearby seed) can be tested for correlation by interleaving their outputs. Something similar can be done for hash functions. See also my notes on testing PRNGs.

However, for the use in hash tables and other data lookup applications, hash functions are designed to map data into a sequence of bits in a way that avoids collisions ("better than random").

(Note: Despite what the article states, LCGs are not always "bad news"; at larger sizes and when their lower bits are chopped off, LCGs can have exceptional randomness quality. Meanwhile, Xorshift generators by themselves fail linear complexity tests, but this is not much of an issue for casual games, which is the focus of the article.)

o0101 commented 4 years ago

In this case, there's probably already a test (or two) in SMHasher that measures "wideness" described in the article,

set up a lot of independent PRNG instances with different seeds so we can draw from each of them in parallel

(read hash for PRNG, in the above),

being the Seed test and MomentChi2 tests, which I think test some sort of correlation on seeded and non-seeded runs on the same key, and also on nearby seeds on the same key. Or at least they used to (MomentChi2 I think got updated in the last few months, tho maybe that was only a code re-write).