skeeto / hash-prospector

Automated integer hash function discovery
The Unlicense

About the avalanche bias #28

Open jonmaiga opened 9 months ago

jonmaiga commented 9 months ago

First I think this is a really awesome project! I stumbled across it some years ago when I was playing with a similar thing (automatically find good mixers) and I remember it as very inspiring!

So I found the lists of mixers (in the readme plus an issue) and I thought it would be a good correlation test to run them through my Tests for Randomness (TFR) and PractRand, both using Pelle Evensen's harsh RRC scheme to scramble the input bits.

I ran through 100+ of the prospector mixers (love the plain-text format, plus 32-bit mixers are so quick to evaluate even with RRC), and I thought it might be interesting to share the results here.

What may come as a surprise is that the original 32-bit prospector mixer (from the article) seems to be performing very well. Of all the tested prospector mixers, the original is the only one that goes to 2^18 in TFR; the second runner-up fails at 2^13.

This leads me to the question: has something in the prospector search code changed since the original was found?

RRC-PractRand  TFR  Construction                                        Alias     Comment
14             12   16 85ebca6b 13 c2b2ae35 16 = ?                      murmur3   For reference
16             18   15 2c1b3c6d 12 297a2d39 15 = 0.34968228323361017    original  From the prospector article
14             10   16 21f0aaad 15 d35a2d97 15 = 0.10760229515479501    low_bias  The one with lowest bias (by TheIronBorn)
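
To make the construction column concrete: reading it the same way as the low_bias code further down, "shift constant shift constant shift" is an xmxmx mixer, i.e. xorshift, multiply, xorshift, multiply, xorshift. A minimal sketch for the original row would be:

#include <cstdint>

// "15 2c1b3c6d 12 297a2d39 15" read as an xmxmx mixer:
// xorshift 15, multiply, xorshift 12, multiply, xorshift 15.
uint32_t prospector_original(uint32_t x) {
    x ^= x >> 15;
    x *= 0x2c1b3c6d;
    x ^= x >> 12;
    x *= 0x297a2d39;
    x ^= x >> 15;
    return x;
}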

For reference, the furthest I've ever seen an xmxmx 32-bit mixer go in RRC-PractRand is to 2^18, and in TFR to 2^19.

Now PractRand is a bit of a black box to me, but since I am very familiar with TFR, I investigated how the low_bias one fails. If you run it through TFR it will say something like:

==========================================================================================
mix32::prospector_low_bias 2^10 with 128 samples using 32-bits (0s)
bic,coupon,divisibility,gap,mean,permutation,runs,sac,uniform
==========================================================================================
test       p-value  remark       stream/description
uniform-m  0        failure(10)  mix32::prospector_low_bias(reverse_co-31(counter-1))

***FAILED*** at 2^10 with a total of 1 failures and 0 suspicious results.

In the uniform test we expect a uniform distribution, but the statistics say that this is very likely not the case. The mix32::prospector_low_bias(reverse_co-31(counter-1)) line indicates the test sample used (out of 128). This is a unary counter whose bits are reversed, complemented, and rotated by 31 bits (as per RRC). If we were to recreate this in code, it would look something like the following in C++ (copy-pasted a bit, so pardon the overkill with templates etc.):

#include <cstdint>
#include <iostream>
#include <vector>

// Number of bits in T.
template <typename T>
constexpr int bit_sizeof() {
    return 8 * sizeof(T);
}

// Rotate v by r bits (written as a left rotate; a left rotate by 31
// is the same as a right rotate by 1).
template <typename T>
constexpr T ror(T v, int r) {
    constexpr auto Bits = bit_sizeof<T>();
    return (v << r) | (v >> (Bits - r));
}

// Reverse the bit order of a 32-bit word.
constexpr uint32_t reverse_bits(uint32_t x) {
    x = ((x >> 1) & 0x55555555) | ((x & 0x55555555) << 1);
    x = ((x >> 2) & 0x33333333) | ((x & 0x33333333) << 2);
    x = ((x >> 4) & 0x0F0F0F0F) | ((x & 0x0F0F0F0F) << 4);
    x = ((x >> 8) & 0x00FF00FF) | ((x & 0x00FF00FF) << 8);
    return (x >> 16) | (x << 16);
}

int main() {
    std::vector<uint32_t> ys;
    for (uint32_t i = 1; i <= 1024; ++i) {
        // RRC input: reverse the counter's bits, complement, rotate by 31.
        auto x = ror(~reverse_bits(i), 31);
        // low_bias mixer: 16 21f0aaad 15 d35a2d97 15
        x ^= x >> 16;
        x *= 0x21f0aaad;
        x ^= x >> 15;
        x *= 0xd35a2d97;
        x ^= x >> 15;
        ys.push_back(x);
    }

    for (const auto y : ys) {
        std::cout << y << ",";
    }
}

This will output 2^10 values: 2731428922,3383114888,3653537803,1575712730,... Now if we examine those with a chi-squared test in Mathematica we get:

PearsonChiSquareTest[low_bias_values, UniformDistribution[{0, 2^32-1}]] outputs a p-value close to zero (2.23e-22).
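
For reference, a rough equivalent of that check without Mathematica, sketched in C++ with my own binning choice (16 equal-width buckets over the 32-bit range, so 15 degrees of freedom), would be to compute the Pearson statistic by hand:

#include <cstdint>
#include <vector>

// Sketch: Pearson chi-squared statistic for uniformity of 32-bit outputs.
// The bucket count (16) is my own choice, not something TFR prescribes.
double chi2_uniform(const std::vector<uint32_t>& ys) {
    const int buckets = 16;
    std::vector<double> counts(buckets, 0.0);
    for (const auto y : ys)
        counts[y >> 28] += 1.0;  // top 4 bits select one of 16 equal-width buckets
    const double expected = static_cast<double>(ys.size()) / buckets;
    double chi2 = 0.0;
    for (const auto c : counts) {
        const double d = c - expected;
        chi2 += d * d / expected;
    }
    return chi2;  // compare against a chi-squared distribution with 15 degrees of freedom
}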

Doing the same for the original mixer gives a p-value of 0.11, which means it is indistinguishable from a uniform distribution. It's possible to see this with the naked eye; here are histograms of the output for the same RRC stream from the low_bias mixer, the original, and a TRNG (true random data):

[Histograms of the low_bias, original, and TRNG outputs for the same RRC stream]

Here it looks like the lowest bias version is not necessarily the best according to RRC-PractRand and TFR (see attached logs for many more examples). Maybe the bias calculated in prospector is an insufficient metric?

On a positive note, I strongly believe avalanche in some form is a sufficient test. In TFR, both sac and bic are very strong indicators, while most of the other statistics (uniform, gap, coupon, etc.) fail either early or not at all. So maybe only some minor changes would help.

I've attached two text files with logs for all prospector mixers I tested (TFR + PractRand). The first column shows how far the mixer got in the tests:

tfr_prospector.txt pr_prospector.txt

This is more of a thought; if it seems interesting/relevant we can dig deeper. Keep up the good work!

TheIronBorn commented 9 months ago

Others have pointed out issues with this project's metric (https://github.com/skeeto/hash-prospector/issues/21), though I'm not sure if that's common to all avalanche metrics. Note that this project uses root mean square relative error, while TFR seems to use a chi-squared test.
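
For concreteness, a sketch of what an RMS-relative-error style avalanche bias could look like (my own illustration of the idea, not necessarily this project's exact code): count, for every (input bit, output bit) pair, how often flipping the input bit flips the output bit over n samples, take each count's relative deviation from the ideal n/2, and report the root mean square of those deviations.

#include <cstdint>
#include <cmath>
#include <random>
#include <vector>

// Mixer under test (here: the low_bias construction discussed above).
static uint32_t mix(uint32_t x) {
    x ^= x >> 16; x *= 0x21f0aaad;
    x ^= x >> 15; x *= 0xd35a2d97;
    x ^= x >> 15;
    return x;
}

// Sketch of an RMS-relative-error avalanche bias: for each (input bit j,
// output bit k), count how often flipping bit j flips bit k over n samples,
// then take the RMS of the relative deviation from the ideal n/2.
double avalanche_bias(int n) {
    std::mt19937 rng(12345);
    std::vector<std::vector<long>> flips(32, std::vector<long>(32, 0));
    for (int i = 0; i < n; ++i) {
        uint32_t x = rng();
        uint32_t h = mix(x);
        for (int j = 0; j < 32; ++j) {
            uint32_t d = h ^ mix(x ^ (UINT32_C(1) << j));
            for (int k = 0; k < 32; ++k)
                flips[j][k] += (d >> k) & 1;
        }
    }
    double sum = 0.0;
    for (int j = 0; j < 32; ++j)
        for (int k = 0; k < 32; ++k) {
            double rel = (flips[j][k] - n / 2.0) / (n / 2.0);
            sum += rel * rel;
        }
    return std::sqrt(sum / (32.0 * 32.0));
}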

jonmaiga commented 9 months ago

Oh interesting, I didn't see that issue! Yes, cycles will appear, and with a mere 32-bit state I guess the probability makes them likely in practice.

For mixing or hashing arbitrary data I'm not sure how/if the cycles affect the result. In the case of using it as a PRNG I can see how (at least in a naïve implementation, state = mix(state)) some seeds should definitely be avoided. But maybe minimizing short cycles leads to better mixing/hashing too? I haven't thought of that.

The chi-squared test will only be able to say pass/fail (p-values can't be used as a guiding metric). The way I have done it is to test the avalanche (pass/fail) at intervals (perhaps at powers of two) and use how much data the mixer survives before failing as the primary metric (then I add some other statistics to make it more fine-grained).
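
As a sketch of that approach (my own illustration; passes(log2_n) is a hypothetical stand-in for whatever pass/fail avalanche check is used): run the check at power-of-two sample counts and score the mixer by the largest count it survives.

#include <functional>

// Sketch of "how much data before failure" as a guiding metric:
// test at power-of-two sample counts and return the largest exponent passed.
// `passes(log2_n)` is a hypothetical pass/fail avalanche check over 2^log2_n samples.
int score(const std::function<bool(int)>& passes, int max_log2_n) {
    int best = 0;
    for (int log2_n = 10; log2_n <= max_log2_n; ++log2_n) {
        if (!passes(log2_n))
            break;
        best = log2_n;
    }
    return best;
}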

But yeah, there might be better/more efficient ways.

TheIronBorn commented 9 months ago

I wonder if something like "Testing for goodness rather than lack of fit of continuous probability distributions" would be more useful.

jonmaiga commented 9 months ago

Yes, maybe. It sounds like equivalence testing (new to me) would possibly fail the test earlier than traditional testing. But tbh I've no intuition about this.