skeeto / hash-prospector

Automated integer hash function discovery
The Unlicense
672 stars 27 forks source link

Normalize the score #18

Open account-login opened 2 years ago

account-login commented 2 years ago

I tweaked the score function as follow:

    double sum = 0;
    for (int j = 0; j < nbits; j++) {
        for (int k = 0; k < nbits; k++) {
            double diff = bins[j][k] - n / 2.0;
            sum += diff * diff;
        }
    }
    return 2 * sqrt(sum / n) / nbits;

Now the scores of good hash functions, for both 32 bit and 64 bit, different -q values, are around 1.0.

I don't understand the statistics, but I have a guess on how the score function works. The score function is equivalent to the fair coin test problem. You have 32x32 coins, perform coin flips, and yield 32x32 frequencies, these frequencies deviate from 0.5, compare the variance of frequency to the expected variance of fair coins. And the score can be less than 1.0 due to the variance of variance.

What I don't expect is the exact score is significantly larger than the estimated score. Many xorr,mul sequences yield an estimated score around 1.0, but the exact score of lowbias32 is 11.4. The lowest exact score I tested is two AES round, yielding an exact score of 1.4.