Sparse test fails 7.2% of ideal hashes.

NoHatCoder commented 4 years ago

Keyset 'Sparse' - 16-bit keys with up to 9 bits set - 50643 keys (high 32-bit) and (low 32-bit), reports as failed if either has 2 or more collisions. For an ideal hash this will trigger failure with 7.2% probability, that seems like a tad bit too much.

At a glance blake2b-256 and MeowHash are marked with no other failures.

o0101 commented 4 years ago

In my experience you can fix this by mixing in the key more times, preferably with some modification, like shifting it.

That's how I passed sparse when it was the only thing failing, on multiple different hashes.

NoHatCoder commented 4 years ago

That is just making a random change to your hash function. Every time you make such a change you re-roll with a 7.2% chance of failure, but you are not necessarily improving your function, you are just making it pass an arbitrary test.

o0101 commented 4 years ago

I wondered about that.

Still, the point is to pass these tests, which would seem to make them not arbitrary.

If what you say is true, I guess my changes improved my % chance of failure from X > 7.2 to 7.2

Or maybe I don't understand this. 🤷‍♂

How did you calculate the 7.2%?

o0101 commented 4 years ago

@NoHatCoder, BTW the stated change is not "random." Another round while shifting the key is a deliberate way to directly address the sparse test, another is adding some constants to the key to make it less sparse.

That's if you want to pass the tests. If you want to just change the tests, you've got a different problem, I guess.

Did you test all hashes to arrive at 7.2%? Is it possible this could be (wrt this test) simply a design flaw in otherwise ok hashes?

If that's the case, probably can fix the hashes, as I was suggesting.

Best of luck with this! 😃

NoHatCoder commented 4 years ago

There are 50643 hash values included in the set, that gives 50643 (50643 - 1) / 2 = 1282331403 distinct pairs. Each of those pair ideally have a 2^-32 chance of colliding, so you get 1282331403 2^-32 = 0.298566 expected collisions. Calculating the Poisson distribution for that expectation gives a chance of 0.741881 for 0 occurrences, 0.221501 for 1 occurrence, and thus 0.036618 for two or more occurrences. Two such tests are performed, that gives a combined probability of at least one of them failing of 0.071895.

o0101 commented 4 years ago

OK I follow up to 0.298566 but I don't know why you need to use the Poisson distribution.

To me it seems like 0.298566 (~ 0) collisions is the expected value for an ideal hash.

Also, sorry I know this is a digression and it's not up to you to explain it to someone who doesn't understand.

NoHatCoder commented 4 years ago

This is basic statistics.

It comes out approximately like this: Every time you design a hash function, roll two dice twice, if either roll is two sixes, your function fails this test.

When we roll two dice the expected number of sixes is 1/3, so indeed we get 0 sixes most of the time, but sometimes we do get two sixes.

Declaring a hash function broken based on this test is like declaring a pair of dice loaded because you rolled two sixes.

If you actually want to know whether or not a die is loaded, the best you can do is to roll it a lot of times, note the results, and then do the statistics to determine if any roll comes out too often to fit the ideal die model with reasonable probability. In general, the more tests, the less loaded the die has to be in order for the skew to be notable.

A statistical hash test suite ought to work the same way, and for a quite limited test like this you probably shouldn't report a problem for less than somewhere around 5 collisions (probability: 29 in a million).

o0101 commented 4 years ago

@NoHatCoder great explanation. Thank you.

o0101 commented 4 years ago

@NoHatCoder but I still have no idea what you're talking about with the dice.

So from the dice example, E(2 sixes) = 1/3

So if you make 100 such roles, you expect to get ~ 33 events of two sixes. Is that right?

Switching to the hash example, E(sparse test collision) = ~ 1/3

So if you make 100 ideal hashes, you expect to get ~ 30 collision events.

You're saying probability of getting a collision in this test is too high to make this test a useful signal of failure, is that right?

You're also saying that probability is 7.2%, is that right?

If that's the case, I think I get your position, I'm just wondering about it with regards to the following.

For the 7.2% claim, I thought the test fails with 1 collision so, in that case it would fail 30% of the time. But even if it fails with 2 or more collisions, I don't get the need to use Poisson distribution. Can you not say that E(c) = ~ 0.3 so E(2c) = ~ 0.09 ?

For the first claim, perhaps the idea behind this test is that sparse keys represent some greater vulnerability by being comparatively easier to search for, so hash functions, in order to be more secure, need to have some special case handling of sparse keys to prevent collision.

In that case, if you handle sparse keys like I suggest (and how I got my hashes to reliably pass these tests), I believe (tho I'm not certain) you're able to better handle the potential vulnerability that sparse keys represent.

I'm not saying the sparse keyset tests are perfect, just wondering what do you think about this?

NoHatCoder commented 4 years ago

Can you not say that E(c) = ~ 0.3 so E(2c) = ~ 0.09 ?

No, it is not that simple, that is why we need the Poisson distribution. Long story short, given certain criteria, the Poisson distribution lets us convert the expected average into the probability of a certain number of events happening, you can't simply multiply the numbers.

For the first claim, perhaps the idea behind this test is that sparse keys represent some greater vulnerability by being comparatively easier to search for, so hash functions, in order to be more secure, need to have some special case handling of sparse keys to prevent collision.

No, the idea is that some hash functions don't do proper avalanching, and the sparse test will tend to break those. But it still produces collisions at random, like any other test.

If you want to see the statistics in action, you could try running the specific test (Sparse 16-bit keys) on Beamsplitter a thousand times using a thousand different seeds.

o0101 commented 4 years ago

Thanks, Jacob for your time explaining this to me.

Because of your idea, I intend to run Beamsplitter (and maybe the others) a 1000 times with different seeds on the Sparse 16 test to see the statistics in action.

On Thu, Apr 2, 2020, 2:36 PM Jacob Christian Munch-Andersen < notifications@github.com> wrote:

Can you not say that E(c) = ~ 0.3 so E(2c) = ~ 0.09 ?

No, it is not that simple, that is why we need the Poisson distribution. Long story short, given certain criteria, the Poisson distribution lets us convert the expected average into the probability of a certain number of events happening, you can't simply multiply the numbers.

For the first claim, perhaps the idea behind this test is that sparse keys represent some greater vulnerability by being comparatively easier to search for, so hash functions, in order to be more secure, need to have some special case handling of sparse keys to prevent collision.

No, the idea is that some hash functions don't do proper avalanching, and the sparse test will tend to break those. But it still produces collisions at random, like any other test.

If you want to see the statistics in action, you could try running the specific test (Sparse 16-bit keys) on Beamsplitter a thousand times using a thousand different seeds.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rurban/smhasher/issues/114#issuecomment-607650708, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFJZFG6M6REEUFA4XCM546DRKQW6DANCNFSM4LRLMSZQ .

rurban commented 4 years ago

I found and fixed a basic birthday calculation error, and updated all tests. Esp. the Sparse 32-bit test was affected. The estimation was off by factor 2. cd087ddfd293fc6e1bc6e55d3ca53793cd5569ec

orlp commented 1 year ago

This isn't fixed. I'm still getting sporadic failures with 2 collisions on a universal hash function:

Keyset 'Sparse' - 16-bit keys with up to 9 bits set - 50643 keys
Testing collisions ( 64-bit) - Expected    0.0, actual      0 (0.00x)
Testing collisions (high 32-bit) - Expected          0.3, actual      0 (0.00x)
Testing collisions (high 19-25 bits) - Worst is 24 bits: 78/76 (1.02x)
Testing collisions (low  32-bit) - Expected          0.3, actual      2 (6.70x) (2) !!!!!
Testing collisions (low  19-25 bits) - Worst is 23 bits: 169/152 (1.11x)
Testing distribution - Worst bias is the 13-bit window at bit 55 - 0.605%

This is not reasonable. Let's assume we have a perfect random oracle that outputs a completely uniform and random 32-bit value for each input. We are hashing 50643 inputs. I can simulate this by using a cryptographically secure PRNG in Rust by generating 50643 random indices and filling those:

use rand::prelude::*;
use rand_chacha::ChaCha20Rng;

fn sim_collisions<R: Rng>(w: usize, k: usize, rng: &mut R) -> usize {
    let mut table = vec![0u8; 1 << w];
    let mut collisions = 0usize;
    let mask = (1 << w) - 1;
    for _ in 0..k {
        let i = rng.gen::<usize>() & mask;
        collisions += table[i] as usize;
        table[i] = 1;
    }
    return collisions;
}

fn main() {
    let mut rng = ChaCha20Rng::seed_from_u64(0xdeadbeef);
    let k = 50643;
    let mut distribution = vec![0; k+1];
    for _ in 0..1000 {
        distribution[sim_collisions(32, k, &mut rng)] += 1;
    }
    for c in 0..=k {
        if distribution[c] > 0 {
            println!("{c} collisions: {}", distribution[c]);
        }
    }
}

Here are the results:

0 collisions: 719
1 collisions: 241
2 collisions: 38
3 collisions: 2

That's right, even an absolutely perfect random oracle fails this test ~40/1000 times. And you repeat this test for both the upper and lower 32-bits, so that's a 1 - (1 - 0.04)^2 = ~7.8% chance of failing this test, even if you're literally a perfect random oracle!

orlp commented 1 year ago

In case you don't trust ChaCha20 I repeated the experiment with AES as the CSPRNG and got the following results:

0 collisions: 732
1 collisions: 229
2 collisions: 35
3 collisions: 4

avaneev commented 1 year ago

I also pointed to this bug a long time ago. Reini agreed to use ceil(collrate*4) as a "pass" condition (2), but seems never introduced it. Getting 2 collisions in this test is perfectly correct. There's also an unsolved issue with Sparse 16-bit bias for 1024-bit hashes-sometimes the bias can go a bit over 1.0%, with 0.9% being a usual value. The 16-bit sparse test statistics weren't scaled well for all hash lengths.

NoHatCoder commented 1 year ago

Well, a fixed multiplier is really not the correct solution. As is there is these tests that produce a lot of false flags, and then a bunch that fail to flag significant deviations. Again, you need the Poisson distribution to set reasonable limits.

avaneev commented 1 year ago

I agree that Poisson distribution may be a good match. 3.3% probability of an event of 2 collisions is not over the place.

rurban commented 1 year ago

PR please. I'm too busy

NoHatCoder commented 1 year ago

OK, I can't build this rep, and I don't think I'll get there without way too much work. So I'm not going to produce a ready-made patch. Also the code in Stats.h looks pretty battle-scarred.

What I can do is make a p-value calculator that can easily be integrated by someone else.

NoHatCoder commented 1 year ago

This got a bit more complicated than I thought it would, but I think I have managed to deal with pretty much every eventuality.

Here is the code: https://github.com/NoHatCoder/pValue_for_smhasher

The code is in pvalue.c, pvalue_junk.c has some test code and discarded stuff.

The Poisson distribution turns into a bad approximation in cases where there is a lot of collisions, in particular in the countPairs=false mode. So I have added in an exact calculation for that, which potentially require a lot of computation, so there is also a cache of results in order to limit how much it runs.

Then there is a coin flip function specifically for the single bit tests.

The p-values returned by these functions are specifically to interpreted as such: A perfect hash function has x probability of producing a p-value of x or lower. This only holds as far as x is a value that can be produced in the context.

The coin flip function technically return the lower of 2 p-values.

As for interpreting the numbers I'd recommend a light flag-for-further-study at values of 1/10000 and lower, and a heavy flag at 1/1000000.

avaneev commented 1 year ago

Hm. I honestly think that "hashing singularity" has been reached. We have bloated hashes like xxhash as well as more elegant hashes like ahash, wyhash and komihash. Math has objective limits relative to "minimal" working hashing constructs. It just can't be made simpler and thus more efficient further.

rurban / smhasher

Sparse test fails 7.2% of ideal hashes. #114