Modular reduction for finalization

jan-wassenberg commented 7 years ago

HighwayHash has 1024 state bits which are reduced to 256 bits via 3 additions. Most callers simply retain the lower 64 bits and discard the others. For additional security, it would be useful to include the additional state bits in the output.

It has been suggested to use AES to scramble the state during finalization. However, the nature of AES is comparable to what HighwayHash is already doing, so another commonly used (by cryptographic hashes) approach might be better: modular reduction. To avoid costly divisions, we could use Mersenne primes (notably 2^127 - 1) for faster reduction. The 2nd level of VHASH does something similar. This might lead to more security and perhaps enable a cryptographic hash function with say 256 bits (using reductions instead of adding).

Bulat-Ziganshin commented 7 years ago

i always wonder what you mean by cryptographic hashes? sha-1? or PRF? if the first, existing cryptographic hashes performs an order of magnitude slower, so cryptohash of such speed will be a fantastic breakthrough. but, sorry, you are not famous cryptographers, nor any famous cryptographers ever tried to analyze your hash, so it seems that there are no serious proofs of its crypto-strength. if I, Bulat Ziganshin, can't break my cipher/hash, it doesn't say anything about my cipher/hash as far as i'm not a cryptographer.

btw, do you read my note at https://en.wikipedia.org/wiki/Talk:HighwayHash ? i think that this sort of false self-advertizement is exactly why wikipedia doesn't like articles wriiten by authors themself.

PS: chances that you can build cryptohash 10x faster than any existing ones, are close to 0, since otherwise it will mean that all the cryptographers around can't do their job. Look for example at main cycle of Blake2 - it performs multiple passes of similar algorithm just to mix data enough. You perfrom only one pass of mixing - it's enough for usual hashing, may be enough for PRF, but way too low compared to existing cryptohashes.

PPS: another well-known polynom: https://en.wikipedia.org/wiki/Poly1305#Description

jan-wassenberg commented 7 years ago

Fully agree, we are not cryptographers and well aware of that fact. Note that the README is very careful to say only "HighwayHash [..] may inspire new cryptographically strong hashes", and that there is no theory yet for provable security (beyond our own attack attempts): "New cryptanalysis tools may still need to be developed for further analysis".

We are aware of (and use) Poly1305; good security, but considerably slower than HighwayHash. I am not qualified for (nor interested in) an attempt to compete against SHA-[1-3]; we're generally thinking about PRF and perhaps MAC. The idea/hope is to replace definitely insecure hash functions with something that at least strives for (MAC-level) security, without the 10x cost of actual cryptographic hashes which are not needed in most applications.

Thanks for pointing me to your comment on Wikipedia, I hadn't seen it previously. I welcome you to point out how these or other statements are false. We value integrity and made sure to log in and disclose authorship.

jyrkialakuijala commented 7 years ago

Mixing is an art I think. HighwayHash mixing is based on two principles: to get maximum entropy (entropy because my brains are overly compression-oriented) at lowest amount of computation with the instructions we have at our disposal.

The kind of mixing we do should be roughly as effective as 2.5+ rounds of more conventional mixing (guesswork estimate). Mixing function G in RFC 7693 looks much less thorough at a quick look than the mixing in highwayhash. I'm thinking of visualizing the effectiveness of mixing to make different mixing strategies comparable, one where the probability of a bit location in a state bit is calculated and the entropy of mixing is calculated -- and another where bit locations are "paints" and the visualization of paint mixing is shown as the mixing function advances and is repeated.

Do you have ideas how to compare mixing functions best? Or is the smhasher-like statistical approach still the best possible way with this?

lemire commented 7 years ago

@jan-wassenberg If you are going to go with modular reduction and you are using SIMD instructions anyhow, why not go with a GF2[x]/p(x) modular reduction like CLHash does? https://github.com/lemire/clhash

It is likely going to be much faster.

Bulat-Ziganshin commented 7 years ago

@lemire, what's your exact proposal?

lemire commented 7 years ago

@Bulat-Ziganshin

Well, for example, clhash relies on a 256-bit to 128-bit modulo operation....

https://github.com/lemire/clhash/blob/master/src/clhash.c#L22

We use the fact that 2^127 + 2 + 1 is an irreducible polynomial.

And then you can further reduce to a 64-bit word:

https://github.com/lemire/clhash/blob/master/src/clhash.c#L111-L123

In total, you use no more than 2 multiplications.

jan-wassenberg commented 7 years ago

Just a quick update, we're actively working on this. The 2 CLMUL are slower than a HH round despite mixing less, but there is hope for a simpler reduction (in addition to, rather than replacing, the HH rounds).

jan-wassenberg commented 7 years ago

We've successfully applied Lemire's lazymod127 approach to 256-bit hashes, the statistical tests show a benefit there. Thanks for suggesting it! This is faster than CLMUL, with only 12 instructions because the reduction polynomial has 3 terms. We can save one instruction with a new variant of the 128-bit shift inside.

For 64 and 128-bit hashes, we're not seeing a benefit (actually higher average bias), so we can retain the current finalization there, but with an improved length padding scheme.

Bulat-Ziganshin commented 7 years ago

are you taked into account that mod127 returns 126-bit result?

jan-wassenberg commented 7 years ago

It's a bit tricky.. the input is assumed to be 126 bit (which we ensure by masking), but the output x' is a full 128-bit number that is congruent to x (mod the 2^127 polynomial). Please see https://arxiv.org/pdf/1503.03465v8.pdf, page 10.

Another way to look at the reduction is that it is very similar to an XOR-fold, but with two shifted versions of the high part. To get a 256-bit result, we concatenate two such independent 128-bit reductions.

Bulat-Ziganshin commented 7 years ago

the statistical tests show a benefit there. Thanks for suggesting it!

Don't rely on avalanche stats, they are just random numbers that randomly change on each change of the algorithm. You don't compute confidence interval, but from results already published i guess that 1-2% difference is just a random noise.

but the output x' is a full 128-bit number that is congruent to x (mod the 2^127 polynomial).

While it has full 128 bits, it carries only 127 bits of entropy (i was wrong saying that it's only 126 bits)

Note that clhash is 64-bit hash, so using 127-bit intermediate result is ok for their purpose, but not for your case. Moreover, they say about (n−1)/2^126 collision probablility, although i don't understand why (n-1) and why 2^126 rather than 2^127. Probably other hash reduction techniques we are using have similar collision rates.

lemire commented 7 years ago

Don't rely on avalanche stats, they are just random numbers that randomly change on each change of the algorithm.

If we are referring to smhasher, I think that the avalanche tests are deterministic. There is a pseudo-random number, but with a fixed seed.

It is true however that testing the avalanche effect only on (pseudo) random inputs is weak. It seems that you'd want to also test with "clustered" inputs. That is, inputs that have a small Hamming distance.

Bulat-Ziganshin commented 7 years ago

@lemire, do you read their paper ? Pages 9-10 describes their use of smhasher to check which hash is better. In short, they use min/avg/max avalanche stats to decide which hash is better. But these results are pseudorandom, some number will be always smaller than another, and this difference tells nothing without statistical analysis. An example of their conclusions is:

We've successfully applied Lemire's lazymod127 approach to 256-bit hashes, the statistical tests show a benefit there. For 64 and 128-bit hashes, we're not seeing a benefit (actually higher average bias)

Using different seeds, they can made exactly opposite conclusion :)

That is, inputs that have a small Hamming distance.

That's the most of smhasher tests. And they all compute "worst bias". Anyway, my experience tells that any reasonable hash is indistiguishable from sha1 by means of smasher testing. Moreover, xxh32 successfully passes ALL "worst bias" tests including the avalanche test itself, but fails ALL expected/actual tests (by "fails" i mean that results are clearly worse than those for sha1). This essentially means than avalanche stats is most meaningless mark of hash quality, even from the poor smhasher arsenal.

While it has full 128 bits, it carries only 127 bits of entropy

sorry, i was incorrect about that

jan-wassenberg commented 7 years ago

If we are referring to smhasher, I think that the avalanche tests are deterministic. There is a pseudo-random number, but with a fixed seed.

Yes, it is deterministic. I agree with Bulat, though - there could be some unlucky interaction between the hash state (arising from the hardcoded key) and the inputs (governed by PCG seed). Max is prone to outliers, so perhaps it would be better to have 1/10th as many iterations, but repeat each test 9 times and report the median. Does that sound reasonable?

This will take quite some time to (re-)run, but it may make a difference because the observed effect size was small.

FYI we have some idea of the noise floor from running the same test on RDRAND masquerading as a hash function and will add mention of that to the paper.

While it has full 128 bits, it carries only 127 bits of entropy

No worries, the avalanche test would have detected if one of the bits is badly distributed (i.e. does not flip half the time).

jan-wassenberg commented 7 years ago

my experience tells that any reasonable hash is indistiguishable from sha1 by means of smasher testing

With the default settings, yes - but higher-quality RNG and 300x more iterations and we start seeing differences.

We're now burning lots of CPU re-running tests with the median (of 11) suggestion. Looking forward to seeing results early next week!

Bulat-Ziganshin commented 7 years ago

With the default settings, yes - but higher-quality RNG and 300x more iterations and we start seeing differences.

And that's the difference? I got close results for xxh32 and sha1. You got close results for 4 hashes you've tested. In my tests some hashes has better max value, but worser avg, others were on opposite side. You got the same results. The only difference is that we got to different conclusions, but you can make similar conclusion for any 4 randomly picked hashes - at least one of them will lose all 3 disciplines (min, max and avg). And i bet that if it was your hash, you will just use find some other criteria to declare your hash as winner.

With default smhasher settings, worst bias for 128-bit hashes was usually in 0.8-0.9% area, i.e. 10% relative difference between similar quality hashes. In your test, relative difference between worst biases of different hashes is also about 10%. So, results became smaller in absolute values, but relative differences are the same. As far as i see, all your changes had no effect on testing quality. In fact, it was even worsened since you don't had enough CPU time to test other hashes or make multiple tests of your own hash.

We're now burning lots of CPU re-running tests with the median (of 11) suggestion. Looking forward to seeing results early next week!

the same error here. you need to perfrom statistical analysis rather than compare single pseudo-random numbers - some number always will be smaller than other. instead of burning cpu cycles, try f.e. 100 runs of 300K hashes each - for your hash, siphash, any cryptohash and your competition (murmur3c, spooky)

jan-wassenberg commented 7 years ago

I got close results for xxh32 and sha1.

I understand you doubt the value of a test when it always returns "ok", but you are testing hashes that are all good. ARX constructs such as Murmur/XXH are known to avalanche well.

As far as i see, all your changes had no effect on testing quality.

This test detected huge biases with FarmHash Fingerprint64 (> 4000x higher than the other hashes) which had not been reported previously.

Do you have suggestions for stronger tests? It would be interesting to generate a very large amount of hashes and check for collisions (might be able to sort XX TB, or perhaps resort to bloom filters).

And i bet that if it was your hash, you will just use find some other criteria to declare your hash as winner.

That's disappointing, but note that our paper does not declare any of the tested hashes a winner; each has slightly better results in one variable. Min/max are much more prone to outliers, so those are the least reliable. We aren't even trying to demonstrate superiority, just that this new multiply/permute scheme has mixing and avalanching comparable to start of the art ARX (Blake2, the similar Chacha, SipHash).

In fact, it was even worsened since you don't had enough CPU time to test other hashes or make multiple tests of your own hash.

CPU time is plentiful; we’ve already spent XX000 CPU hours.

and your competition (murmur3c, spooky)

We do not view Murmur3c and XXH64 as competition because their multiply-by-constant operations are reversible, which allows finding seed-independent collisions. Spooky looks considerably better, but also doesn’t come with security claims and is slightly slower.

Bulat-Ziganshin commented 7 years ago

Spooky employs only ARX operations, while murmur/xxh additionally employs mutiplication by even constant, and highway employs ARX plus 32*32->64 multiplication and byte shuffling. All these operations are reversible!

murmur3a and xxh32 fails because they use reversible 32-bit operations on 32-bit input words handling 32-bit states, so update effect of two sequentially processed words can easily revert each other (and xxh64 does the same with 64-bit words/operations/states). OTOH smarter hashes use wider state to mix in input bits prior to going to the next input word

Examples of such hashes are highway, spooky, murmur3c and murmur3f (note that murmur3 family includes 3 members - 3a, 3c and 3f). So, while your mixing operation is novel, i don't see why it should be any better than enough amount of mixing done by other hashes with other operations. It will be even more exiting to check simplest hash that still passes smhasher and doesn't have problems of xxh32/mur3a - i mean my own zzh, but unfortunately, it wasn't finished :)

Spooky looks considerably better, but also doesn’t come with security claims and is slightly slower.

Of the top of my head, spooky is 4 bytes/cycle that is faster than your hash. That said, you tend to operate with GB/s metric that is disappointing. Correct metric is cpb (cycles per bytes) or bpc (bytes per cycles), accompanied by exact CPU architecture (and compiler/switches)

About security claims - i haven't seen any examples of breaking hashes other than mur3a/xxh. Everything that you have done in your paper, may be applied to Spooky/mur3c/mur3f/zzh - and probably with just the same conclusions. If you think that lack of such attempts differentiate highway from other hashes - ok, we disagree here :)

This test detected huge biases with FarmHash Fingerprint64 (> 4000x higher than the other hashes) which had not been reported previously.

It's because it was impossible to detect with regular smhasher testing or because no one even bothered? I have found problems in xxh32/mur3a that isn't reported on their official pages, just by careful looking into results of regular smhasher testing. At the same time, they greatly pass the avalanche test. So, it will be interesting to see regular smhasher tests of Fingerprint64 and especially of its 32-bit halves (most hash stats problems can be detected only in 32bit-wide tests, while wider tests return only "Collisions - Expected 0.00, actual 0.00")

I understand you doubt the value of a test when it always returns "ok", but you are testing hashes that are all good. ARX constructs such as Murmur/XXH are known to avalanche well.

Yes, that's my point: hashes like spooky/mur3c/mu3f/zzh are as good as sha1 by all regular smhasher measures. OTOH, i haven't seen regular smhasher tests of Fingerprint64. So, the conclusion that longer tests revealed something that was impossible to find with regular tests, is false - you just never tried to test the same hash with both methods and compare results.

And well, this all is too much for me. Overall situation with everything, starting from your wikipedia page, to incorrect conclusions in paper, tells that you just don't have correct scientific attitude - instead it looks just like advertizing company. I go off the scene...

jan-wassenberg commented 7 years ago

The results of the additional smhasher tests you proposed show you are right, the differences in the avalanche stats are mostly noise. I shouldn't draw conclusions except that all functions avalanche about equally well in this test; we'll update the paper soon. Thanks for pointing out the problem!

All these operations are reversible!

Let's discuss this after answering your other findings.

Of the top of my head, spooky is 4 bytes/cycle that is faster than your hash.

Agreed, we see the same. The current HighwayHashAVX2 (pending push to Github) is slightly faster, 0.24 c/b at 1K.

We're still thinking about other tests to run - if you have any input, it would be very welcome.

google / highwayhash

Modular reduction for finalization #32