added byte shuffle, CRC32c, carryless multiplication, and xor-rotate

Logan007 commented 3 years ago

This pull request adds a byte shuffle shf to hash prospector for 32-bit as well as 64-bit hashing functions.

It relies on SSSE3's pshufb instruction and only works on corresponding hardware which should comprise most recent Intel/AMD CPUs. All related additions to the code are guarded by #ifdefs.

-p-provided patterns can use shf or shf:<perm> where <perm> denotes a permutation of the byte positions. In 32-bit mode, shf:03020100 describes identity, i.e. no change of position, and shf:00010203 equals an endianess changing byte swap. In 64-bit mode (-8), those permutations need to be longer, e.g. shf:0605040302010007 corresponding to an 8-bit left rotate. A sole shf employs a randomly generated permutation.

Some additional thoughts especially on 32-bit byte shuffle and their implementation can be found here.

Fixes #7. Fixes #14. Fixes #17.

Logan007 commented 3 years ago

451b7fd adds support for the carryless multiplication – clmul or clmul:<constant>in -p's patterns – in case hardware supports the CLMUL instruction set (usually all modern CPUs).

Compared to "regular" integer multiplication, carryless multiplications replaces the addition of all minor products by an exclusive-or operation. It is invertible as long as the constant multiplier is odd.

Logan007 commented 3 years ago

0f70a23 adds support for xor-rotate x ^= (x <<< a) ^ (x <<< b), xrot2 in patterns. It optionally takes (exactly) two operands, e.g. -p mul,xrot2:17:13,mul.

It internally is called xrot2 because it needs two rotates to be invertible.

As this is the first operation to accept two operands, some additional but minor changes to the code were required (data struct members, parsing).

At this stage, the pull request also fixes #14.

alexey-milovidov commented 2 years ago

@Logan007 Maybe also add https://github.com/skeeto/hash-prospector/issues/17 to this pull request?

Logan007 commented 2 years ago

I might have a look some time or other.

Logan007 commented 2 years ago

The last two commits add CRC32c-step support if hardware supports it. Works for 32-bit hashes only. Use crc32 as pattern, e.g. -p crc32,mul,crc32.

alexey-milovidov commented 2 years ago

It's interesting to check the results - maybe new good hash functions were found?

Logan007 commented 2 years ago

It's interesting to check the results - maybe new good hash functions were found?

You can download the current code of this branch from github's auto-archiver for testing.

However, ./prospector -p crc32,mul,crc32 proposes quite some CRC32-MUL-CRC32 functions. They all seem to have similar bias. For example, one function was

// score = 1.9029479221291472
uint32_t
hash(uint32_t x)
{
    x = _mm_crc32_u32(x, 0x2e84292d);
    x *= 0x941325ab;
    x = _mm_crc32_u32(x, 0x902c1e3f);
    return x;
}

So, I have put it into hash.c – mind the #include <immintrin.h> – and compiled and ran ./prospector -4 -Eel ./hash.so to determine the more exact bias which was calculated to 0.21402936517975638 and apparently hits the magnitude of two-round functions (XORSHIFT-MUL-XORSHIFT-MUL-XORSHIFT).

Also, for verification purpose, I tried ./prospector -4 -l ./hash.so -L | more which seems to deliver identical output as ./prospector -p crc32:2e84292d,mul:941325ab,crc32:902c1e3f -L | more.

I even conducted some trials using the very same constant at all three spots – leading to the exact same bias. Not needing to load several constants, this approach could slightly increase the hashing function's speed. So, ./prospector -p crc32:941325ab,mul:941325ab,crc32:941325ab -Ee outputs 0.21402936517975638.

Furthermore, removing the final crc32 gives a bias of 299.51029008431851 (CRC32-MUL) whereas removing the first crc32 results in a bias of 275.79160312307044 (MUL-CRC32).

So, a CRC32-MUL-CRC32 scheme looks quite promising – if I am not misled here which would be extremely embarrassing... :wink:

I will try to throw some more tests at this kind of hashing function and report back soon.

Logan007 commented 2 years ago

I am excited to see that it passes SmallCrunch (15/15) and masters Crunch (140/144). More to follow.

alexey-milovidov commented 2 years ago

Another function to add is higher 64 bits of 64bit * 64bit -> 128bit multiplication. Not sure if it will be useful... found somewhere here: https://github.com/erthink/t1ha

Logan007 commented 2 years ago

higher 64 bits of 64bit * 64bit -> 128bit multiplication

On the one hand, the higher bits might not make a huge difference when compared to the lower 64 bits as they get used now. The influence of the other bits is just inverse, i.e. MSB of upper 64 bits has almost no dependencies and in that respect corresponds to the LSB of the lower 64 bits.

On the other hand however, we have addition's carry smearing effect presumably being stronger on the upper 64 bits of the result than in the lower half.

So, yes, I think it might be slightly more interesting than current multiplication. The assembly part will be easy, just mul (instead of currently used imul) and mov the upper 64 bit from RDX to RAX. The C-code will be a bit more tricky, slower and perhaps not even portable – either (a * (unsigned __int128)b) >> 64 or manual multiplication, see here.

But let's see... [EDIT] First trials suggest to not expect too much from it

Logan007 commented 2 years ago

I had it running for a while now.

For regular multiplication (lower bits of result), I get 761(32 bit) and 772 (64 bit) bias:

// score = 761.60037986541965
uint32_t
hash(uint32_t x)
{
    x *= 0x365554ab;
    return x;
}

// score = 772.3235805382576
uint64_t
hash(uint64_t x)
{
    x *= 0x6edd5445d569aaad;
    return x;
}

The modified multiplication (upper half of result) gave 701 (32 bit) and 741 (64 bit) bias.

// score = 701.09132626454516
uint32_t
hash(uint32_t x)
{
    x *= 0xd5554ac7; // upper half of result !!!
    return x;
}

// score = 741.46122558848492
uint64_t
hash(uint64_t x)
{
    x *= 0xab566a5dab5acd4d; // upper half of result !!!
    return x;
}

I cross-checked the multiplier constant with the respectively other multiplication and saw 743 (32 bit) and 757 (64 bit) when using the constant from normal multiplication in modified multiplication. Using the constants found with modified multiplication in regular multiplication yielded 778 (32 bit) and 771 (64 bit).

So, the modified multiplication might offer some bias opportunities but seems to require its own class of constants. I will clean-up my test implementation and offer it as additional option in a while.

But – before moving on – one more thought as I see failing SmallCrush on the "upper-bits of multiplication result": Under what circumstances is it guaranteed to be bijective?

pdimov commented 2 years ago

It might be better to use the mulx primitive, which xors the low and high part of the multiplication, instead of just taking the high part. This - to the best of my knowledge - has been first proposed by Vladimir Makarov in MUM hash. The original uses addition, but later improvements such as wyhash seem to have standardized on xor.

Abseil uses this in its hash function for integers (https://godbolt.org/z/MTeznrTc7) and it seems to work well for them as it gives good spread with only a single multiplication.

I don't think this primitive is a bijection though.

Logan007 commented 2 years ago

Sounds interesting... If interest is, I might give it a shot sometime soon and add it to the PR (probably not before September).

skeeto / hash-prospector

added byte shuffle, CRC32c, carryless multiplication, and xor-rotate #13