skeeto / hash-prospector

Automated integer hash function discovery
The Unlicense
688 stars 27 forks source link

New best known functions #19

Open TheIronBorn opened 2 years ago

TheIronBorn commented 2 years ago

I used this project to learn combinatorial optimization.

UPDATE: a better function was found in a below comment, the new best is now 0.10704308166917044

[16 21f0aaad 15 d35a2d97 15] = 0.10760229515479501

A metaheuristic was used with an estimate with len = 1 << 25 and constrained to be nearby other good functions (shifts [13, 19], muls popcnt [11, 19]). (Unfortunately I don't remember which metaheuristic I used for this). Finally an exhaustive exact search was applied nearby the best found. I verified the final bias with the hash-prospector library itself.

I also used some explicit SIMD acceleration in the bias calculation along with the current multi threading. The hash itself handles 8 "seeds" at once and the counting is faster too. SIMD counting pseudo code:


// 256/32=8 so splat the 32-bit flips and use a mask to grab them all at once
__m256i sequence = _mm256_set_epi32(0, 1, 2, 3, 4, 5, 6, 7);
__m256i mask = _mm256_sllv_epi32(_mm256_set1_epi8(1), sequence);

for (int i = 0; i < 8; ++i) {
    __m256i splat = _mm256_set1_epi32(_mm256_extract_epi32(flips, i));
    __m256i flipped = _mm256_cmpeq_epi8(_mm256_and_si256(splat, mask), mask);
    byte_bins = _mm256_sub_epi8(byte_bins, flipped);
}

// accumulate intermediate byte bins to larger bins every 255 / 8
skeeto commented 2 years ago

Fascinating! This is such a big jump from my best, too. Since it's highly unlikely that there are better permutations outside those shift+popcnt constraints, and you did an exhaustive search, I suppose that means these are probably the best possible parameters (by this heuristic)? As in I should stop searching this particular space (not that I'm actively doing so now).

I don't understand the role of "len = 1 << 25" so could you elaborate on that part?

TheIronBorn commented 2 years ago

I don't understand the role of "len = 1 << 25" so could you elaborate on that part?

I set score_quality of estimate_bias to 25

Also the final exact search only searched within 4 "neighborhood" steps of the initial so I think there may still be some multipliers that were not seen.

TheIronBorn commented 2 years ago

Here are some others for posterity


[16 23f0aaab 15 d35a2d97 15] = 0.13066275031171543
[16 20f0aaab 15 d35a2d97 15] = 0.13195879287932719
[16 21e0aaab 15 d35a2d97 15] = 0.13195905170130587
[16 a1f0aaab 15 d35a2d97 15] = 0.13269751635322202
[16 61f0aaab 15 d35a2d97 15] = 0.13305154922498447
[16 21f0eaab 15 d35a2d97 15] = 0.13363433980870795
[16 01f0aaab 15 d35a2d97 15] = 0.13404371380695335
[16 21f02aab 15 d35a2d97 15] = 0.13502476643020897
[16 21f0aaab 15 db5a2d97 15] = 0.13565766967918283
[15 21f0aaab 15 d35a2d97 15] = 0.1367387725165565
[16 21f0aaab 15 c35a2d97 15] = 0.13741914434507
[16 21f0aaab 15 f35a2d97 15] = 0.1379071419852113
[16 21f0aaab 15 535a2d97 15] = 0.13986211866666148
[16 21f0aaab 15 935a2d97 15] = 0.14007263708693393
[16 21f0aaab 15 d25a2d97 15] = 0.1407202290848398
[16 21f0aaab 15 d3da2d97 15] = 0.14336270330186673
[16 31f0aaab 15 d35a2d97 15] = 0.14339325760142133
[16 29f0aaab 15 d35a2d97 15] = 0.14363985334436025
[16 21f0aaa9 15 d35a2d97 15] = 0.1453336983612932
[16 21f8aaab 15 d35a2d97 15] = 0.1457843639293307
[16 21b0aaab 15 d35a2d97 15] = 0.1462415903758569
[16 21f0aaab 15 d75a2d97 15] = 0.14692234324205752
[16 21f0aaab 15 d35e2d97 15] = 0.14916750980179216
[16 21f0aaab 15 d15a2d97 15] = 0.1497718718021898
[16 21f1aaab 15 d35a2d97 15] = 0.15017720286015981
[16 21d0aaab 15 d35a2d97 15] = 0.1511000121456908
[16 21f0aaab 15 d3522d97 15] = 0.1513531790691242
[16 25f0aaab 15 d35a2d97 15] = 0.15256373883426125
[16 2170aaab 15 d35a2d97 15] = 0.15336204601641212
[16 21f0aaab 15 d31a2d97 15] = 0.15595696330990985
[16 21f0aaeb 15 d35a2d97 15] = 0.15696481972915902
[16 21f0aa8b 15 d35a2d97 15] = 0.15806891443846496
[16 21f0aaab 15 d37a2d97 15] = 0.15813387758513106
[16 21f0aaab 15 d35b2d97 15] = 0.15821584918520273
[16 21f0baab 15 d35a2d97 15] = 0.158832616180062
[17 21f0aaab 15 d35a2d97 15] = 0.16142064258064714
TheIronBorn commented 2 years ago

Some more


[15 21f0aaad 15 d35a2d97 15] = 0.1133028300049351
[16 450c2d35 15 caf649a9 16] = 0.12242779617496825
[16 c50c2d35 15 caf649a9 16] = 0.12390420060083929
[16 450c2d35 15 eaf649a9 16] = 0.12425550094456454
[16 450c2d35 15 6af649a9 16] = 0.12489788028691867
[16 050c2d35 15 eaf649a9 16] = 0.12541563776043774
[16 c50c2d35 15 eaf649a9 16] = 0.12566685457424706
[16 450c2d35 15 aaf649a9 16] = 0.12643361749669804
[16 0d0c2d35 15 eaf649a9 16] = 0.12663917962201937
[16 450c2d35 15 eae649a9 16] = 0.12730634146451852
[16 450c2d35 15 e8f649a9 16] = 0.12745444770306463
[16 d30332d3 15 c865c965 15] = 0.12786059224340487
[16 4d0c2d35 15 caf649a9 16] = 0.12891794597259876
[16 930332d3 15 c865c965 15] = 0.12895521715629882
[16 410c2d35 15 eaf649a9 16] = 0.12931407115109045
[16 5d0c2d35 15 eaf649a9 16] = 0.1293659073313086
[16 130332d3 15 c865c965 15] = 0.12941101452563722
[16 4d0c2d35 15 eaf649a9 16] = 0.13032013174253892
TheIronBorn commented 2 years ago

And just for fun, new best known functions for other types:

16-bit 2-round (17.5 minutes, 970,000 unique exact bias calculations): [8 a3d3 7 4b2d 9] = 0.007252938393705358 16-bit 3-round (21 minutes, 1,100,000 unique exact bias calculations): (lower bias limit is 0.00390625) [11 b663 3 897d 6 ea57 8] = 0.0043694522287830665 and 16-bit 4-round w/o mul (43 minutes, 2,100,000 unique exact bias calculations): [4 7 2 5 3 4 8 6] = 0.0056453446937279736 a significant jump from 0.02384 for 3-round

No improvement was found for 16-bit w/o mul 3-round. There are only ~11.5 million so a brute force global search might be feasible though likely unnecessary.

I used a generic implementation of this memetic algorithm with dynamic population management for each of these with the following specifics:

TheIronBorn commented 2 years ago

As mentioned in #12, I found a new best


[16 21f0aaad 15 f35a2d97 15] = 0.10734781817103507
TheIronBorn commented 2 years ago

Yet another new best:

[16 21f0aaad 15 735a2d97 15] = 0.10704308166917044

which is a local optimum. Note that it's not a local optimum by the math in #12

OneKnightOnly commented 2 years ago

Very impressive results. Have you made any progress with 64 bit hashes? perhaps it would be easier to treat the 64 bit hash as producing two 32 bit hashes, then you'd have two exact bias numbers, instead of one estimate.

TheIronBorn commented 2 years ago

I haven’t done any work on 64-bit hashes since they are already so close to optimal

TheIronBorn commented 2 years ago

Sorry, my bad, I confused 64-bit with 32-bit 3-round hashes. No I haven't worked on 64-bit hashes

Marc-B-Reynolds commented 2 years ago

FWIW: Here's a structure I keep meaning to put some effort into but haven't gotten around to yet. The constants are all ad-hoc (no attempt at optimization)

static inline uint32_t crc32c(uint32_t x, uint32_t k) { return _mm_crc32_u32(x,k); }

uint32_t hash(uint32_t x)
{
  x  = crc32c(x,0x9e3d2c1b); x *= 0x9e485565;
  x  = crc32c(x,0x9e3d2c1b); x *= 0x9e485565;
  return x;
}
TheIronBorn commented 2 years ago

After a bit of optimization that structure seems quite strong:

[7cdff266 9c80bf99 f789c7a9 0c9cb5b5] = 0.05108489705921976

About twice as good as the "classic" hash function here and a significant jump from the previous work on CRC https://github.com/skeeto/hash-prospector/pull/13#issuecomment-977186194

Unfortunately it doesn't vectorize as well so optimization/searching is slower

Marc-B-Reynolds commented 2 years ago

Nice. This configuration also as the "cherry on top" feature of having zero fixed points (where my PoC version has two).

WRT optimization: yeah I think using a CRC step is only of potential interest as part of bit finalizing.

jaskij commented 2 years ago

I am a random passerby, with a thought.

ARM 8.1-A (pretty much ever modern Cortex-A core) has CRC32 ISA extensions. @Marc-B-Reynolds 's code might be fast on those targets?

ianstormtaylor commented 2 years ago

@TheIronBorn do you have the accompanying inverse function for the best two-step combination you found? It would be cool to add the best takeaway to the readme.

Marc-B-Reynolds commented 2 years ago

I have an implementation of the inverse of CRC32C (function crc32c_inv) here:

https://github.com/Marc-B-Reynolds/Stand-alone-junk/blob/master/src/SFH/carryless.h

TheIronBorn commented 2 years ago

@ianstormtaylor xorshift and odd multiplication both have well known inverses:

x ^= x >> 15 ^ x >> 30;
x *= 2534613543;
x ^= x >> 15 ^ x >> 30;
x *= 859588901;
x ^= x >> 16;
bryc commented 1 year ago

What makes 21f0aaad rank so strongly? I'm noticing that it sometimes seems to avalanche worse than more biased examples, such as prospector32. Though I could be completely missing something here, so pardon any misunderstanding on my part.

I attempted to apply it as a final mixer for the MicroOAAT hash in SMHasher as a test case. It uses two hash variables h1 and h2, which I've tested separately. Input is using 1048575 samples of 4 random bytes.

For h1 (top) and h2 (bottom), this is what each of the 5 steps of [16 21f0aaad 15 735a2d97 15] look like:

image

So a bunch of bits always show bias in h1, but h2 is fine.

I then tried prospector32 ([15 2c1b3c6d 12 297a2d39 15]), which should have much worse bias:

image

But it never shows any biased bits. It seems to perform much better, h1 appears to avalanche and h2 appears to have improved mixing earlier.

It doesn't seem like a fluke, as I noticed other combinations of multipliers such as [15 735a2d97 15 caf649a9 16] seem to do better as well, yet probably exhibit terrible bias, since I was just randomly mixing and matching other multipliers in this thread.


So, what I'm actually trying to do is make the fastest decent 64-bit hash in JavaScript, which is limited to 32-bit integers. It seems possible to mix h1 and h2 to do this, as MurmurHash3_128_x86 does it fairly well.

So far, I came up with this finalizer that mixes h1 and h2, which is as good as my current abilities can measure (which is likely far from optimal):

h1 ^= (h1 ^ (h2 >> 15)) * 0x735a2d97;
h2 ^= (h2 ^ (h1 >> 15)) * 0xcaf649a9;
h1 ^= h2 >> 16;
h2 ^= h1 >> 16;

But again, as soon as I start to adapt the constants [16 21f0aaad 15 735a2d97 15], thinking it would be a better mixer, it instead has measurably worse avalanche behavior. Specifically 21f0aaad appears to cause the worse behavior..

So yeah, just some observations. I don't really know what I'm doing.

TheIronBorn commented 1 year ago

The 21f0aaad variant was tested on all 32-bit values and has the best known bias according to the metric this library uses. What bias metric did you use?

bryc commented 1 year ago

I'm not sure what a bias metric is or if I'm using one. 😄

But a breakdown of what I am doing if my post was not clear:


Some more graphs, showing the avalanche after final operation. I ran the test twice for each, just to show that this behavior carries between multiple test runs.

Control test using 16 0000001f 15 0000bc8f 15: Obviously this should not avalanche. image image

Using 16 21f0aaad 15 735a2d97 15: Biased bits always occur for some odd reason. image image

Using MurmurHash3's constants 16 85ebca6b 13 c2b2ae35 16: Appears to avalanche. Similar results with many other constants that don't use 21f0aaad. image image

Using original lowbias32 from README, 16 7feb352d 15 846ca68b 16: image image

Using prospector32 from README, 15 2c1b3c6d 12 297a2d39 15: image image


Tested in another way that shows similar results, my experimental hash function, using this altered intermixed construction:

My currently chosen finalizer (no clue if its optimal but it avalanches well here):

h1 ^= (h1 ^ (h2 >> 15)) * 0x735a2d97;
h2 ^= (h2 ^ (h1 >> 15)) * 0xcaf649a9;
h1 ^= h2 >> 16;
h2 ^= h1 >> 16;

h1 is first, then h2: image image

Using TheIronBorn lowest bias constants:

h1 ^= (h1 ^ (h2 >> 16)) * 0x21f0aaad;
h2 ^= (h2 ^ (h1 >> 15)) * 0x735a2d97;
h1 ^= h2 >> 15;
h2 ^= h1 >> 15;

h1 is first, then h2: image image

Using MurmurHash3 constants:

h1 ^= (h1 ^ (h2 >> 16)) * 0x85ebca6b;
h2 ^= (h2 ^ (h1 >> 13)) * 0xc2b2ae35;
h1 ^= h2 >> 16;
h2 ^= h1 >> 16;

h1 is first, then h2: image image

Using original lowbias32 from README:

h1 ^= (h1 ^ (h2 >> 16)) * 0x7feb352d;
h2 ^= (h2 ^ (h1 >> 15)) * 0x846ca68b;
h1 ^= h2 >> 16;
h2 ^= h1 >> 16;

h1 is first, then h2: image image

One bit is biased in h2 here.

Using prospector32 from README:

h1 ^= (h1 ^ (h2 >> 15)) * 0x2c1b3c6d;
h2 ^= (h2 ^ (h1 >> 12)) * 0x297a2d39;
h1 ^= h2 >> 15;
h2 ^= h1 >> 15;

h1 is first, then h2:

image image


and has the best known bias according to the metric this library uses

Keep in mind I am not contesting this. I am simply confused as to why it appears to perform worse than other multipliers when used as finalizer in a string hash function. Maybe something else is at play here.

TheIronBorn commented 1 year ago

Looks like the bias metric that post used is the raw error per "bin". The metric this library uses is root mean square relative error over all bins. A better comparison would at least be an image of relative error over all 2^32 integers, perhaps with some sort of extra weighting of larger error

fp64 commented 1 year ago

Perhaps MicroOAAT (of len=4) interacts badly with this particular constant? Here are values of h1 for inputs 0x00..0xFF (for seed=0):

00510030 1051004A 205104E4 305104FE 40510498 50510932 6051094C 70510966 80510D80 90510D9A A0510DB4 B051124E C0511268 D0511682 E051169C F05116B6 
00511B51 10511B6B 20511B05 30511F9F 40511FB9 50511FD3 6051246D 70512407 80512421 905128BB A05128D5 B05128EF C0512D09 D0512D23 E05131BD F05131D7 
005131F2 1051360C 20513626 30513640 40513ADA 50513AF4 60513A8E 70513F28 80513F42 90513F5C A05143F6 B0514390 C051482A D0514844 E051485E F0514CF8 
00514C93 10514CAD 20515147 30515161 4051517B 50515595 605155AF 705155C9 80515A63 90515A7D A0515E97 B0515EB1 C0515ECB D0516365 E051637F F0516319 
005167B4 105167CE 205167E8 30516C02 40516C1C 50516C36 605170D0 705170EA 80517504 9051751E A0517538 B05179D2 C05179EC D0517986 E0517E20 F0517E3A 
00517E55 105182EF 20518289 305182A3 4051873D 50518757 60518771 70518B8B 80518BA5 9051903F A0519059 B0519073 C051948D D05194A7 E05194C1 F051995B 
00519976 10519910 20519DAA 30519DC4 40519DDE 5051A278 6051A212 7051A6AC 8051A6C6 9051A6E0 A051AB7A B051AB14 C051AB2E D051AFC8 E051AFE2 F051AFFC 
0051B417 1051B431 2051B44B 3051B8E5 4051B8FF 5051BD19 6051BD33 7051BD4D 8051C1E7 9051C181 A051C19B B051C635 C051C64F D051C669 E051CA83 F051CA9D 
0051CAB8 1051CF52 2051CF6C 3051D386 4051D3A0 5051D3BA 6051D854 7051D86E 8051D808 9051DCA2 A051DCBC B051DCD6 C051E170 D051E10A E051E124 F051E5BE 
0051E5D9 1051E5F3 2051EA0D 3051EA27 4051EEC1 5051EEDB 6051EEF5 7051F30F 8051F329 9051F343 A051F7DD B051F7F7 C051F791 D051FC2B E051FC45 F051FC5F 
005200FA 10520094 2052052E 30520548 40520562 505209FC 60520996 705209B0 80520E4A 90520E64 A0520E7E B0521298 C05212B2 D05212CC E0521766 F0521700 
00521B9B 10521BB5 20521BCF 30522069 40522003 5052201D 605224B7 705224D1 805224EB 90522905 A052291F B0522939 C0522DD3 D0522DED E0523207 F0523221 
0052323C 105236D6 205236F0 3052368A 40523B24 50523B3E 60523B58 70523FF2 80523F8C 90523FA6 A0524440 B052445A C0524474 D052488E E05248A8 F0524D42 
00524D5D 10524D77 20525191 305251AB 405251C5 5052565F 60525679 70525613 80525AAD 90525AC7 A0525AE1 B0525F7B C0525F15 D05263AF E05263C9 F05263E3 
0052687E 10526818 20526832 30526CCC 40526CE6 50526C80 6052711A 70527134 8052714E 905275E8 A0527582 B0527A1C C0527A36 D0527A50 E0527EEA F0527E84 
00527E9F 10528339 20528353 3052836D 40528787 505287A1 605287BB 70528C55 80528C6F 90529089 A05290A3 B05290BD C0529557 D0529571 E052950B F05299A5 

Pattern mod 16 is clearly observable.

Here are values of h2 for inputs 0x00..0xFF (for seed=0):

95E7CADC 61116253 2C3AF9C1 F7649138 C28E28AF 8DB7C01D 58E15794 240AEF0B EF348679 BA5E1DF0 8587B567 50B14CD5 1BDAE44C E7047BBA B22E1331 7D57AAA8 48814216 13AAD98D DED47104 A9FE0872 75279FE9 40513760 0B7ACECE D6A46645 A1CDFDBC 6CF7952A 38212CA1 034AC418 CE745B86 999DF2FD 64C78A6B 2FF121E2 FB1AB959 C64450C7 
916DE83E 5C977FB5 27C11723 F2EAAE9A BE144611 893DDD7F 546774F6 1F910C6D EABAA3DB B5E43B52 810DD2C0 4C376A37 176101AE E28A991C ADB43093 78DDC80A 44075F78 0F30F6EF DA5A8E66 A58425D4 70ADBD4B 3BD754C2 0700EC30 D22A83A7 9D541B15 687DB28C 33A74A03 FED0E171 C9FA78E8 9524105F 604DA7CD 2B773F44 F6A0D6BB C1CA6E29 
8CF405A0 581D9D17 23473485 EE70CBFC B99A636A 84C3FAE1 4FED9258 1B1729C6 E640C13D B16A58B4 7C93F022 47BD8799 12E71F10 DE10B67E A93A4DF5 7463E56C 3F8D7CDA 0AB71451 D5E0ABC8 A10A4336 6C33DAAD 375D721B 02870992 CDB0A109 98DA3877 6403CFEE 2F2D6765 FA56FED3 C580964A 90AA2DC1 5BD3C52F 26FD5CA6 F226F41D BD508B8B 
887A2302 53A3BA70 1ECD51E7 E9F6E95E B52080CC 804A1843 4B73AFBA 169D4728 E1C6DE9F ACF07616 781A0D84 4343A4FB 0E6D3C72 D996D3E0 A4C06B57 6FEA02C5 3B139A3C 063D31B3 D166C921 9C906098 67B9F80F 32E38F7D FE0D26F4 C936BE6B 946055D9 5F89ED50 2AB384C7 F5DD1C35 C106B3AC 8C304B1A 5759E291 22837A08 EDAD1176 B8D6A8ED 
84004064 4F29D7D2 1A536F49 E57D06C0 B0A69E2E 7BD035A5 46F9CD1C 1223648A DD4CFC01 A8769378 73A02AE6 3EC9C25D 09F359CB D51CF142 A04688B9 6B702027 3699B79E 01C34F15 CCECE683 98167DFA 63401571 2E69ACDF F9934456 C4BCDBCD 8FE6733B 5B100AB2 2639A220 F1633997 BC8CD10E 87B6687C 52DFFFF3 1E09976A E9332ED8 B45CC64F 
7F865DC6 4AAFF534 15D98CAB E1032422 AC2CBB90 77565307 427FEA75 0DA981EC D8D31963 A3FCB0D1 6F264848 3A4FDFBF 0579772D D0A30EA4 9BCCA61B 66F63D89 321FD500 FD496C77 C87303E5 939C9B5C 5EC632CA 29EFCA41 F51961B8 C042F926 8B6C909D 56962814 21BFBF82 ECE956F9 B812EE70 833C85DE 4E661D55 198FB4CC E4B94C3A AFE2E3B1 
7B0C7B28 46361296 115FAA0D DC89417B A7B2D8F2 72DC7069 3E0607D7 092F9F4E D45936C5 9F82CE33 6AAC65AA 35D5FD21 00FF948F CC292C06 9752C37D 627C5AEB 2DA5F262 F8CF89D0 C3F92147 8F22B8BE 5A4C502C 2575E7A3 F09F7F1A BBC91688 86F2ADFF 521C4576 1D45DCE4 E86F745B B3990BD2 7EC2A340 49EC3AB7 1515D225 E03F699C AB690113 
76929881 41BC2FF8 0CE5C76F D80F5EDD A338F654 6E628DCB 398C2539 04B5BCB0 CFDF5427 9B08EB95 6632830C 315C1A7A FC85B1F1 C7AF4968 92D8E0D6 5E02784D 292C0FC4 F455A732 

Pattern mod 34 (=2*17) is somewhat observable.

Code.

Note: While 0x21f0aaad=569420461 is prime, it modular inverse 859588901=17×2707×18679.

Edit: typesetting.

funny-falcon commented 1 year ago

MicroOAAT author here :) It is intriguing someone reached it.

The hash function uses two variables to read and mix data, h1 and h2. Only h2 is returned, while h1 is used for intermediate mixing.

No, it returns h1^h2 , and it is important. Just returning one variable is not good enough. If you want to test something on MicroOAAT, you should test on h1^h2, not separate h1 or h2.

Actually it is designed to be used with same finalizer as GoodOAAT GoodOAAT were made because Clang didn't optimize MicroOAAT properly at that time.

Note: I'm not good at this math at all. I just used semi-scripted "genetic" algo over SMHasher output to find rotation values for finalizer. And doubdfully the finalizer is good for something except this two functions.

bryc commented 1 year ago

Ah, my apologies, GoodOAAT is the one that returns only h2. I must have gotten them confused when I wrote that, I was testing with both because they are similar, but was more focused on MicroOAAT because it had fewer operations in total. Cheers!

tommyettinger commented 1 year ago

@bryc Your finalizer isn't bijective (cannot be reversed), if I am reading your code correctly, because of how h1 and h2 are mixed together:

h1 ^= (h1 ^ (h2 >> 15)) * 0x735a2d97;
h2 ^= (h2 ^ (h1 >> 15)) * 0xcaf649a9;
h1 ^= h2 >> 16;
h2 ^= h1 >> 16;

I believe you could make it reversible with some small changes, but I'd have to ask an expert ( @Marc-B-Reynolds I think wrote a blog post about this?) about the last two lines, since they may be reversible without any changes, but I think they need a small change.

h1 ^= (h2 ^ (h2 >> 15)) * 0x735a2d97;
h2 ^= (h1 ^ (h1 >> 15)) * 0xcaf649a9;
h1 ^= (h2 ^ (h2 >> 16));
h2 ^= (h1 ^ (h1 >> 16));

Personally, I would use += instead of ^= because at least in random number generator testing, using only bitwise operations (that is, math that doesn't carry) is much more likely to fail binary matrix rank tests. This uses multiplication, so it might not be affected.

I can try to write the inverse if you want. Being invertible is important for a finalizer because it ensures all outputs can occur given all inputs to the finalizer.

bryc commented 1 year ago

@tommyettinger

Please do; I'd greatly appreciate any insights. 😄 I know almost nothing about reversibility, I was just purely letting the avalanche graphs guide me.

The choice for XOR over ADD is mostly because I generally found fewer collisions and better distribution in my naïve tests when using XOR, but completely unscientific of course. I can't really measure a meaningful difference with my crappy tests. It just seemed to work as well or better, and was faster. It's also a bit of a better choice in JavaScript (one of my target platforms), as using addition tends to break out of the 32-bit optimization mode modern JS engines have, that XOR and Math.imul benefit from, and one must use an additional OR operation to redirect the addition result back to 32-bit mode.

If I ever figure out how to get SMHasher to run on W10 any potential issues in the hash function will probably be a lot more obvious.

tommyettinger commented 1 year ago

Great point about XOR on JS; this should be able to stick with just XOR, shifts, and multiplication then. I only relatively-recently learned Math.imul() existed, and I am very thankful it does. As for the reversibility, I should say a) it turned out much simpler than I expected, and b) your original last two lines actually are reversible.

If you want to reverse this hash over the JS Numbers h1 and h2, where both are in the 32-bit range:

h1 ^= Math.imul((h2 ^ (h2 >>> 15)), 0x735a2d97);
h2 ^= Math.imul((h1 ^ (h1 >>> 15)), 0xcaf649a9);
h1 ^= h2 >>> 16;
h2 ^= h1 >>> 16;

This is the reverse:

h2 ^= h1 >>> 16;
h1 ^= h2 >>> 16;
h2 ^= Math.imul((h1 ^ (h1 >>> 15)), 0xcaf649a9);
h1 ^= Math.imul((h2 ^ (h2 >>> 15)), 0x735a2d97);

Yes, those constants may look familiar. The entire thing probably looks familiar. The only change I had to do to the forward hash was to ensure each xorshift that gets multiplied uses the same variable throughout the right hand side, and a different variable on the left. The only change I then had to do to reverse it was to reverse the lines.

I can see about running it through SMHasher; I have gotten it working on Windows before, but I mainly run it on Linux. The fork I use is kind-of tricky to get a hash up and running; this round function is mostly what I need if I can plug it in to replace an existing pair-of-ints-to-pair-of-ints function.

funny-falcon commented 1 year ago
h1 ^= h2 >>> 16;
h2 ^= h1 >>> 16;

If h2 is the result, then h1 ^= h2 >> 16 is completely pointless, since h2 ^= h1 >> 16 doesn't count any mixed bits in this step.

If both h1 and h2 are results, then there is some sence, though.


h1 ^= (h2 ^ (h2 >> 16));
h2 ^= (h1 ^ (h1 >> 16));

It is equivalent to

t1 = h1;
h1 ^= (h2 ^ (h2 >> 16));
h2 = t1 ^ (t1 >> 16) ^ (h2 >> 16);

i.e. highest bits of h2 are removed from final h2 value (but they are in h1). Doubdfully that is desired behaviour.

funny-falcon commented 1 year ago
h1 ^= (h1 ^ (h2 >> 15)) * 0x735a2d97;

certainly it is not reversible, since with any h2 and any h1, value h1_1 = h1 ^ (1<<31) produces same result. But h1 = (h1 ^ (h2 >> 15)) * 0x735a2d97; is reversible.

funny-falcon commented 1 year ago

Considering doubts about "0.10704308166917044" as "best value".

Shouldn't "best value" be close to theoretical random result? I thought, true random could not be too "flat". There are fluctuations, which distinct "true random" from obviously synthesized stream of values.

So, if we want immitate "true random", then we should not be "too good to be true".

So, I'd rather aim for values achievable with strong cryptographic functions: SHA-256, SHA-512, SHA3, and others (SipHash-2-4, SipHash-4-8). Smart people spend their life to generate values that certainly indistinguishable from random. If we get results that differs, we do something wrong.

But I'm not professional mathematic, so I could be mistaken.

funny-falcon commented 1 year ago

I believe, Siphash-4-8 (and even SipHash-2-4) is certainly good reference to compare results with.

bryc commented 1 year ago

If both h1 and h2 are results, then there is some sence, though.

Yes they are indeed both results - the intention is that both h1 / h2 should be equally mixed and produce a 64-bit hash. I'm trying to make a decent fast 64-bit hash using 32-bit operations, due to the fact that JS bitwise and imul operations must be 32-bit results (limitation of the language). It's unfortunately a dying art when we have much superior 64-bit operations available everywhere else. 😄

The actual JS implementation I have forms a 53-bit integer, discarding some bits, because 64-bit integers is not possible without more complicated and slower logic:

const cyrb53a = function(str, seed = 0) {
  let h1 = 0xdeadbeef ^ seed, h2 = 0x41c6ce57 ^ seed;
  for(let i = 0, ch; i < str.length; i++) {
    ch = str.charCodeAt(i);
    h1 = Math.imul(h1 ^ ch, 0x85ebca77);
    h2 = Math.imul(h2 ^ ch, 0xc2b2ae3d);
  }
  h1 ^= Math.imul(h1 ^ (h2 >>> 15), 0x735a2d97);
  h2 ^= Math.imul(h2 ^ (h1 >>> 15), 0xcaf649a9);
  h1 ^= h2 >>> 16; h2 ^= h1 >>> 16;
    return 2097152 * (h2 >>> 0) + (h1 >>> 11);
};

Math.imul is just 32-bit multiplication in C. So quick pseudo C code:

// init hash
    u32 h1 = 0xdeadbeef ^ seed;
    u32 h2 = 0x41c6ce57 ^ seed;
// loop each input char to hash, one at a time:
    h1 = (h1 ^ char) * 0x85ebca77;
    h2 = (h2 ^ char) * 0xc2b2ae3d;
// final mix
    h1 ^= (h1 ^ (h2 >> 15)) * 0x735a2d97;
    h2 ^= (h2 ^ (h1 >> 15)) * 0xcaf649a9;
    h1 ^= h2 >> 16;
    h2 ^= h1 >> 16;
// h2 combined with h1 is the final output
    return (h2 << 32) | h1;

My h1/h2 mixer (which is WIP) is similar to the OP mixer but uses fewer operations. It was my hope that intermixing two 32-bit hash variables could optimize the number of operations in the final mixer down and still allow for a meaningful 64-bit result. I'm not sure if this is misguided (a few extra ops in a final mixer may not be that expensive, even in JS), but it does seem to avalanche decently, so I've been exploring it.

My initial post in this thread was basically expressing confusion as to why I was getting worse avalanche results when testing my h1/h2 mixer, despite using the least biased multipliers. There certainly seems to be some sensitivity or pattern to the constants chosen.


Also I should probably explain why I was testing using MicroOAAT: basically it is the simplest hash that uses two 32-bit hash variables in the loop step. It does not have a significant final mix operation, it just XORs h1^h2 as the final value, producing 32-bit hash.

So the idea was to apply my h1/h2 mixer to see if both h1 and h2 still avalanched and produced useful 64-bit hash, without the additional multiplication in the loop, which I had done in my cyrb53 hash above.

And simply put, more biased multipliers showed less bias in the avalanche results, than the highest scoring multiplier (and best one, supposedly).

TheIronBorn commented 1 year ago

Considering doubts about "0.10704308166917044" as "best value".

Have you seen https://github.com/skeeto/hash-prospector/issues/12#issuecomment-1133540690? For a truly random distribution the average is 0.0152587890625. So 0.10704308166917044 is far from "ideal", it's only the best one found so far.

funny-falcon commented 1 year ago

Considering doubts about "0.10704308166917044" as "best value".

Have you seen #12 (comment)? For a truly random distribution the average is 0.0152587890625. So 0.10704308166917044 is far from "ideal", it's only the best one found so far.

Could this value (or close to) achieved with SipHash/SHA2/SHA3 measured by software in this repo?

If yes, then you're right.

If not, then formula in that comment is about some other measurement.

Just measure SHA2 or SipHash and prove me wrong.

TheIronBorn commented 1 year ago

I just ran SipHash over the inputs [0, 67108864), seeded with 0x243f6a8885a308d3 and 0x13198a2e03707344 (Pi) and xor'd to 32 bits. I also ran this https://github.com/skeeto/hash-prospector/issues/19#issuecomment-1120105785 function over the same inputs. (Note that the bias score is not the same as for all 2^32 inputs)

SipHash: 0.16265987941647841 this thread: 0.21615393361888857

funny-falcon commented 1 year ago

Nice result. Could you run SipHash for whole 2^32, please? So we have real achievable target.

TheIronBorn commented 1 year ago

SipHash over all 2^32: 0.0211225759194

funny-falcon commented 1 year ago

So, ~0.021 estimate is real.

funny-falcon commented 1 year ago

So the idea was to apply my h1/h2 mixer to see if both h1 and h2 still avalanched and produced useful 64-bit hash, without the additional multiplication in the loop

I never expected MicroOAAT has enough quality to produce 64bit output. It is built in assumption it has more than “32bit of randomness”, so result truncated to 32bits is good enough.

I didn’t calc, but I expect it is as good as 40-48 bit hash. Certainly, properly mixed it could look as decent 64bit hash for simple use cases.

Try to apply GoodOAAT finalizer (since it were built for MicroOAAT), and add one more step to mix more 32 bits. I think it will “look like” good 64 bit hash.

pspeter3 commented 7 months ago

Is @bryc's comment here the best known values / implementation for JavaScript so far? I'm trying to parse this thread to understand the current state. If so, is it reversible?

bryc commented 7 months ago

@pspeter3 My posts are purely about how the avalanche behavior of these constants seem to act differently when mixing two hashes together instead of just one hash. That is, the best was no longer the best for my specific use case for performing a partial final mix of two 32-bit hashes into a 64-bit whole.

The intended construction for this project (hash-prospector) is to have the operations applied to a single 32-bit hash as the final operation. As such, it is an integer hash. And the best result is still https://github.com/skeeto/hash-prospector/issues/19#issuecomment-1120105785. So in theory, the operation can be done twice in whole for h1 and h2 (then simply xoring the results together), but I was looking for a way to optimize it for a decent partial mix, because some initial mixing can be achieved during the loop step.

The full set of operations is tested as an integer hash, without any initial mixing applied, so the avalanche results don't rely on any partial mixing.

Hope that clears any confusion :)

In terms of it being reversible, I believe so:

var h1 = 0xdeadbeef, h2 = 0xcafebabe;
// do it
h1 ^= Math.imul((h2 ^ (h2 >>> 15)), 0x735a2d97);
h2 ^= Math.imul((h1 ^ (h1 >>> 15)), 0xcaf649a9);
h1 ^= h2 >>> 16;
h2 ^= h1 >>> 16;
// undo it
h2 ^= h1 >>> 16;
h1 ^= h2 >>> 16;
h2 ^= Math.imul((h1 ^ (h1 >>> 15)), 0xcaf649a9);
h1 ^= Math.imul((h2 ^ (h2 >>> 15)), 0x735a2d97);
console.log((h1>>>0).toString(16), (h2>>>0).toString(16));
// both will return deadbeef and cafebabe
pspeter3 commented 7 months ago

Thank you for sharing! I think I'm struggling getting the inverse working generally in JavaScript. Copying the example from the README with the values from the earlier comment [16 21f0aaad 15 d35a2d97 15], the inverse of 0x21f0aaad as 0x333c4925 and 0xd35a2d97 as 0x37132227. I get the following code which does not work. (Your code does work though which is awesome, I think I'm just missing something obvious)

/**
 * Hashes a value.
 * @param {number} x
 * @returns {number}
 */
function toLowBias32(x) {
    x ^= x >>> 16;
    x = Math.imul(x, 0x21f0aaad);
    x ^= x >>> 15;
    x = Math.imul(x, 0xd35a2d97);
    x ^= x >>> 15;
    return x;
}

/**
 * Inverses the hashing function.
 * @param {number} x
 * @returns {number}
 */
function fromLowBias32(x) {
    x ^= x >>> 15;
    x = Math.imul(x, 0x37132227);
    x ^= x >>> 15;
    x = Math.imul(x, 0x333c4925);
    x ^= x >>> 16;
    return x;
}

// 1337 1394193351 -1106502979
console.log(1337, toLowBias32(1337), fromLowBias32(toLowBias32(1337)));
Marc-B-Reynolds commented 7 months ago

@pspeter3 The inverse of x ^= x >>> 15; is x ^= x >>> 15; x ^= x >>> 30;

SEE: here

pspeter3 commented 7 months ago

Thanks @Marc-B-Reynolds! That was super helpful and I have a working version now.

/**
 * Hashes a value.
 * @param {number} x
 * @returns {number}
 */
function toLowBias32(x) {
    x ^= x >>> 16;
    x = Math.imul(x, 0x21f0aaad);
    x ^= x >>> 15;
    x = Math.imul(x, 0xd35a2d97)
    x ^= x >>> 15;
    return x >>> 0;
}

/**
 * Inverses the hashing function.
 * @param {number} x
 * @returns {number}
 */
function fromLowBias32(x) {
    x ^= x >>> 15;
    x ^= x >>> 30;
    x = Math.imul(x, 0x37132227);
    x ^= x >>> 15;
    x ^= x >>> 30;
    x = Math.imul(x, 0x333c4925);
    x ^= x >>> 16;
    return x >>> 0;
}

Now I'm curious based on @bryc's research if I can make toLowBias53 and vice versa.