google / highwayhash

Fast strong hash functions: SipHash/HighwayHash

Unexpected low speeds with the generic C?! #102

Closed Sanmayce closed 3 years ago

Sanmayce commented 3 years ago

Hi Jan and Jyrki, I just added HighwayHash128 to my C program that benchmarks several hashers. The problem I encountered is its very slow speed; the collisions are okay, though. Is this to be expected, considering you wrote a far stronger hasher than the non-crypto ones?!

Yesterday's hasher showdown, https://github.com/Cyan4973/xxHash/issues/568, lacked HighwayHash, so I tried to enrich the experience.

Excuse my amateurism, but I couldn't see how to integrate your optimized CC functions; I use only C.

Roughly, HighwayHash128 (generic) is 6,336,146 / 153,854 ≈ 41x faster than SHA3-224 in a real scenario with small keys.

Testfile: KAZE_(Dictionary_SpecificationLanguage(ABBYY_Software_House))_Hanyu_Cihai_newSea-of-Words(Zho-Zho).dsl (42,920,232 bytes)
Testmachine: laptop 'Brutalitto', AMD 4800H, max turbo 4.3GHz, 64GB DDR4 3200MHz, Windows 10
Hashtable: 26bit, i.e. 67,108,864 slots, more than the file size (42,920,232 bytes), since with a perfect hasher there should be more slots than keys (the keys at each position could all be unique)

+--------------------------+-----------------------------+----------------------------------+---------------------------------+
| Hasher,                  | Number Of Hash Collisions = | RAW Hashing Speed (in one pass,  | Linear Hashing Speed,           |
| GCC-10.1 compiler        | Distinct Keys -             | at each position) for keys       | the whole file as one key       |
| -O3 -mavx                | Number Of Trees             | 4,6,8,10,12,14,16,18,36,64 bytes |                                 |
+--------------------------+-----------------------------+----------------------------------+---------------------------------+
| XXH3_64bits v0.8.0       |                  41,108,202 |      295,187,276 KEYS-PER-SECOND | 21,786,919,796 BYTES-PER-SECOND |
| HighwayHash128 (generic) |                  41,109,295 |        5,986,502 KEYS-PER-SECOND |  1,644,642,372 BYTES-PER-SECOND |
| CRC32C (_mm_crc32_u32)   |                  41,109,478 |      274,426,023 KEYS-PER-SECOND |  5,241,205,519 BYTES-PER-SECOND |
| XXH3_128bits v0.8.0      |                  41,111,196 |      214,493,903 KEYS-PER-SECOND | 20,331,706,300 BYTES-PER-SECOND |
| SHA3-224                 |                  41,111,291 |          153,854 KEYS-PER-SECOND |     22,319,413 BYTES-PER-SECOND |
| wyhash final             |                  41,112,870 |      449,897,589 KEYS-PER-SECOND | 15,086,197,539 BYTES-PER-SECOND |
| DoubleDeuceAES_Gumbotron |                  41,117,352 |      204,869,832 KEYS-PER-SECOND |  8,690,065,195 BYTES-PER-SECOND |
| FNV1A_Pippip             |                  41,488,327 |      449,897,589 KEYS-PER-SECOND |  8,101,214,043 BYTES-PER-SECOND |
+--------------------------+-----------------------------+----------------------------------+---------------------------------+

Note1: The second column holds the cumulative value, i.e. the collisions for all key lengths 4..64 summed.
Note2: Folding those 128 bits should lessen the collisions.

Testfile: TERAPIG_EncyclopaediaJudaica(in_22_volumes)_TXT.tar (107,784,192 bytes)
Testmachine: laptop 'Brutalitto', AMD 4800H, max turbo 4.3GHz, 64GB DDR4 3200MHz, Windows 10
Hashtable: 27bit, i.e. 134,217,728 slots, more than the file size (107,784,192 bytes), since with a perfect hasher there should be more slots than keys (the keys at each position could all be unique)

+--------------------------+-----------------------------+----------------------------------+---------------------------------+
| Hasher,                  | Number Of Hash Collisions = | RAW Hashing Speed (in one pass,  | Linear Hashing Speed,           |
| GCC-10.1 compiler        | Distinct Keys -             | at each position) for keys       | the whole file as one key       |
| -O3 -mavx                | Number Of Trees             | 4,6,8,10,12,14,16,18,36,64 bytes |                                 |
+--------------------------+-----------------------------+----------------------------------+---------------------------------+
| DoubleDeuceAES_Gumbotron |                 135,752,271 |      204,640,573 KEYS-PER-SECOND |  8,742,330,440 BYTES-PER-SECOND |
| HighwayHash128 (generic) |                 135,754,873 |        6,336,146 KEYS-PER-SECOND |  1,435,801,622 BYTES-PER-SECOND |
| XXH3_128bits v0.8.0      |                 135,756,978 |      212,843,977 KEYS-PER-SECOND | 22,539,563,362 BYTES-PER-SECOND |
| wyhash final             |                 135,762,454 |      442,100,861 KEYS-PER-SECOND | 14,959,638,029 BYTES-PER-SECOND |
| XXH3_64bits v0.8.0       |                 135,763,366 |      290,994,033 KEYS-PER-SECOND | 22,464,400,166 BYTES-PER-SECOND |
| CRC32C (_mm_crc32_u32)   |                 135,764,628 |      252,599,460 KEYS-PER-SECOND |  5,241,402,061 BYTES-PER-SECOND |
| FNV1A_Pippip             |                 135,768,302 |      450,602,801 KEYS-PER-SECOND |  8,048,401,433 BYTES-PER-SECOND |
| SHA3-224                 |                 135,771,905 |          153,841 KEYS-PER-SECOND |     22,246,479 BYTES-PER-SECOND |
+--------------------------+-----------------------------+----------------------------------+---------------------------------+
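The table sizes quoted for both test files (26 bits for 42,920,232 positions, 27 bits for 107,784,192) follow from picking the smallest power of two that exceeds the number of byte positions. Here is a minimal sketch of that rule in C. The function names are hypothetical, and reducing the hash to a slot by masking is an assumption, since the benchmark's own reduction step is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest bit width b such that 2^b slots
   exceed `positions` (the number of byte positions in the file). */
unsigned table_bits_for(uint64_t positions)
{
    unsigned bits = 0;
    assert(positions < (UINT64_C(1) << 63));   /* keeps the shift below 64 */
    while ((UINT64_C(1) << bits) <= positions)
        bits++;
    return bits;
}

/* Hypothetical helper: reduce a 32-bit hash to a slot index in a
   table of 2^bits slots by masking (assumed, not taken from the benchmark). */
uint32_t slot_index(uint32_t hash, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    return (bits == 32) ? hash : (hash & ((UINT32_C(1) << bits) - 1u));
}
```

With these definitions, table_bits_for(42920232) gives 26 and table_bits_for(107784192) gives 27, matching the two setups above.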

The part I use is from https://github.com/google/highwayhash/tree/master/c. As with the other hashers, I just use the preprocessor to include only the targeted hash; in this case these are the additions (one of four) I made:

#ifdef _N_HWH
    //HighwayHash128(const uint8_t* data, size_t size, const uint64_t key[4], uint64_t hash[2]);
    HighwayHash128((uint8_t *)SourceBlock, size_inLINESIXFOUR, kTestKey1, HWHresult);
    Slot += *(uint32_t *)HWHresult;
#endif

#ifdef _N_XXH128
    //Slot += (uint32_t)(XXH64(SourceBlock, size_inLINESIXFOUR, 0));
    Cyan = XXH3_128bits(SourceBlock, size_inLINESIXFOUR);
    Slot += *(uint32_t*)(&Cyan);
#endif
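Both snippets above feed only the low 32 bits of the 128-bit result into Slot. The earlier note about folding the 128 bits could be realized in several ways. The sketch below uses a plain XOR-fold, which is an assumption and not necessarily what the author has in mind. The function name is hypothetical. Indexing the uint64_t words directly also avoids the pointer cast to uint32_t.

```c
#include <stdint.h>

/* Hypothetical helper: XOR-fold a 128-bit hash, given as two 64-bit
   words (the layout HighwayHash128 writes into hash[2]), down to 32 bits. */
uint32_t fold128_to_32(const uint64_t hash[2])
{
    uint64_t x = hash[0] ^ hash[1];       /* 128 -> 64 bits */
    return (uint32_t)(x ^ (x >> 32));     /* 64 -> 32 bits  */
}
```

In the HighwayHash branch this would be used as Slot += fold128_to_32(HWHresult); whether it actually lowers the collision count on these test files would have to be measured.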

The slowness appears with both the Intel v15.0 64bit and GCC 10.1 64bit compilers; the options used are:

x86_64-w64-mingw32-gcc -O3 -mavx -fomit-frame-pointer Lookuperorama.c highwayhash.c -o Lookuperorama_GCC-10.1_HWH128_64bit.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HIGH_PRIORITY -DHashInBITS=24 -DHashChunkSizeInBITS=24 -DRAMpoolInKB=5120 -DBtreeHEURISTIC -D_WIN32_ENVIRONMENT_ -DLongestLineInclusive=64 -D_N_HWH
icl /TP /O3 /arch:SSE4.1 Lookuperorama.c highwayhash.c -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HIGH_PRIORITY -D_icl_mumbo_jumbo_ /FAcs -DHashInBITS=24 -DHashChunkSizeInBITS=24 -DRAMpoolInKB=5120 -DBtreeHEURISTIC -D_WIN32_ENVIRONMENT_ -DLongestLineInclusive=64 -D_N_HWH
copy Lookuperorama.exe Lookuperorama_ICC-v19.0_HWH128_64bit.exe /y

The whole package (with compiles/binaries and sources) is here: www.sanmayce.com/Lookupperorama_r12.7z

Also, please share some thoughts on how you see collision benchmarking done right. My best shot is to run 1 trillion Knight-Tours derivatives and count the collisions within the generated 8, 9, up to 16 bytes of the 128 bits; however, several hundred terabytes are needed, grmbl.

jan-wassenberg commented 3 years ago

Hi @Sanmayce, sorry to reply late.

Thanks for adding HighwayHash!

> considering you wrote a far stronger hasher than the non-crypto ones?!

Yes, this is plausible. All 'reasonable' hashes can mix sufficiently to avoid collisions for non-adversarial input. The extra mixing and larger state in HighwayHash are useful for fingerprint/MAC applications, especially for adversarial inputs.

> Excuse my amateurism, but I couldn't see how to integrate your optimized CC functions; I use only C.

The large difference in performance between C and C++ is because the C version does not use SIMD instructions. It is indeed a bit complicated to build the optimized version. It could be interesting to port HighwayHash to the Highway SIMD library, which is easier to build and supports CMake.

> please share some thoughts on how you see collision benchmarking done right

Ah, that would be very interesting to see studied in more detail.

Bob Jenkins had a very thorough test program which filled memory with a hash table. Several TiB are now readily available in the cloud; I would imagine that, if run today, this could yield new results and possibly some surprises.

Sanmayce commented 3 years ago

It took ~240 hours (on an 8-core CPU) to complete Bob Jenkins' Froggy with Gumbotron_YMM; no collisions were reported. Here is the package: www.sanmayce.com/Froggy_Gumbotron_XXH128_8threads.zip

H:\Froggy_Gumbotron_XXH128_8threads>dir

09/06/2021  11:51 PM            33,369 froggy.cpp
09/07/2021  12:19 AM            84,992 Froggy_Gumbotron.exe
09/07/2021  12:19 AM            89,088 Froggy_Spooky.exe
09/07/2021  12:19 AM           109,568 Froggy_XXH128.exe
09/06/2021  11:50 PM            75,384 Gumbotron_YMM.h
09/05/2021  06:09 AM               203 makeEXE.bat
09/04/2021  02:17 AM             8,522 spooky.cpp
09/01/2021  08:18 PM            11,884 spooky.h
08/26/2021  08:44 AM             1,855 xxhash.c
08/26/2021  08:44 AM           184,809 xxhash.h

D:\Froggy_Gumbotron_XXH128_8threads>Froggy_Gumbotron.exe
count 2^^3, covered 2^^5 key pairs (thread 1)
count 2^^4, covered 2^^7 key pairs (thread 1)
count 2^^3, covered 2^^5 key pairs (thread 7)
count 2^^3, covered 2^^5 key pairs (thread 5)
count 2^^3, covered 2^^5 key pairs (thread 4)
count 2^^4, covered 2^^7 key pairs (thread 4)
count 2^^3, covered 2^^5 key pairs (thread 2)
count 2^^4, covered 2^^7 key pairs (thread 7)
count 2^^5, covered 2^^9 key pairs (thread 7)
count 2^^4, covered 2^^7 key pairs (thread 5)
count 2^^5, covered 2^^9 key pairs (thread 5)
count 2^^3, covered 2^^5 key pairs (thread 3)
count 2^^4, covered 2^^7 key pairs (thread 3)
count 2^^5, covered 2^^9 key pairs (thread 3)
count 2^^4, covered 2^^7 key pairs (thread 2)
count 2^^5, covered 2^^9 key pairs (thread 2)
count 2^^6, covered 2^^11 key pairs (thread 7)
count 2^^7, covered 2^^13 key pairs (thread 7)
count 2^^6, covered 2^^11 key pairs (thread 5)
count 2^^5, covered 2^^9 key pairs (thread 1)
count 2^^6, covered 2^^11 key pairs (thread 1)
count 2^^6, covered 2^^11 key pairs (thread 3)
count 2^^3, covered 2^^5 key pairs (thread 6)
count 2^^6, covered 2^^11 key pairs (thread 2)
count 2^^7, covered 2^^13 key pairs (thread 2)
count 2^^8, covered 2^^15 key pairs (thread 7)
count 2^^7, covered 2^^13 key pairs (thread 5)
count 2^^5, covered 2^^9 key pairs (thread 4)
count 2^^6, covered 2^^11 key pairs (thread 4)
count 2^^7, covered 2^^13 key pairs (thread 3)
count 2^^4, covered 2^^7 key pairs (thread 6)
count 2^^5, covered 2^^9 key pairs (thread 6)
count 2^^8, covered 2^^15 key pairs (thread 2)
count 2^^9, covered 2^^17 key pairs (thread 7)
count 2^^8, covered 2^^15 key pairs (thread 5)
count 2^^7, covered 2^^13 key pairs (thread 1)
count 2^^7, covered 2^^13 key pairs (thread 4)
count 2^^8, covered 2^^15 key pairs (thread 3)
count 2^^3, covered 2^^5 key pairs (thread 0)
count 2^^6, covered 2^^11 key pairs (thread 6)
count 2^^9, covered 2^^17 key pairs (thread 2)
count 2^^10, covered 2^^19 key pairs (thread 7)
count 2^^9, covered 2^^17 key pairs (thread 5)
count 2^^8, covered 2^^15 key pairs (thread 1)
count 2^^8, covered 2^^15 key pairs (thread 4)
count 2^^9, covered 2^^17 key pairs (thread 3)
count 2^^4, covered 2^^7 key pairs (thread 0)
count 2^^7, covered 2^^13 key pairs (thread 6)
count 2^^10, covered 2^^19 key pairs (thread 2)
count 2^^8, covered 2^^15 key pairs (thread 6)
count 2^^10, covered 2^^19 key pairs (thread 5)
count 2^^9, covered 2^^17 key pairs (thread 1)
count 2^^9, covered 2^^17 key pairs (thread 4)
count 2^^10, covered 2^^19 key pairs (thread 3)
count 2^^5, covered 2^^9 key pairs (thread 0)
count 2^^11, covered 2^^21 key pairs (thread 7)
count 2^^11, covered 2^^21 key pairs (thread 2)
count 2^^9, covered 2^^17 key pairs (thread 6)
count 2^^11, covered 2^^21 key pairs (thread 5)
count 2^^10, covered 2^^19 key pairs (thread 1)
count 2^^10, covered 2^^19 key pairs (thread 4)
count 2^^11, covered 2^^21 key pairs (thread 3)
count 2^^6, covered 2^^11 key pairs (thread 0)
count 2^^12, covered 2^^23 key pairs (thread 7)
count 2^^12, covered 2^^23 key pairs (thread 2)
count 2^^10, covered 2^^19 key pairs (thread 6)
count 2^^12, covered 2^^23 key pairs (thread 5)
count 2^^11, covered 2^^21 key pairs (thread 1)
count 2^^11, covered 2^^21 key pairs (thread 4)
count 2^^7, covered 2^^13 key pairs (thread 0)
count 2^^12, covered 2^^23 key pairs (thread 3)
count 2^^13, covered 2^^25 key pairs (thread 7)
count 2^^13, covered 2^^25 key pairs (thread 2)
count 2^^11, covered 2^^21 key pairs (thread 6)
count 2^^13, covered 2^^25 key pairs (thread 5)
count 2^^12, covered 2^^23 key pairs (thread 1)
count 2^^12, covered 2^^23 key pairs (thread 4)
count 2^^8, covered 2^^15 key pairs (thread 0)
count 2^^13, covered 2^^25 key pairs (thread 3)
count 2^^12, covered 2^^23 key pairs (thread 6)
count 2^^14, covered 2^^27 key pairs (thread 5)
count 2^^14, covered 2^^27 key pairs (thread 2)
count 2^^13, covered 2^^25 key pairs (thread 1)
count 2^^9, covered 2^^17 key pairs (thread 0)
count 2^^13, covered 2^^25 key pairs (thread 4)
count 2^^14, covered 2^^27 key pairs (thread 7)
count 2^^13, covered 2^^25 key pairs (thread 6)
count 2^^14, covered 2^^27 key pairs (thread 3)
count 2^^10, covered 2^^19 key pairs (thread 0)
count 2^^14, covered 2^^27 key pairs (thread 1)
count 2^^14, covered 2^^27 key pairs (thread 4)
count 2^^11, covered 2^^21 key pairs (thread 0)
count 2^^15, covered 2^^29 key pairs (thread 5)
count 2^^15, covered 2^^29 key pairs (thread 2)
count 2^^14, covered 2^^27 key pairs (thread 6)
count 2^^12, covered 2^^23 key pairs (thread 0)
count 2^^15, covered 2^^29 key pairs (thread 7)
count 2^^15, covered 2^^29 key pairs (thread 3)
count 2^^13, covered 2^^25 key pairs (thread 0)
count 2^^15, covered 2^^29 key pairs (thread 1)
count 2^^15, covered 2^^29 key pairs (thread 4)
count 2^^15, covered 2^^29 key pairs (thread 6)
count 2^^16, covered 2^^31 key pairs (thread 5)
count 2^^14, covered 2^^27 key pairs (thread 0)
count 2^^16, covered 2^^31 key pairs (thread 2)
count 2^^16, covered 2^^31 key pairs (thread 7)
count 2^^15, covered 2^^29 key pairs (thread 0)
count 2^^16, covered 2^^31 key pairs (thread 3)
count 2^^16, covered 2^^31 key pairs (thread 1)
count 2^^16, covered 2^^31 key pairs (thread 4)
count 2^^16, covered 2^^31 key pairs (thread 6)
count 2^^16, covered 2^^31 key pairs (thread 0)
count 2^^17, covered 2^^33 key pairs (thread 2)
count 2^^17, covered 2^^33 key pairs (thread 5)
count 2^^17, covered 2^^33 key pairs (thread 7)
count 2^^17, covered 2^^33 key pairs (thread 3)
count 2^^17, covered 2^^33 key pairs (thread 4)
count 2^^17, covered 2^^33 key pairs (thread 1)
count 2^^17, covered 2^^33 key pairs (thread 6)
count 2^^17, covered 2^^33 key pairs (thread 0)
count 2^^18, covered 2^^35 key pairs (thread 2)
count 2^^18, covered 2^^35 key pairs (thread 7)
count 2^^18, covered 2^^35 key pairs (thread 5)
count 2^^18, covered 2^^35 key pairs (thread 3)
count 2^^18, covered 2^^35 key pairs (thread 4)
count 2^^18, covered 2^^35 key pairs (thread 6)
count 2^^18, covered 2^^35 key pairs (thread 1)
count 2^^18, covered 2^^35 key pairs (thread 0)
count 2^^19, covered 2^^37 key pairs (thread 2)
count 2^^19, covered 2^^37 key pairs (thread 7)
count 2^^19, covered 2^^37 key pairs (thread 3)
count 2^^19, covered 2^^37 key pairs (thread 4)
count 2^^19, covered 2^^37 key pairs (thread 6)
count 2^^19, covered 2^^37 key pairs (thread 5)
count 2^^19, covered 2^^37 key pairs (thread 1)
count 2^^19, covered 2^^37 key pairs (thread 0)
count 2^^20, covered 2^^39 key pairs (thread 2)
count 2^^20, covered 2^^39 key pairs (thread 7)
count 2^^20, covered 2^^39 key pairs (thread 6)
count 2^^20, covered 2^^39 key pairs (thread 3)
count 2^^20, covered 2^^39 key pairs (thread 4)
count 2^^20, covered 2^^39 key pairs (thread 0)
count 2^^20, covered 2^^39 key pairs (thread 5)
count 2^^20, covered 2^^39 key pairs (thread 1)
count 2^^21, covered 2^^41 key pairs (thread 2)
count 2^^21, covered 2^^41 key pairs (thread 7)
count 2^^21, covered 2^^41 key pairs (thread 6)
count 2^^21, covered 2^^41 key pairs (thread 3)
count 2^^21, covered 2^^41 key pairs (thread 4)
count 2^^21, covered 2^^41 key pairs (thread 0)
count 2^^21, covered 2^^41 key pairs (thread 5)
count 2^^21, covered 2^^41 key pairs (thread 1)
count 2^^22, covered 2^^43 key pairs (thread 2)
count 2^^22, covered 2^^43 key pairs (thread 6)
count 2^^22, covered 2^^43 key pairs (thread 3)
count 2^^22, covered 2^^43 key pairs (thread 4)
count 2^^22, covered 2^^43 key pairs (thread 5)
count 2^^22, covered 2^^43 key pairs (thread 7)
count 2^^22, covered 2^^43 key pairs (thread 1)
count 2^^22, covered 2^^43 key pairs (thread 0)
count 2^^23, covered 2^^45 key pairs (thread 6)
count 2^^23, covered 2^^45 key pairs (thread 4)
count 2^^23, covered 2^^45 key pairs (thread 3)
count 2^^23, covered 2^^45 key pairs (thread 2)
count 2^^23, covered 2^^45 key pairs (thread 7)
count 2^^23, covered 2^^45 key pairs (thread 1)
count 2^^23, covered 2^^45 key pairs (thread 5)
count 2^^23, covered 2^^45 key pairs (thread 0)
count 2^^24, covered 2^^47 key pairs (thread 4)
count 2^^24, covered 2^^47 key pairs (thread 6)
count 2^^24, covered 2^^47 key pairs (thread 3)
count 2^^24, covered 2^^47 key pairs (thread 7)
count 2^^24, covered 2^^47 key pairs (thread 5)
count 2^^24, covered 2^^47 key pairs (thread 1)
count 2^^24, covered 2^^47 key pairs (thread 2)
count 2^^24, covered 2^^47 key pairs (thread 0)
count 2^^25, covered 2^^49 key pairs (thread 4)
count 2^^25, covered 2^^49 key pairs (thread 3)
count 2^^25, covered 2^^49 key pairs (thread 1)
count 2^^25, covered 2^^49 key pairs (thread 5)
count 2^^25, covered 2^^49 key pairs (thread 7)
count 2^^25, covered 2^^49 key pairs (thread 6)
count 2^^25, covered 2^^49 key pairs (thread 0)
count 2^^25, covered 2^^49 key pairs (thread 2)
count 2^^26, covered 2^^51 key pairs (thread 3)
count 2^^26, covered 2^^51 key pairs (thread 5)
count 2^^26, covered 2^^51 key pairs (thread 4)
count 2^^26, covered 2^^51 key pairs (thread 1)
count 2^^26, covered 2^^51 key pairs (thread 7)
count 2^^26, covered 2^^51 key pairs (thread 6)
count 2^^26, covered 2^^51 key pairs (thread 0)
count 2^^26, covered 2^^51 key pairs (thread 2)
count 2^^27, covered 2^^53 key pairs (thread 3)
count 2^^27, covered 2^^53 key pairs (thread 5)
count 2^^27, covered 2^^53 key pairs (thread 4)
count 2^^27, covered 2^^53 key pairs (thread 7)
count 2^^27, covered 2^^53 key pairs (thread 1)
count 2^^27, covered 2^^53 key pairs (thread 6)
count 2^^27, covered 2^^53 key pairs (thread 0)
count 2^^27, covered 2^^53 key pairs (thread 2)
count 2^^28, covered 2^^55 key pairs (thread 3)
count 2^^28, covered 2^^55 key pairs (thread 5)
count 2^^28, covered 2^^55 key pairs (thread 4)
count 2^^28, covered 2^^55 key pairs (thread 6)
count 2^^28, covered 2^^55 key pairs (thread 1)
count 2^^28, covered 2^^55 key pairs (thread 7)
count 2^^28, covered 2^^55 key pairs (thread 2)
count 2^^28, covered 2^^55 key pairs (thread 0)
count 2^^29, covered 2^^57 key pairs (thread 6)
count 2^^29, covered 2^^57 key pairs (thread 3)
count 2^^29, covered 2^^57 key pairs (thread 1)
count 2^^29, covered 2^^57 key pairs (thread 5)
count 2^^29, covered 2^^57 key pairs (thread 4)
count 2^^29, covered 2^^57 key pairs (thread 7)
count 2^^29, covered 2^^57 key pairs (thread 0)
count 2^^29, covered 2^^57 key pairs (thread 2)
count 2^^30, covered 2^^58 key pairs (thread 5)
count 2^^30, covered 2^^58 key pairs (thread 4)
count 2^^30, covered 2^^58 key pairs (thread 6)
count 2^^30, covered 2^^58 key pairs (thread 1)
count 2^^30, covered 2^^58 key pairs (thread 0)
count 2^^30, covered 2^^58 key pairs (thread 3)
count 2^^30, covered 2^^58 key pairs (thread 2)
count 2^^30, covered 2^^58 key pairs (thread 7)
count 2^^31, covered 2^^59 key pairs (thread 5)
count 2^^31, covered 2^^59 key pairs (thread 4)
count 2^^31, covered 2^^59 key pairs (thread 1)
count 2^^31, covered 2^^59 key pairs (thread 3)
count 2^^31, covered 2^^59 key pairs (thread 0)
count 2^^31, covered 2^^59 key pairs (thread 7)
count 2^^31, covered 2^^59 key pairs (thread 6)
count 2^^31, covered 2^^59 key pairs (thread 2)
count 2^^32, covered 2^^60 key pairs (thread 5)
count 2^^32, covered 2^^60 key pairs (thread 3)
count 2^^32, covered 2^^60 key pairs (thread 0)
count 2^^32, covered 2^^60 key pairs (thread 1)
count 2^^32, covered 2^^60 key pairs (thread 4)
count 2^^32, covered 2^^60 key pairs (thread 6)
count 2^^32, covered 2^^60 key pairs (thread 7)
count 2^^32, covered 2^^60 key pairs (thread 2)
count 2^^33, covered 2^^61 key pairs (thread 5)
count 2^^33, covered 2^^61 key pairs (thread 1)
count 2^^33, covered 2^^61 key pairs (thread 7)
count 2^^33, covered 2^^61 key pairs (thread 3)
count 2^^33, covered 2^^61 key pairs (thread 4)
count 2^^33, covered 2^^61 key pairs (thread 0)
count 2^^33, covered 2^^61 key pairs (thread 6)
count 2^^33, covered 2^^61 key pairs (thread 2)
count 2^^34, covered 2^^62 key pairs (thread 3)
count 2^^34, covered 2^^62 key pairs (thread 5)
count 2^^34, covered 2^^62 key pairs (thread 7)
count 2^^34, covered 2^^62 key pairs (thread 1)
count 2^^34, covered 2^^62 key pairs (thread 4)
count 2^^34, covered 2^^62 key pairs (thread 2)
count 2^^34, covered 2^^62 key pairs (thread 0)
count 2^^34, covered 2^^62 key pairs (thread 6)
count 2^^35, covered 2^^63 key pairs (thread 5)
count 2^^35, covered 2^^63 key pairs (thread 1)
count 2^^35, covered 2^^63 key pairs (thread 3)
count 2^^35, covered 2^^63 key pairs (thread 7)
count 2^^35, covered 2^^63 key pairs (thread 0)
count 2^^35, covered 2^^63 key pairs (thread 4)
count 2^^35, covered 2^^63 key pairs (thread 2)
count 2^^35, covered 2^^63 key pairs (thread 6)
count 2^^36, covered 2^^64 key pairs (thread 5)
count 2^^36, covered 2^^64 key pairs (thread 1)
count 2^^36, covered 2^^64 key pairs (thread 7)
count 2^^36, covered 2^^64 key pairs (thread 3)
count 2^^36, covered 2^^64 key pairs (thread 0)
count 2^^36, covered 2^^64 key pairs (thread 4)
count 2^^36, covered 2^^64 key pairs (thread 6)
count 2^^36, covered 2^^64 key pairs (thread 2)
count 2^^37, covered 2^^65 key pairs (thread 5)
count 2^^37, covered 2^^65 key pairs (thread 3)
count 2^^37, covered 2^^65 key pairs (thread 1)
count 2^^37, covered 2^^65 key pairs (thread 6)
count 2^^37, covered 2^^65 key pairs (thread 7)
count 2^^37, covered 2^^65 key pairs (thread 2)
count 2^^37, covered 2^^65 key pairs (thread 0)
count 2^^37, covered 2^^65 key pairs (thread 4)
count 2^^38, covered 2^^66 key pairs (thread 5)
count 2^^38, covered 2^^66 key pairs (thread 6)
count 2^^38, covered 2^^66 key pairs (thread 1)
count 2^^38, covered 2^^66 key pairs (thread 3)
count 2^^38, covered 2^^66 key pairs (thread 7)
count 2^^38, covered 2^^66 key pairs (thread 2)
count 2^^38, covered 2^^66 key pairs (thread 4)
count 2^^38, covered 2^^66 key pairs (thread 0)
count 2^^39, covered 2^^67 key pairs (thread 6)
count 2^^39, covered 2^^67 key pairs (thread 1)
count 2^^39, covered 2^^67 key pairs (thread 5)
count 2^^39, covered 2^^67 key pairs (thread 7)
count 2^^39, covered 2^^67 key pairs (thread 2)
count 2^^39, covered 2^^67 key pairs (thread 3)
count 2^^39, covered 2^^67 key pairs (thread 0)
count 2^^39, covered 2^^67 key pairs (thread 4)
count 2^^40, covered 2^^68 key pairs (thread 6)
count 2^^40, covered 2^^68 key pairs (thread 5)
count 2^^40, covered 2^^68 key pairs (thread 1)
count 2^^40, covered 2^^68 key pairs (thread 2)
count 2^^40, covered 2^^68 key pairs (thread 7)
count 2^^40, covered 2^^68 key pairs (thread 3)
count 2^^40, covered 2^^68 key pairs (thread 0)
count 2^^40, covered 2^^68 key pairs (thread 4)
count 2^^41, covered 2^^69 key pairs (thread 6)
count 2^^41, covered 2^^69 key pairs (thread 5)
count 2^^41, covered 2^^69 key pairs (thread 1)
count 2^^41, covered 2^^69 key pairs (thread 2)
count 2^^41, covered 2^^69 key pairs (thread 7)
count 2^^41, covered 2^^69 key pairs (thread 3)
count 2^^41, covered 2^^69 key pairs (thread 0)
count 2^^41, covered 2^^69 key pairs (thread 4)
count 2^^42, covered 2^^70 key pairs (thread 6)
count 2^^42, covered 2^^70 key pairs (thread 5)
count 2^^42, covered 2^^70 key pairs (thread 1)
count 2^^42, covered 2^^70 key pairs (thread 2)
count 2^^42, covered 2^^70 key pairs (thread 3)
count 2^^42, covered 2^^70 key pairs (thread 7)
count 2^^42, covered 2^^70 key pairs (thread 0)
count 2^^42, covered 2^^70 key pairs (thread 4)
count 2^^43, covered 2^^71 key pairs (thread 6)
count 2^^43, covered 2^^71 key pairs (thread 5)
count 2^^43, covered 2^^71 key pairs (thread 1)
count 2^^43, covered 2^^71 key pairs (thread 2)
count 2^^43, covered 2^^71 key pairs (thread 3)
count 2^^43, covered 2^^71 key pairs (thread 7)
count 2^^43, covered 2^^71 key pairs (thread 0)
count 2^^43, covered 2^^71 key pairs (thread 4)
count 2^^44, covered 2^^72 key pairs (thread 6)
count 2^^44, covered 2^^72 key pairs (thread 5)
count 2^^44, covered 2^^72 key pairs (thread 1)
count 2^^44, covered 2^^72 key pairs (thread 2)
count 2^^44, covered 2^^72 key pairs (thread 3)
count 2^^44, covered 2^^72 key pairs (thread 7)
count 2^^44, covered 2^^72 key pairs (thread 0)
count 2^^44, covered 2^^72 key pairs (thread 4)

D:\Froggy_Gumbotron_XXH128_8threads>
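One way to read the first part of the log above: for counts 2^^3 through 2^^29, the "covered" exponent is 2k-1 for a count of 2^^k, which is what you get from the number of unordered pairs among n keys, n(n-1)/2, rounded up to a power of two. The sketch below checks that arithmetic. It is only an interpretation of the printed numbers, not code from froggy.cpp, and the function names are hypothetical. From count 2^^30 onward the log grows by one exponent per doubling, which this formula does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Number of unordered pairs among n = 2^k keys: n * (n - 1) / 2. */
uint64_t key_pairs(unsigned k)
{
    assert(k >= 1 && k <= 32);            /* result fits in 64 bits */
    uint64_t n = UINT64_C(1) << k;
    return (n / 2) * (n - 1);
}

/* Smallest r such that 2^r >= x. */
unsigned ceil_log2(uint64_t x)
{
    unsigned r = 0;
    assert(x >= 1 && x <= (UINT64_C(1) << 63));
    while ((UINT64_C(1) << r) < x)
        r++;
    return r;
}
```

For example, key_pairs(3) is 28, and ceil_log2(28) is 5, matching the "count 2^^3, covered 2^^5" lines.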

In my view, keys 1..63 bytes long are problematic; someone can do better, yet for now Gumbotron_YMM serves well. I have an idea to change how the key is read for keys of 64+ bytes. For example, for a 66-byte key with bytes 00,01,02,03,...,64,65: instead of reading 00..63 and then the remainder 64..65, the new version will read 00..63 and then 02..65. It could be faster as well.
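The overlapping-read idea can be sketched as a function that lists which 64-byte windows would be read for a given key length. This is a hypothetical illustration, not Gumbotron_YMM code: the name block_offsets and its signature are made up here, and it only computes offsets without hashing anything.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 64

/* Hypothetical helper: for a key of `len` bytes (len >= 64), store the
   start offset of each 64-byte window to read. Full blocks are taken
   from the front; a leftover tail is covered by one extra window that
   ends at the last byte and overlaps the previous window.
   Returns the number of windows written to offsets[]. */
size_t block_offsets(size_t len, size_t *offsets, size_t cap)
{
    size_t n = 0, pos = 0;
    assert(len >= BLOCK);
    while (pos + BLOCK <= len) {          /* whole blocks from the front */
        assert(n < cap);
        offsets[n++] = pos;
        pos += BLOCK;
    }
    if (pos < len) {                      /* tail: overlap instead of a short read */
        assert(n < cap);
        offsets[n++] = len - BLOCK;
    }
    return n;
}
```

For a 66-byte key this yields offsets 0 and 2, i.e. the reads 00..63 and 02..65 described above. Since the overlap alone does not record where the tail starts, a real hasher would presumably also need to mix the key length into the state.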