rurban / smhasher

Hash function quality and speed tests
https://rurban.github.io/smhasher/
Other
1.86k stars 179 forks source link

Split test=Speed into SpeedBulk and SpeedSmall and report weighted average for Small key speed test #293

Closed darkk closed 2 months ago

darkk commented 3 months ago

I hope, that also addresses the goal that @wangyi-fudan had in #113 while being way more "stable" in computational terms.

It gives results looking like that:

$ ./SMHasher --test=SpeedSmall --extra SipHash
--- Testing SipHash "SipHash 2-4 - SSSE3 optimized" GOOD
...
Small key speed test -   64-byte keys -   231.26 cycles/hash
Average                                   151.602 cycles/hash
Average DNS query (| 99.8% of query.log)  123.685 cycles/hash
Average DNS name (| 99.4% of top-1m.csv)  118.621 cycles/hash
Average DNS name (| 99.6% of 200MiB.zone) 114.610 cycles/hash
Average UMASH (| 70.0% of startup-1M.bz2) 139.403 cycles/hash

or

$ SMHASHER_SMALLKEY_MAX=256 ./SMHasher --test=SpeedSmall --extra SipHash
--- Testing SipHash "SipHash 2-4 - SSSE3 optimized" GOOD
...
Small key speed test -  255-byte keys -   688.32 cycles/hash
Average                                   374.459 cycles/hash
Average DNS query (|100.0% of query.log)  124.329 cycles/hash
Average DNS name (|100.0% of top-1m.csv)  120.263 cycles/hash
Average DNS name (|100.0% of 200MiB.zone) 115.890 cycles/hash
Average UMASH (| 99.9% of startup-1M.bz2) 192.407 cycles/hash

Possible improvements are:

darkk commented 2 months ago

macOS build is, indeed, fixed, but it highlights verification fails:

1:            superfast - Verification value 0x0C80403A ....... FAIL! (Expected 0x6306A6FE)
1:             nmhash32 - Verification value 0x9905BCB1 ....... FAIL! (Expected 0x12A30553)
1:            nmhash32x - Verification value 0x74B572DA ....... FAIL! (Expected 0xA8580227)
1:            k-hashv32 - Verification value 0x9FBDD2B1 ....... FAIL! (Expected 0xB69DF8EB)
1:            k-hashv64 - Verification value 0x70E67C61 ....... FAIL! (Expected 0xA6B7E55B)
1:              polymur - Verification value 0xA612032C ....... FAIL! (Expected 0x4F894810)

Similar story happened to MSVC=1, PLATFORM=x86, it fails following verification values:

              FNV128 - Verification value 0x000000BA ....... FAIL! (Expected 0xBCAA1426)
             polymur - Verification value 0xA612032C ....... FAIL! (Expected 0x4F894810)
darkk commented 2 months ago

@rurban please, tell me, should I do anything to re-request review of this PR explicitly or are you just busy and I should just wait?

rurban commented 2 months ago

I'm very busy right now, but I also don't like constant arrays in this file. If so, in an extra file please.

Leonid Evdokimov @.***> schrieb am Do., 5. Sept. 2024, 14:35:

@rurban https://github.com/rurban please, tell me, should I do anything to re-request review of this PR explicitly or are you just busy and I should just wait?

— Reply to this email directly, view it on GitHub https://github.com/rurban/smhasher/pull/293#issuecomment-2331236387, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAKGULGS4A66R7LBODCMB3ZVA3D3AVCNFSM6AAAAABNMGYB56VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMZRGIZTMMZYG4 . You are receiving this because you were mentioned.Message ID: @.***>

darkk commented 2 months ago

👌 Got you. Thanks for your reply.

Constant arrays are currently gone, I've dropped them all as curating a "representative" dataset is a whole different story.

I replaced those constant arrays with SMHASHER_SMALLKEY_WEIGHTS environment variable that is a list of space-separated weights for test=SpeedSmall and the variable has no default value at this moment.

As far as I understand, you're okay with some hard-coded weights examples, so I'll add them as a separate file with references to data sources and will make one of them a default value for SMHASHER_SMALLKEY_WEIGHTS.

rurban commented 2 months ago

Sounds better, thanks

darkk commented 2 months ago

I've found a "representative" data source for DNS domain lengths sample, so I've added it and UMASH distribution to SpeedTest.cpp.

I agree, that these data-points do not belong to main.cpp at all. I think, SpeedTest.cpp is a good place for the numbers, but if it's not, tell me, I'll move them to separate SpeedSmallReport.cpp.

rurban commented 2 months ago

Looks good now, I think. Will test it tomorrow. Esp the failing verifications look troublesome

darkk commented 2 months ago

the failing verifications look troublesome

That's true. However, they were previously hidden by corresponding builds failing altogether, so I suggest to treat them as separate issues: #294 #295 and #296

darkk commented 2 months ago

@rurban I've fixed all the issues GitHub CI highlighted in the following patch set: https://github.com/darkk/smhasher/compare/master...macos

Should I push those patches to this PR or should it be a separate one?

rurban commented 2 months ago

Seperate please