zeebo / xxh3

XXH3 algorithm in Go
BSD 2-Clause "Simplified" License
403 stars 20 forks source link

Seed support #4

Closed peroxyacyl closed 2 years ago

peroxyacyl commented 4 years ago

As Cyan mentioned in his blog, xxh3 has initially developed for bloom filters that need changing seed for each hashing. Now I make a bloom filter in Go in my project so I have written this patch.

However, the performance gets slower due to:

  1. extra instructions to add/sub keys
  2. disarm of some immediate value hacks
  3. larger stack and copying key to an array in case of hashing large bytes.

I'm making this PR just for saying thanks for your work. It's another option to host separate codes both high performance and seed support.

Here are benchmarks on my Intel Core i7-4790K CPU @ 4.00GHz.

master vs seed = 0

$benchstat master seed0
name                   master time/op  seed0 time/op  delta
Fixed/0-8              3.58ns ± 8%    5.70ns ± 6%  +59.20%  (p=0.000 n=95+92)
Fixed/1-8              4.03ns ± 4%    5.56ns ± 5%  +38.12%  (p=0.000 n=93+94)
Fixed/2-8              4.04ns ± 4%    5.62ns ± 6%  +38.87%  (p=0.000 n=95+96)
Fixed/3-8              3.69ns ± 3%    5.50ns ± 6%  +49.34%  (p=0.000 n=92+91)
Fixed/4-8              4.58ns ± 4%    6.36ns ± 5%  +38.71%  (p=0.000 n=91+91)
Fixed/8-8              4.58ns ± 4%    6.37ns ± 6%  +38.95%  (p=0.000 n=97+92)
Fixed/9-8              4.38ns ± 5%    6.00ns ± 7%  +37.12%  (p=0.000 n=95+92)
Fixed/16-8             4.38ns ± 4%    6.01ns ± 6%  +37.26%  (p=0.000 n=98+95)
Fixed/17-8             7.04ns ± 4%    8.32ns ± 5%  +18.09%  (p=0.000 n=95+94)
Fixed/32-8             7.03ns ± 4%    8.28ns ± 5%  +17.68%  (p=0.000 n=94+88)
Fixed/33-8             10.0ns ± 7%    11.3ns ± 6%  +13.24%  (p=0.000 n=97+94)
Fixed/64-8             10.0ns ± 4%    11.4ns ± 7%  +14.09%  (p=0.000 n=96+93)
Fixed/65-8             12.9ns ± 5%    14.3ns ± 6%  +10.96%  (p=0.000 n=96+97)
Fixed/96-8             12.9ns ± 4%    14.3ns ± 7%  +11.12%  (p=0.000 n=91+93)
Fixed/97-8             15.5ns ± 4%    16.9ns ± 6%   +8.57%  (p=0.000 n=95+92)
Fixed/128-8            15.5ns ± 4%    16.9ns ± 5%   +8.58%  (p=0.000 n=92+90)
Fixed/129-8            18.0ns ± 4%    20.2ns ± 5%  +11.97%  (p=0.000 n=95+93)
Fixed/240-8            34.1ns ± 4%    39.1ns ± 6%  +14.72%  (p=0.000 n=97+93)
Fixed/241-AVX2-8       65.9ns ± 4%    68.5ns ± 4%   +3.91%  (p=0.000 n=97+94)
Fixed/241-SSE2-8       23.7ns ± 4%    27.4ns ± 5%  +15.61%  (p=0.000 n=93+95)
Fixed/241-8            42.1ns ± 3%    44.9ns ± 5%   +6.78%  (p=0.000 n=94+93)
Fixed/512-AVX2-8       70.2ns ± 4%    73.8ns ± 4%   +5.12%  (p=0.000 n=96+93)
Fixed/512-SSE2-8       33.4ns ± 4%    37.2ns ± 4%  +11.18%  (p=0.000 n=94+94)
Fixed/512-8            73.5ns ± 4%    77.2ns ± 6%   +5.10%  (p=0.000 n=93+94)
Fixed/1024-AVX2-8      82.3ns ± 4%    86.3ns ± 5%   +4.85%  (p=0.000 n=99+95)
Fixed/1024-SSE2-8      53.5ns ± 5%    57.1ns ± 4%   +6.84%  (p=0.000 n=97+93)
Fixed/1024-8            135ns ± 3%     141ns ± 4%   +4.61%  (p=0.000 n=92+93)
Fixed/8192-AVX2-8       235ns ± 5%     243ns ± 4%   +3.19%  (p=0.000 n=95+95)
Fixed/8192-SSE2-8       353ns ± 4%     355ns ± 5%     ~     (p=0.329 n=92+93)
Fixed/8192-8           1.06µs ± 5%    1.10µs ± 7%   +3.82%  (p=0.000 n=96+96)
Fixed/102400-AVX2-8    2.32µs ± 4%    2.39µs ± 4%   +3.06%  (p=0.000 n=92+92)
Fixed/102400-SSE2-8    4.31µs ± 4%    4.39µs ± 6%   +1.81%  (p=0.000 n=90+96)
Fixed/102400-8         13.2µs ± 3%    13.7µs ± 5%   +3.91%  (p=0.000 n=92+98)

master vs seed = 42

$benchstat master seed42
name                   master time/op seed42 time/op delta
Fixed/0-8              3.58ns ± 8%    5.76ns ± 6%   +60.96%  (p=0.000 n=95+47)
Fixed/1-8              4.03ns ± 4%    5.83ns ± 7%   +44.78%  (p=0.000 n=93+44)
Fixed/2-8              4.04ns ± 4%    5.90ns ±11%   +45.87%  (p=0.000 n=95+47)
Fixed/3-8              3.69ns ± 3%    5.72ns ± 7%   +55.28%  (p=0.000 n=92+48)
Fixed/4-8              4.58ns ± 4%    6.58ns ± 6%   +43.50%  (p=0.000 n=91+45)
Fixed/8-8              4.58ns ± 4%    6.59ns ± 4%   +43.77%  (p=0.000 n=97+43)
Fixed/9-8              4.38ns ± 5%    6.16ns ± 6%   +40.72%  (p=0.000 n=95+48)
Fixed/16-8             4.38ns ± 4%    6.20ns ± 6%   +41.44%  (p=0.000 n=98+47)
Fixed/17-8             7.04ns ± 4%    8.11ns ± 3%   +15.22%  (p=0.000 n=95+42)
Fixed/32-8             7.03ns ± 4%    8.18ns ± 5%   +16.30%  (p=0.000 n=94+48)
Fixed/33-8             10.0ns ± 7%    11.2ns ± 5%   +11.83%  (p=0.000 n=97+45)
Fixed/64-8             10.0ns ± 4%    11.2ns ± 4%   +12.02%  (p=0.000 n=96+45)
Fixed/65-8             12.9ns ± 5%    14.1ns ± 7%    +9.03%  (p=0.000 n=96+45)
Fixed/96-8             12.9ns ± 4%    14.1ns ± 7%    +9.39%  (p=0.000 n=91+44)
Fixed/97-8             15.5ns ± 4%    16.7ns ± 4%    +7.50%  (p=0.000 n=95+48)
Fixed/128-8            15.5ns ± 4%    16.8ns ± 8%    +8.09%  (p=0.000 n=92+48)
Fixed/129-8            18.0ns ± 4%    20.2ns ± 5%   +11.90%  (p=0.000 n=95+47)
Fixed/240-8            34.1ns ± 4%    39.0ns ± 3%   +14.53%  (p=0.000 n=97+43)
Fixed/241-AVX2-8       65.9ns ± 4%    99.7ns ± 5%   +51.35%  (p=0.000 n=97+46)
Fixed/241-SSE2-8       23.7ns ± 4%    54.7ns ± 7%  +131.21%  (p=0.000 n=93+45)
Fixed/241-8            42.1ns ± 3%    72.0ns ± 4%   +71.29%  (p=0.000 n=94+46)
Fixed/512-AVX2-8       70.2ns ± 4%   104.7ns ± 4%   +49.00%  (p=0.000 n=96+47)
Fixed/512-SSE2-8       33.4ns ± 4%    63.9ns ± 6%   +91.06%  (p=0.000 n=94+45)
Fixed/512-8            73.5ns ± 4%   103.9ns ± 8%   +41.40%  (p=0.000 n=93+46)
Fixed/1024-AVX2-8      82.3ns ± 4%   117.1ns ± 3%   +42.27%  (p=0.000 n=99+45)
Fixed/1024-SSE2-8      53.5ns ± 5%    83.5ns ± 4%   +56.21%  (p=0.000 n=97+45)
Fixed/1024-8            135ns ± 3%     169ns ± 7%   +25.03%  (p=0.000 n=92+44)
Fixed/8192-AVX2-8       235ns ± 5%     276ns ± 5%   +17.43%  (p=0.000 n=95+46)
Fixed/8192-SSE2-8       353ns ± 4%     419ns ± 8%   +18.60%  (p=0.000 n=92+45)
Fixed/8192-8           1.06µs ± 5%    1.14µs ± 6%    +7.57%  (p=0.000 n=96+47)
Fixed/102400-AVX2-8    2.32µs ± 4%    2.44µs ± 6%    +5.25%  (p=0.000 n=92+46)
Fixed/102400-SSE2-8    4.31µs ± 4%    4.56µs ± 8%    +5.88%  (p=0.000 n=90+48)
Fixed/102400-8         13.2µs ± 3%    13.8µs ± 5%    +4.74%  (p=0.000 n=92+46)
peroxyacyl commented 4 years ago

Another benchmark on Intel Xeon Platinum 8151 CPU @ 3.40GHz

master vs seed=0

name                 old time/op    new time/op    delta
Fixed/0-2              3.19ns ± 0%    5.74ns ± 1%  +80.11%  (p=0.000 n=99+98)
Fixed/1-2              3.64ns ± 0%    5.58ns ± 2%  +53.60%  (p=0.000 n=100+99)
Fixed/2-2              3.43ns ± 0%    5.87ns ± 0%  +70.99%  (p=0.000 n=100+100)
Fixed/3-2              3.27ns ± 0%    5.45ns ± 0%  +66.74%  (p=0.000 n=100+98)
Fixed/4-2              3.88ns ± 0%    5.61ns ± 1%  +44.62%  (p=0.000 n=100+100)
Fixed/8-2              3.88ns ± 0%    5.61ns ± 1%  +44.63%  (p=0.000 n=99+100)
Fixed/9-2              3.74ns ± 0%    5.06ns ± 1%  +35.25%  (p=0.000 n=99+100)
Fixed/16-2             3.74ns ± 0%    5.06ns ± 1%  +35.24%  (p=0.000 n=97+100)
Fixed/17-2             6.27ns ± 0%    7.91ns ± 1%  +26.18%  (p=0.000 n=100+91)
Fixed/32-2             6.27ns ± 0%    7.91ns ± 0%  +26.13%  (p=0.000 n=100+84)
Fixed/33-2             9.28ns ± 0%   11.07ns ± 2%  +19.23%  (p=0.000 n=100+100)
Fixed/64-2             9.28ns ± 0%   11.08ns ± 1%  +19.32%  (p=0.000 n=100+100)
Fixed/65-2             11.1ns ± 1%    13.6ns ± 1%  +22.16%  (p=0.000 n=101+100)
Fixed/96-2             11.1ns ± 1%    13.6ns ± 2%  +22.42%  (p=0.000 n=101+100)
Fixed/97-2             13.2ns ± 0%    16.3ns ± 3%  +23.24%  (p=0.000 n=99+100)
Fixed/128-2            13.2ns ± 0%    16.3ns ± 4%  +23.61%  (p=0.000 n=98+100)
Fixed/129-2            15.3ns ± 0%    16.6ns ± 0%   +8.50%  (p=0.000 n=100+100)
Fixed/240-2            28.8ns ± 0%    32.3ns ± 0%  +12.15%  (p=0.000 n=100+100)
Fixed/241-AVX2-2       19.5ns ± 0%    21.5ns ± 0%  +10.49%  (p=0.000 n=100+100)
Fixed/241-SSE2-2       22.2ns ± 0%    24.4ns ± 1%  +10.12%  (p=0.000 n=94+100)
Fixed/241-2            44.6ns ± 1%    44.6ns ± 0%   -0.13%  (p=0.002 n=99+99)
Fixed/512-AVX2-2       24.0ns ± 0%    26.1ns ± 0%   +8.75%  (p=0.000 n=100+100)
Fixed/512-SSE2-2       31.3ns ± 0%    34.3ns ± 1%   +9.58%  (p=0.000 n=100+100)
Fixed/512-2            71.6ns ± 0%    75.3ns ± 0%   +5.13%  (p=0.000 n=99+93)
Fixed/1024-AVX2-2      35.1ns ± 0%    37.7ns ± 0%   +7.28%  (p=0.000 n=92+99)
Fixed/1024-SSE2-2      50.2ns ± 0%    53.3ns ± 1%   +6.15%  (p=0.000 n=100+99)
Fixed/1024-2            133ns ± 0%     137ns ± 0%   +3.01%  (p=0.000 n=100+100)
Fixed/8192-AVX2-2       190ns ± 0%     192ns ± 0%   +1.29%  (p=0.000 n=99+99)
Fixed/8192-SSE2-2       337ns ± 0%     339ns ± 0%   +0.59%  (p=0.000 n=100+100)
Fixed/8192-2           1.04µs ± 0%    1.28µs ± 0%  +23.20%  (p=0.000 n=87+75)
Fixed/102400-AVX2-2    2.33µs ± 0%    2.33µs ± 0%   +0.16%  (p=0.000 n=100+100)
Fixed/102400-SSE2-2    4.30µs ± 0%    4.30µs ± 0%   +0.08%  (p=0.000 n=99+100)
Fixed/102400-2         12.9µs ± 0%    16.2µs ± 0%  +25.83%  (p=0.000 n=96+100)

master vs seed=42

name                 old time/op    new time/op    delta
Fixed/0-2              3.19ns ± 0%    5.26ns ± 0%   +64.96%  (p=0.000 n=99+97)
Fixed/1-2              3.64ns ± 0%    5.14ns ± 6%   +41.52%  (p=0.000 n=100+100)
Fixed/2-2              3.43ns ± 0%    4.84ns ± 0%   +40.82%  (p=0.000 n=100+98)
Fixed/3-2              3.27ns ± 0%    4.78ns ± 2%   +46.20%  (p=0.000 n=100+100)
Fixed/4-2              3.88ns ± 0%    5.52ns ± 1%   +42.15%  (p=0.000 n=100+100)
Fixed/8-2              3.88ns ± 0%    5.52ns ± 1%   +42.14%  (p=0.000 n=99+100)
Fixed/9-2              3.74ns ± 0%    5.06ns ± 1%   +35.17%  (p=0.000 n=99+100)
Fixed/16-2             3.74ns ± 0%    5.06ns ± 1%   +35.13%  (p=0.000 n=97+100)
Fixed/17-2             6.27ns ± 0%    7.48ns ± 0%   +19.38%  (p=0.000 n=100+95)
Fixed/32-2             6.27ns ± 0%    7.48ns ± 0%   +19.36%  (p=0.000 n=100+95)
Fixed/33-2             9.28ns ± 0%   10.30ns ± 0%   +10.94%  (p=0.000 n=100+83)
Fixed/64-2             9.28ns ± 0%   10.33ns ± 1%   +11.29%  (p=0.000 n=100+100)
Fixed/65-2             11.1ns ± 1%    12.6ns ± 1%   +13.42%  (p=0.000 n=101+100)
Fixed/96-2             11.1ns ± 1%    12.6ns ± 0%   +13.11%  (p=0.000 n=101+85)
Fixed/97-2             13.2ns ± 0%    14.6ns ± 1%   +10.98%  (p=0.000 n=99+100)
Fixed/128-2            13.2ns ± 0%    14.7ns ± 0%   +11.02%  (p=0.000 n=98+100)
Fixed/129-2            15.3ns ± 0%    16.8ns ± 0%    +9.80%  (p=0.000 n=100+100)
Fixed/240-2            28.8ns ± 0%    31.8ns ± 0%   +10.27%  (p=0.000 n=100+100)
Fixed/241-AVX2-2       19.5ns ± 0%    45.0ns ± 0%  +130.77%  (p=0.000 n=100+98)
Fixed/241-SSE2-2       22.2ns ± 0%    47.7ns ± 0%  +114.86%  (p=0.000 n=94+78)
Fixed/241-2            44.6ns ± 1%    68.4ns ± 0%   +53.16%  (p=0.000 n=99+99)
Fixed/512-AVX2-2       24.0ns ± 0%    49.7ns ± 0%  +107.27%  (p=0.000 n=100+100)
Fixed/512-SSE2-2       31.3ns ± 0%    59.7ns ± 0%   +90.60%  (p=0.000 n=100+99)
Fixed/512-2            71.6ns ± 0%    97.7ns ± 0%   +36.41%  (p=0.000 n=99+85)
Fixed/1024-AVX2-2      35.1ns ± 0%    61.4ns ± 0%   +74.93%  (p=0.000 n=92+89)
Fixed/1024-SSE2-2      50.2ns ± 0%    83.6ns ± 0%   +66.44%  (p=0.000 n=100+100)
Fixed/1024-2            133ns ± 0%     159ns ± 0%   +19.55%  (p=0.000 n=100+100)
Fixed/8192-AVX2-2       190ns ± 0%     216ns ± 0%   +13.68%  (p=0.000 n=99+100)
Fixed/8192-SSE2-2       337ns ± 0%     384ns ± 0%   +13.95%  (p=0.000 n=100+99)
Fixed/8192-2           1.04µs ± 0%    1.30µs ± 0%   +25.22%  (p=0.000 n=87+79)
Fixed/102400-AVX2-2    2.33µs ± 0%    2.37µs ± 0%    +1.89%  (p=0.000 n=100+100)
Fixed/102400-SSE2-2    4.30µs ± 0%    4.37µs ± 0%    +1.79%  (p=0.000 n=99+100)
Fixed/102400-2         12.9µs ± 0%    16.2µs ± 0%   +26.00%  (p=0.000 n=96+100)
klauspost commented 3 years ago

@peroxyacyl I've redone the PR in #12 - duplicated the code and added it to the Hasher.

~There is no 128 bit hash support yet. Do you know where I can find the information to do hashSmall128 and hashMed128?~

zeebo commented 2 years ago

Seed support has been added and in v1.0.0 now.