tkaitchuck / aHash

aHash is a non-cryptographic hashing algorithm that uses the AES hardware instruction
https://crates.io/crates/ahash
Apache License 2.0

Strong performance regression with target-cpu=native #190

Open fr0staman opened 9 months ago

fr0staman commented 9 months ago

ahash built with target-cpu=native shows a significant performance regression on my setup. This may be a Rust/LLVM issue, but I'll create an issue here first.

Repro: https://github.com/fr0staman/rust-ahash-target-native-performance-issue

My setup

Rust:

rustc 1.74.1 (a28077b28 2023-12-04)
binary: rustc
commit-hash: a28077b28a02b92985b3a3faecf92813155f1ea1
commit-date: 2023-12-04
host: x86_64-unknown-linux-gnu
release: 1.74.1
LLVM version: 17.0.4

System:

CPU: AMD Ryzen 5 4500U
OS: Ubuntu 22.04.3 LTS

Results

Standard target

fr0staman@kotobook:~/source/repos/rust/rust-ahash-target-native-performance-issue$ cargo bench
    Finished bench [optimized] target(s) in 36.18s
     Running unittests src/main.rs (target/release/deps/rust_ahash_target_native_performance_issue-4df22a78d1110619)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/issue.rs (target/release/deps/ahash-3b7ae86a7bc7bacb)
Gnuplot not found, using plotters backend
Performance/ahash/(32, 128)
                        time:   [21.672 µs 21.698 µs 21.727 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
Performance/ahash/(256, 1024)
                        time:   [983.01 µs 983.94 µs 984.92 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
Performance/ahash/(1024, 4096)
                        time:   [15.256 ms 15.298 ms 15.341 ms]

target-cpu=native

fr0staman@kotobook:~/source/repos/rust/rust-ahash-target-native-performance-issue$ RUSTFLAGS='-C target-cpu=native' cargo bench
    Finished bench [optimized] target(s) in 46.42s
     Running unittests src/main.rs (target/release/deps/rust_ahash_target_native_performance_issue-4df22a78d1110619)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/issue.rs (target/release/deps/ahash-3b7ae86a7bc7bacb)
Gnuplot not found, using plotters backend
Performance/ahash/(32, 128)
                        time:   [37.734 µs 37.761 µs 37.789 µs]
                        change: [+73.336% +73.657% +73.980%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high severe
Performance/ahash/(256, 1024)
                        time:   [2.4681 ms 2.4698 ms 2.4717 ms]
                        change: [+150.51% +150.90% +151.29%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Performance/ahash/(1024, 4096)
                        time:   [38.308 ms 38.369 ms 38.433 ms]
                        change: [+149.98% +150.82% +151.60%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
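
For a quick read of the two runs above, the median times can be turned into slowdown factors. This is only a minimal sketch: `slowdown` is a hypothetical helper name, and the numbers are the medians copied from the output above, converted to microseconds.

```rust
/// Ratio of the `target-cpu=native` median to the default-target median.
fn slowdown(default_time: f64, native_time: f64) -> f64 {
    native_time / default_time
}

fn main() {
    // (benchmark label, default median, native median), both in microseconds.
    let runs = [
        ("(32, 128)", 21.698, 37.761),
        ("(256, 1024)", 983.94, 2469.8),
        ("(1024, 4096)", 15298.0, 38369.0),
    ];
    for (label, default_time, native_time) in runs {
        println!("{label}: {:.2}x slower", slowdown(default_time, native_time));
    }
}
```

That works out to roughly 1.7x for the smallest case and about 2.5x for the two larger ones, in line with the percentages Criterion reports.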

puuuuh commented 9 months ago

The example can be reduced to target-feature=+aes.

tkaitchuck commented 9 months ago

It looks like this bench is only hashing char, which SHOULD be specialized in both cases (ideally to identical instructions). I'll take a look at this.
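
To illustrate why a char key is a candidate for specialization: in the current standard library, hashing a `char` reaches the hasher as a single fixed-size `write_u32` call rather than a variable-length byte slice. The sketch below uses a hypothetical `RecordingHasher` that only records which `Hasher` methods get called; it is not aHash and says nothing about how aHash handles that call internally.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that records which `Hasher` methods are invoked.
#[derive(Default)]
struct RecordingHasher {
    calls: Vec<&'static str>,
}

impl Hasher for RecordingHasher {
    fn finish(&self) -> u64 {
        0 // The hash value itself is irrelevant for this illustration.
    }

    fn write(&mut self, _bytes: &[u8]) {
        self.calls.push("write");
    }

    fn write_u32(&mut self, _value: u32) {
        self.calls.push("write_u32");
    }
}

/// Returns the names of the hasher methods used when hashing `c`.
fn calls_for_char(c: char) -> Vec<&'static str> {
    let mut hasher = RecordingHasher::default();
    c.hash(&mut hasher);
    hasher.calls
}

fn main() {
    println!("{:?}", calls_for_char('a'));
}
```

Because the input size is known up front, a hasher can give this path a dedicated implementation, which is why the same instructions would be expected with and without the extra target features.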

tkaitchuck commented 9 months ago

This does not appear to happen on my Intel i9. There must be something odd in the assembly generated for the Ryzen. If +aes gives the same performance as native, it is possible that it's not picking up the SSE2 instructions for some reason.
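
One way to check which features a given build actually sees is to print them from a small program compiled with the same RUSTFLAGS. This is a minimal sketch, assuming the code path of interest is selected by compile-time target features; the `compile_time_features` helper and the choice of features to list are illustrative, not taken from aHash.

```rust
/// Features enabled at compile time via `-C target-cpu` / `-C target-feature`.
fn compile_time_features() -> Vec<(&'static str, bool)> {
    vec![
        ("aes", cfg!(target_feature = "aes")),
        ("sse2", cfg!(target_feature = "sse2")),
        ("ssse3", cfg!(target_feature = "ssse3")),
    ]
}

fn main() {
    for (name, enabled) in compile_time_features() {
        println!("compile-time {name}: {enabled}");
    }

    // Runtime detection is only available on x86 / x86_64.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        println!("runtime aes: {}", std::is_x86_feature_detected!("aes"));
        println!("runtime sse2: {}", std::is_x86_feature_detected!("sse2"));
    }
}
```

Building this once with the default flags and once with `RUSTFLAGS='-C target-cpu=native'` should show whether the compile-time view differs between the two configurations on the affected machine.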

tkaitchuck commented 9 months ago

@fr0staman If you run rustc --print=target-cpus, which CPU target does it report as detected?

tkaitchuck commented 9 months ago

This might be related: https://github.com/rust-lang/rust/issues/80633

0xdeafbeef commented 9 months ago

rustc --print=target-cpus

Available CPUs for this target:
    native                  - Select the CPU of the current host (currently znver4).
    alderlake
    amdfam10
    athlon
    athlon-4
    athlon-fx
    athlon-mp
    athlon-tbird
    athlon-xp
    athlon64
    athlon64-sse3
    atom
    atom_sse4_2
    atom_sse4_2_movbe
    barcelona
    bdver1
    bdver2
    bdver3
    bdver4
    bonnell
    broadwell
    btver1
    btver2
    c3
    c3-2
    cannonlake
    cascadelake
    cooperlake
    core-avx-i
    core-avx2
    core2
    core_2_duo_sse4_1
    core_2_duo_ssse3
    core_2nd_gen_avx
    core_3rd_gen_avx
    core_4th_gen_avx
    core_4th_gen_avx_tsx
    core_5th_gen_avx
    core_5th_gen_avx_tsx
    core_aes_pclmulqdq
    core_i7_sse4_2
    corei7
    corei7-avx
    emeraldrapids
    generic
    geode
    goldmont
    goldmont-plus
    goldmont_plus
    grandridge
    graniterapids
    graniterapids-d
    graniterapids_d
    haswell
    i386
    i486
    i586
    i686
    icelake-client
    icelake-server
    icelake_client
    icelake_server
    ivybridge
    k6
    k6-2
    k6-3
    k8
    k8-sse3
    knl
    knm
    lakemont
    meteorlake
    mic_avx512
    nehalem
    nocona
    opteron
    opteron-sse3
    penryn
    pentium
    pentium-m
    pentium-mmx
    pentium2
    pentium3
    pentium3m
    pentium4
    pentium4m
    pentium_4
    pentium_4_sse3
    pentium_ii
    pentium_iii
    pentium_iii_no_xmm_regs
    pentium_m
    pentium_mmx
    pentium_pro
    pentiumpro
    prescott
    raptorlake
    rocketlake
    sandybridge
    sapphirerapids
    sierraforest
    silvermont
    skx
    skylake
    skylake-avx512
    skylake_avx512
    slm
    tigerlake
    tremont
    westmere
    winchip-c6
    winchip2
    x86-64                  - This is the default target CPU for the current build target (currently x86_64-unknown-linux-gnu).
    x86-64-v2
    x86-64-v3
    x86-64-v4
    yonah
    znver1
    znver2
    znver3
    znver4

My machine (znver4) also shows the regression.

fr0staman commented 9 months ago

rustc --print=target-cpus


Available CPUs for this target:
native                  - Select the CPU of the current host (currently znver1).
alderlake
amdfam10
athlon
athlon-4
athlon-fx
athlon-mp
athlon-tbird
athlon-xp
athlon64
athlon64-sse3
atom
atom_sse4_2
atom_sse4_2_movbe
barcelona
bdver1
bdver2
bdver3
bdver4
bonnell
broadwell
btver1
btver2
c3
c3-2
cannonlake
cascadelake
cooperlake
core-avx-i
core-avx2
core2
core_2_duo_sse4_1
core_2_duo_ssse3
core_2nd_gen_avx
core_3rd_gen_avx
core_4th_gen_avx
core_4th_gen_avx_tsx
core_5th_gen_avx
core_5th_gen_avx_tsx
core_aes_pclmulqdq
core_i7_sse4_2
corei7
corei7-avx
emeraldrapids
generic
geode
goldmont
goldmont-plus
goldmont_plus
grandridge
graniterapids
graniterapids-d
graniterapids_d
haswell
i386
i486
i586
i686
icelake-client
icelake-server
icelake_client
icelake_server
ivybridge
k6
k6-2
k6-3
k8
k8-sse3
knl
knm
lakemont
meteorlake
mic_avx512
nehalem
nocona
opteron
opteron-sse3
penryn
pentium
pentium-m
pentium-mmx
pentium2
pentium3
pentium3m
pentium4
pentium4m
pentium_4
pentium_4_sse3
pentium_ii
pentium_iii
pentium_iii_no_xmm_regs
pentium_m
pentium_mmx
pentium_pro
pentiumpro
prescott
raptorlake
rocketlake
sandybridge
sapphirerapids
sierraforest
silvermont
skx
skylake
skylake-avx512
skylake_avx512
slm
tigerlake
tremont
westmere
winchip-c6
winchip2
x86-64                  - This is the default target CPU for the current build target (currently x86_64-unknown-linux-gnu).
x86-64-v2
x86-64-v3
x86-64-v4
yonah
znver1
znver2
znver3
znver4

Pzixel commented 9 months ago

@tkaitchuck I actually think this issue might be relevant: https://internals.rust-lang.org/t/slower-code-with-c-target-cpu-native/17315/7

0xdeafbeef commented 9 months ago

Without the aes flag: https://share.firefox.dev/3RWEHk5
With the aes flag: https://share.firefox.dev/48D3E9Y

The AES feature is indeed detected.

tkaitchuck commented 6 months ago

@fr0staman Can you check whether this is fixed on the 0.9 prerelease branch?

fr0staman commented 6 months ago

Certainly!

Unfortunately, nothing has changed:

fr0staman@kotobook:~/source/repos/rust/rust-ahash-target-native-performance-issue$ RUSTFLAGS='-C target-cpu=native' cargo bench
   ...
   Compiling ahash v0.9.0 (https://github.com/tkaitchuck/aHash?branch=0.9-prerelease#af37d79e)
   ...
    Finished bench [optimized] target(s) in 43.16s
     Running unittests src/main.rs (target/release/deps/rust_ahash_target_native_performance_issue-a98c230d15dcf9ae)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/issue.rs (target/release/deps/issue-a3d835f7ef64d9be)
Gnuplot not found, using plotters backend
Performance/ahash/(32, 128)
                        time:   [37.539 µs 37.543 µs 37.546 µs]
                        change: [+97.437% +97.897% +98.305%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Performance/ahash/(256, 1024)
                        time:   [2.3726 ms 2.3733 ms 2.3740 ms]
                        change: [+156.12% +156.46% +156.76%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Performance/ahash/(1024, 4096)
                        time:   [38.066 ms 38.109 ms 38.153 ms]
                        change: [+154.20% +155.09% +155.95%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild