Open fr0staman opened 9 months ago
The example can be reduced to target-feature=+aes. It looks like this bench is only hashing char, which SHOULD be specialized in both cases (ideally to identical instructions). I'll take a look at this.
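To make the point about char hashing concrete, here is a minimal sketch that records which Hasher method a char is routed through. RecordingHasher and record_char_hash are hypothetical names made up for this illustration and are not part of aHash. The sketch relies on the current standard library behavior of forwarding char to write_u32, which is an implementation detail rather than a documented guarantee.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that only counts which entry points get called.
#[derive(Default)]
struct RecordingHasher {
    write_calls: usize,
    write_u32_calls: usize,
    last_u32: u32,
}

impl Hasher for RecordingHasher {
    // Generic byte-slice path, used for data without a specialized method.
    fn write(&mut self, _bytes: &[u8]) {
        self.write_calls += 1;
    }

    // Fixed-width path that a hasher can specialize for 4-byte values.
    fn write_u32(&mut self, i: u32) {
        self.write_u32_calls += 1;
        self.last_u32 = i;
    }

    // Not a real hash: it just echoes the last value seen.
    fn finish(&self) -> u64 {
        u64::from(self.last_u32)
    }
}

/// Hashes a single char and returns the recorder for inspection.
fn record_char_hash(c: char) -> RecordingHasher {
    let mut hasher = RecordingHasher::default();
    c.hash(&mut hasher);
    hasher
}

fn main() {
    let h = record_char_hash('a');
    println!(
        "write calls: {}, write_u32 calls: {}, last value: {}",
        h.write_calls, h.write_u32_calls, h.last_u32
    );
}
```

If char goes through the fixed-width entry point as assumed, a benchmark that only hashes char exercises that single path, so any slowdown should be visible in the code generated for it.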
This does not appear to happen on my Intel i9. There must be something odd in the assembly generated for the Ryzen. If +aes gives identical performance to native, it is possible that it's not picking up the SSE2 instructions for some reason.
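One way to check which features a given build actually picked up is to compare what the compiler enabled at build time with what the CPU reports at run time. The following is a minimal sketch, not part of the repro repository; compile_time_features and runtime_features are hypothetical helper names, and the choice of sse2, ssse3 and aes is an assumption about which features matter here.

```rust
/// Features enabled for this build (affected by -C target-cpu / -C target-feature).
fn compile_time_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse2", cfg!(target_feature = "sse2")),
        ("ssse3", cfg!(target_feature = "ssse3")),
        ("aes", cfg!(target_feature = "aes")),
    ]
}

/// Features the running CPU reports, independent of compiler flags.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn runtime_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse2", is_x86_feature_detected!("sse2")),
        ("ssse3", is_x86_feature_detected!("ssse3")),
        ("aes", is_x86_feature_detected!("aes")),
    ]
}

/// On non-x86 targets there is nothing to report in this sketch.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn runtime_features() -> Vec<(&'static str, bool)> {
    Vec::new()
}

fn main() {
    for (name, enabled) in compile_time_features() {
        println!("compile time {name}: {enabled}");
    }
    for (name, detected) in runtime_features() {
        println!("run time     {name}: {detected}");
    }
}
```

Building this once with the default flags and once with RUSTFLAGS='-C target-cpu=native' should show whether the compile-time values differ between the two configurations on the affected machine.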
@fr0staman If you run rustc --print=target-cpus, what does it indicate the detected CPU target is?
This might be related: https://github.com/rust-lang/rust/issues/80633
rustc --print=target-cpus
Available CPUs for this target:
native - Select the CPU of the current host (currently znver4).
alderlake
amdfam10
athlon
athlon-4
athlon-fx
athlon-mp
athlon-tbird
athlon-xp
athlon64
athlon64-sse3
atom
atom_sse4_2
atom_sse4_2_movbe
barcelona
bdver1
bdver2
bdver3
bdver4
bonnell
broadwell
btver1
btver2
c3
c3-2
cannonlake
cascadelake
cooperlake
core-avx-i
core-avx2
core2
core_2_duo_sse4_1
core_2_duo_ssse3
core_2nd_gen_avx
core_3rd_gen_avx
core_4th_gen_avx
core_4th_gen_avx_tsx
core_5th_gen_avx
core_5th_gen_avx_tsx
core_aes_pclmulqdq
core_i7_sse4_2
corei7
corei7-avx
emeraldrapids
generic
geode
goldmont
goldmont-plus
goldmont_plus
grandridge
graniterapids
graniterapids-d
graniterapids_d
haswell
i386
i486
i586
i686
icelake-client
icelake-server
icelake_client
icelake_server
ivybridge
k6
k6-2
k6-3
k8
k8-sse3
knl
knm
lakemont
meteorlake
mic_avx512
nehalem
nocona
opteron
opteron-sse3
penryn
pentium
pentium-m
pentium-mmx
pentium2
pentium3
pentium3m
pentium4
pentium4m
pentium_4
pentium_4_sse3
pentium_ii
pentium_iii
pentium_iii_no_xmm_regs
pentium_m
pentium_mmx
pentium_pro
pentiumpro
prescott
raptorlake
rocketlake
sandybridge
sapphirerapids
sierraforest
silvermont
skx
skylake
skylake-avx512
skylake_avx512
slm
tigerlake
tremont
westmere
winchip-c6
winchip2
x86-64 - This is the default target CPU for the current build target (currently x86_64-unknown-linux-gnu).
x86-64-v2
x86-64-v3
x86-64-v4
yonah
znver1
znver2
znver3
znver4
Also has regression
rustc --print=target-cpus
Available CPUs for this target:
native - Select the CPU of the current host (currently znver1).
alderlake amdfam10 athlon athlon-4 athlon-fx athlon-mp athlon-tbird athlon-xp athlon64 athlon64-sse3 atom atom_sse4_2 atom_sse4_2_movbe barcelona bdver1 bdver2 bdver3 bdver4 bonnell broadwell btver1 btver2 c3 c3-2 cannonlake cascadelake cooperlake core-avx-i core-avx2 core2 core_2_duo_sse4_1 core_2_duo_ssse3 core_2nd_gen_avx core_3rd_gen_avx core_4th_gen_avx core_4th_gen_avx_tsx core_5th_gen_avx core_5th_gen_avx_tsx core_aes_pclmulqdq core_i7_sse4_2 corei7 corei7-avx emeraldrapids generic geode goldmont goldmont-plus goldmont_plus grandridge graniterapids graniterapids-d graniterapids_d haswell i386 i486 i586 i686 icelake-client icelake-server icelake_client icelake_server ivybridge k6 k6-2 k6-3 k8 k8-sse3 knl knm lakemont meteorlake mic_avx512 nehalem nocona opteron opteron-sse3 penryn pentium pentium-m pentium-mmx pentium2 pentium3 pentium3m pentium4 pentium4m pentium_4 pentium_4_sse3 pentium_ii pentium_iii pentium_iii_no_xmm_regs pentium_m pentium_mmx pentium_pro pentiumpro prescott raptorlake rocketlake sandybridge sapphirerapids sierraforest silvermont skx skylake skylake-avx512 skylake_avx512 slm tigerlake tremont westmere winchip-c6 winchip2
x86-64 - This is the default target CPU for the current build target (currently x86_64-unknown-linux-gnu).
x86-64-v2 x86-64-v3 x86-64-v4 yonah znver1 znver2 znver3 znver4
@tkaitchuck I actually think this issue might be relevant: https://internals.rust-lang.org/t/slower-code-with-c-target-cpu-native/17315/7
Without aes flag: https://share.firefox.dev/3RWEHk5
With aes flag: https://share.firefox.dev/48D3E9Y
The AES feature is indeed detected.
@fr0staman Can you check if this is fixed on the 0.9 prerelease branch?
Certainly!
Unfortunately, nothing has changed:
fr0staman@kotobook:~/source/repos/rust/rust-ahash-target-native-performance-issue$ RUSTFLAGS='-C target-cpu=native' cargo bench
...
Compiling ahash v0.9.0 (https://github.com/tkaitchuck/aHash?branch=0.9-prerelease#af37d79e)
...
Finished bench [optimized] target(s) in 43.16s
Running unittests src/main.rs (target/release/deps/rust_ahash_target_native_performance_issue-a98c230d15dcf9ae)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running benches/issue.rs (target/release/deps/issue-a3d835f7ef64d9be)
Gnuplot not found, using plotters backend
Performance/ahash/(32, 128)
time: [37.539 µs 37.543 µs 37.546 µs]
change: [+97.437% +97.897% +98.305%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
Performance/ahash/(256, 1024)
time: [2.3726 ms 2.3733 ms 2.3740 ms]
change: [+156.12% +156.46% +156.76%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
Performance/ahash/(1024, 4096)
time: [38.066 ms 38.109 ms 38.153 ms]
change: [+154.20% +155.09% +155.95%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
So, ahash with target-cpu=native on my setup shows a significant performance regression. This may be a Rust/LLVM issue, but I'll create an issue here first.
Repro: https://github.com/fr0staman/rust-ahash-target-native-performance-issue
My setup
Rust:
System:
Results
Standard target
target-cpu=native