Open day01 opened 3 months ago
Does this reproduce when setting codegen-units to 1 in Cargo.toml?
I have already set codegen-units to 1: https://github.com/hayden4r4/blackscholes-rust/blob/master/Cargo.toml#L14
The benches run in release mode, so with codegen-units = 1 and fat LTO.
The results are with this configuration.
Visibility has an impact on many codegen-related decisions like codegen-unit partitioning and function instantiation, so it's not necessarily surprising that you got different behavior.
It would be useful to have a minimal reproduction here, including assembly, that shows exactly which function mattered (I assume you have pub functions in the module, make each of them private to test) and extract something minimal out of your project that reproduces the regression (just different assembly is good enough for a minimal reproduction, doesn't necessarily need benchmarks). Then we can figure out why exactly the different optimization decision was made and whether there's anything we can do to improve it.
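As a rough illustration of the shape such a minimal reproduction could take, here is a sketch with a single module whose visibility is toggled. All names here (kernel, scaled_exp, entry) are hypothetical and not taken from blackscholes-rust; the function bodies are placeholders.

```rust
// Hypothetical stand-in for the crate layout; none of these names come
// from blackscholes-rust.

// Variant A: private module. Change this line to `pub mod kernel` for
// variant B and compare the generated assembly of the two builds.
mod kernel {
    /// Reachable from outside the crate only when the module is `pub`.
    pub fn scaled_exp(x: f64, s: f64) -> f64 {
        (-0.5 * x * x).exp() * s
    }
}

/// Public entry point that exists in both variants, so an assembly diff
/// or a benchmark can target the same symbol in each build.
pub fn entry(x: f64, s: f64) -> f64 {
    kernel::scaled_exp(x, s) + kernel::scaled_exp(-x, s)
}
```

Building each variant with something like `cargo rustc --release -- --emit asm` and diffing the output for `entry` should be enough to show whether the visibility change alone alters the optimization decisions.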
@rustbot label E-needs-mcve C-bug T-compiler A-codegen
The visibility of this function is what matters: https://github.com/hayden4r4/blackscholes-rust/blob/59330ae06d31baea02fb4c5af18451e48c85da0f/src/lets_be_rational/black.rs#L65
I have no idea why. After some brief profiling with perf I don't see a difference in the fast and slow versions.
The usual hypothesis at this point is that somehow code alignment or some cache or branch predictor collision matters. The effect size is about right, but I don't know how to test that hypothesis.
I did this investigation on x86_64. The fact that this reproduces so exactly across architectures makes me doubt that this is a microarchitecture issue... But I'm not sure what else it could be.
@Noratrieb I tried to create a minimal repro but could not :) That is why I added the whole solution, with a description of how to trigger the perf degradation.
@saethlin thanks for the repro on x86_64 and for pinpointing the exact function.
Additionally, I checked it with inline(always) on all functions, and on a random selection of them, starting from the function at L65.
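For readers unfamiliar with the attributes involved in that check, the following sketch shows how inlining hints are attached to functions. The helper names are hypothetical and not from blackscholes-rust; note that these attributes are hints to the compiler rather than hard guarantees.

```rust
// Hypothetical helpers; names are illustrative, not from blackscholes-rust.

/// Strongly hints that this body should be inlined at every call site.
#[inline(always)]
pub fn helper_inlined(x: f64) -> f64 {
    x * x + 1.0
}

/// Hints that this should stay a separate call, which keeps it easy to
/// find in the generated assembly.
#[inline(never)]
pub fn helper_outlined(x: f64) -> f64 {
    x * x + 1.0
}

/// Calls both helpers so the two inlining choices can be compared in
/// one function's assembly.
pub fn caller(x: f64) -> f64 {
    helper_inlined(x) + helper_outlined(x)
}
```

Switching the attribute on one function at a time, then rebuilding, is one way to narrow down which inlining decision changes between the fast and slow builds.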
The blackscholes solution is relatively small and the problem is marked directly in it, so I hope it can help as an example.
Any idea what it is or how to fix it?
I've encountered an unexpected and significant performance degradation in my application when changing a module's visibility from private to public. This behavior seems counterintuitive and potentially indicates a compiler optimization issue or an unexpected interaction between module visibility and performance.
My code:
source: https://github.com/hayden4r4/blackscholes-rust/blob/master/src/lets_be_rational/mod.rs#L8
benches: https://github.com/hayden4r4/blackscholes-rust/blob/master/benches/black.rs
Current behavior
When the black module is made public, the overall application performance degrades by approximately 50%.
Expected
Changing a module's visibility should not have a significant impact on the application's overall performance. We would expect minimal to no performance change when modifying module visibility.
Environment
Rust version: 1.80.0 (051478957 2024-07-21)
Cargo version: 1.80.0 (376290515 2024-07-16)
OS: macOS, M1 Max
Does anyone know where the bug/problem/challenge is?