Open Jefffrey opened 5 months ago
Maybe this also helps a bit to investigate, if we run datafusion windows CI with
RUSTFLAGS: "-C debuginfo=0 -C opt-level=1 -C target-feature=+crt-static -C incremental=false -C codegen-units=256"
it helps to reduce timing from 190m to 60m, but it still way longer than before
It might help to look at the raw log and narrow down which steps specifically are taking a long time.
So the issue is maybe around the generation of debuginfo?
Just a quick update, narrowed down the testing:
Limit to running only tpcds_physical_q44 with backtrace feature:
cargo test -p datafusion --test tpcds_planning tpcds_physical_q44 --features backtrace
std
:
good (2023-10-29 nightly) = 70 seconds
bad (2023-10-30 nightly) = 692 seconds
Without backtrace:
cargo test -p datafusion --test tpcds_planning tpcds_physical_q44
good (2023-10-29 nightly) = 1.25 seconds
bad (2023-10-30 nightly) = 1.26 seconds
Edit:
Also just to be clear, we are running with env vars:
For both good & bad runs.
Maybe this also helps a bit to investigate, if we run datafusion windows CI with
RUSTFLAGS: "-C debuginfo=0 -C opt-level=1 -C target-feature=+crt-static -C incremental=false -C codegen-units=256"
it helps to reduce timing from 190m to 60m, but it still way longer than before
The speedup seems to occur due to adding -C opt-level=1 -C target-feature=+crt-static -C incremental=false -C codegen-units=256
flags as the debuginfo flag was already set previously
https://github.com/rust-lang/rust/pull/117089 was merged in the right window, and the PR description notes that it contains https://github.com/rust-lang/backtrace-rs/pull/569 which sounds relevant.
cc @wesleywiser who authored https://github.com/rust-lang/rust/pull/117089 and changes to the backtrace-rs submodule (IIUC)
I'm able to repro locally:
> $env:RUSTFLAGS="-Cdebuginfo=0"
> cargo +nightly-2023-10-29 test -p datafusion --test tpcds_planning tpcds_physical_q44 --features backtrace
...
test tpcds_physical_q44 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 197 filtered out; finished in 36.23s
> cargo +nightly-2023-10-30 test -p datafusion --test tpcds_planning tpcds_physical_q44 --features backtrace
...
running 1 test
test tpcds_physical_q44 has been running for over 60 seconds
Looking at profiling data, the vast majority of time in the regressed case is spent in dbghelp!SymAddrIncludeInlineTrace
which is a necessary call I introduced in rust-lang/backtrace-rs#569 when symbolizing the backtrace. I'm investigating if there is some other way to avoid calling that function but haven't had any luck.
One thing that's very interesting is that if do not set RUSTFLAGS="-Cdebuginfo=0"
, the regression disappears. This suggests to me we're hitting a pathological issue in dbghelp
when there isn't any debug info present. On Windows, debug info is basically required to get any kind of meaningful backtrace so setting both -Cdebuginfo=0
and --feature backtrace
in the test case doesn't make much sense to me.
@Jefffrey do you know why -Cdebuginfo=0
is being set? My assumption is that the intent is to make builds faster while still being able to test --feature backtrace
. If that's the case, I would suggest actually setting -Cdebuginfo=line-tables-only
which generates the minimal amount of information to give proper backtraces and doesn't have much impact on compile times (especially in comparison to -Cdebuginfo=2
which is the default in this case):
> $env:RUSTFLAGS="-Cdebuginfo=0"
> cargo +nightly-2023-10-30 test -p datafusion --test tpcds_planning tpcds_physical_q44 --features backtrace
...
Finished test [unoptimized + debuginfo] target(s) in 1m 25s
> $env:RUSTFLAGS="-Cdebuginfo=line-tables-only"
> cargo +nightly-2023-10-30 test -p datafusion --test tpcds_planning tpcds_physical_q44 --features backtrace
...
Finished test [unoptimized + debuginfo] target(s) in 1m 32s
> $env:RUSTFLAGS="-Cdebuginfo=2"
> cargo +nightly-2023-10-30 test -p datafusion --test tpcds_planning tpcds_physical_q44 --features backtrace
...
Finished test [unoptimized + debuginfo] target(s) in 2m 26s
Thank you for the detailed breakdown! @wesleywiser
@Jefffrey do you know why
-Cdebuginfo=0
is being set? My assumption is that the intent is to make builds faster while still being able to test--feature backtrace
. If that's the case, I would suggest actually setting-Cdebuginfo=line-tables-only
which generates the minimal amount of information to give proper backtraces and doesn't have much impact on compile times (especially in comparison to-Cdebuginfo=2
which is the default in this case):
The reasoning you provide makes sense, I'm not sure why both debuginfo=0
was set along with backtrace (probably had debuginfo=0
set first then backtrace feature was added as a separate feature so needed to be added to test).
do not produce debug symbols to keep memory usage down
for the debuginfo=0
setting fwiw, so I guess that's the main reasonI'll experiment with setting debuginfo=line-tables-only
and will see if that helps speed things back up :+1:
Thanks for the config fix suggested @wesleywiser
This does eliminate the performance regression (or at least bring the times within a more reasonable range of each other).
Even though it was caused by some weird settings on our end (disabling debuginfo but still using backtrace, on Windows), it was still a significant regression in performance. Let me know if this is the expected behaviour now and I can close this issue, or to keep this open if you're planning to investigate this weird edge case further :+1: (or move it to backtrace repo?)
This was discussed in the libs meeting. While this is an edge case and a likely bug in dbghlp, it was suggested that maybe backtrace could workaround it by somehow detecting if there is no debug info present? Even if it can only tell after trying, it could perhaps cache this knowledge so it doesn't try again the next time?
Code
DataFusion issue with complete details: https://github.com/apache/arrow-datafusion/issues/8696
arrow-datafusion runs
cargo test
on Windows runner as part of CI. When on Rust version 1.74.1 (and below), the check takes under 30 minutes. After upgrading to Rust version 1.75.0, it now takes over 3 hours, with no other change in code on our side. This seems to only take effect on Windows, as Linux/Mac tests didn't seem to be affected.After debugging, I found the regression occurs between toolchains
nightly-2023-10-29 (rust e5cfc5547)
andnightly-2023-10-30 (rust 608e9682f)
.We are running on GitHub actions runner
windows-latest
.So expected run is here, on toolchain
nightly-2023-10-29
: https://github.com/apache/arrow-datafusion/actions/runs/7394674719/job/20116418078When I bump to toolchain
nightly-2023-10-30
, with no other code changes: https://github.com/apache/arrow-datafusion/actions/runs/7394848426/job/20116910586Slow tests
The slowness occurs primarily in two tests.
tpcds_planning
Test code: https://github.com/apache/arrow-datafusion/blob/1179a76567892b259c88f08243ee01f05c4c3d5c/datafusion/core/tests/tpcds_planning.rs
On the good run (before regression):
On the bad run (after regression):
sqllogictest
Test code: https://github.com/apache/arrow-datafusion/tree/1179a76567892b259c88f08243ee01f05c4c3d5c/datafusion/sqllogictest
On the good run (before regression):
On the bad run (after regression):
These two tests are the only ones with a significant delta, the rest don't seem affected by the upgrade.
Version it worked on
Ran fast with Rust 1.74.1 (and nightly-2023-10-29)
Version with regression
Ran slow on Rust 1.75.0 (and nightly-2023-10-30)
Additional context
Apologies if the example is too large to easily determine where the issue is. I'll try to reduce this to a smaller MRE, as I don't have a Windows machine to locally test on, so have had to check via CI.