benma opened this issue 3 years ago (status: Open)
Bisected this regression to nightly-2020-10-17.
@rustbot label: +I-heavy +regression-from-stable-to-beta
@rustbot label: -regression-from-stable-to-beta +regression-from-stable-to-stable
> Bisected this regression to nightly-2020-10-17.

We have the nightly where this regressed; can we narrow it down to the specific commit that caused it?
@rustbot ping icebreakers-cleanup-crew
Hey Cleanup Crew ICE-breakers! This bug has been identified as a good "Cleanup ICE-breaking candidate". In case it's useful, here are some instructions for tackling these sorts of bugs. Maybe take a look? Thanks! <3
cc @AminArria @camelid @chrissimpkins @contrun @DutchGhost @elshize @h-michael @HallerPatrick @hdhoang @hellow554 @henryboisdequin @imtsuki @JamesPatrickGill @kanru @KarlK90 @LeSeulArtichaut @MAdrianMattocks @matheus-consoli @mental32 @nmccarty @Noah-Kennedy @pard68 @PeytonT @pierreN @Redblueflame @RobbieClarken @RobertoSnap @robjtede @SarthakSingh31 @shekohex @sinato @smmalis37 @steffahn @Stupremee @tamuhey @turboladen @woshilapin @yerke
Assigning priority as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.
@rustbot label -I-prioritize +P-medium
@apiraino:

> We have the nightly where this regressed; can we narrow it down to the specific commit that caused it?
Unfortunately the CI artifacts from back then are no longer available (to me, at least); so this would entail manually compiling those specific commits... compiling rustc is very slow for me, but I'll see what I can do.
~I should also add that I ran the bisection by treating compilations that took longer than 5 minutes as a regression; this appeared to correlate with memory usage exceeding 20GiB, although I did notice some builds completing within 5 minutes where memory usage exceeded 10GiB, so this issue may be the combined effect of multiple commits and/or the bisection may not be entirely reliable.~ Sorry, I wasn't paying attention; that comment related to a different issue/bisection. I am pretty confident the identified nightly is the one in which the regression was introduced.
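As an aside, when CI artifacts for a range are still available, this kind of commit-level bisection can be automated with `cargo-bisect-rustc`, which downloads per-commit builds instead of compiling rustc locally. A hypothetical invocation (run inside the regressing project; the date range matches the nightlies identified above, and the 300-second timeout mirrors the "longer than 5 minutes counts as regressed" criterion):

```
# Sketch only: requires cargo-bisect-rustc to be installed
# (cargo install cargo-bisect-rustc) and artifacts to still exist.
cargo bisect-rustc --start=2020-10-16 --end=2020-10-17 --timeout=300
```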
Command | Size (bytes) |
---|---|
`cargo +nightly-2020-10-16` | 932 |
`cargo +nightly-2020-10-17` | 11248 |
`env RUSTFLAGS=-Zinsert-sideeffect cargo +nightly-2020-10-16` | 11248 |
`cargo +nightly-2020-10-17` + `unreachable_unchecked` | 932 |
`cargo +nightly-2020-10-17 -Zbuild-std=panic_abort,core` | 1092 |
This broke due to the change merged in a78a62f, which was intended to fix #28728. In particular, it adds a side-effect call to empty loops, which is triggered here. I'm not familiar enough with the behaviour of the compiler to say for certain, but I think the side effect prevents the compiler from optimising away the associated loop and its callers, so the symbols are left in the binary. I'm also not quite able to replicate the nightly build with my locally built compiler, so there may be some other surprises here.
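For context, a minimal program of the kind affected looks roughly like this (a hypothetical sketch, not the reporter's code; it needs a bare-metal target such as `thumbv7em-none-eabi` and will not build for a hosted target):

```rust
#![no_std]
#![no_main]

use core::panic::PanicInfo;

// A halt-style panic handler. The empty loop is exactly the construct
// that a78a62f started annotating with llvm.sideeffect, which keeps
// unwrap_failed/panic_fmt and their formatting machinery in the binary.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

#[no_mangle]
pub extern "C" fn _start() -> ! {
    let r: Result<u32, ()> = Err(());
    // unwrap() pulls in core::result::unwrap_failed, which calls
    // core::panicking::panic_fmt, matching the IR shown below.
    let _v = r.unwrap();
    loop {}
}
```

The `unreachable_unchecked` row in the table above presumably corresponds to replacing the handler body with `unsafe { core::hint::unreachable_unchecked() }`, which lets the optimizer delete the whole panic path at the cost of undefined behaviour if a panic ever fires.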
In case some more details would be of use:
Here's the LLVM IR I get from 7f58716 (i.e. the last good commit):
; ModuleID = 'example.bxq6xejs-cgu.0'
source_filename = "example.bxq6xejs-cgu.0"
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv7em-none-unknown-eabi"
; Function Attrs: minsize norecurse noreturn nounwind optsize readnone
define void @_start() unnamed_addr #0 {
bb1:
br label %bb2
bb2: ; preds = %bb2, %bb1
br label %bb2
}
attributes #0 = { minsize norecurse noreturn nounwind optsize readnone "frame-pointer"="all" "target-cpu"="generic" }
Whereas here's what I get from e2efec89 (i.e. the commit that broke this):
; ModuleID = 'example.bxq6xejs-cgu.0'
source_filename = "example.bxq6xejs-cgu.0"
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv7em-none-unknown-eabi"
; Function Attrs: minsize noreturn nounwind optsize
define void @_start() unnamed_addr #0 {
start:
%e.i = alloca {}, align 1
%0 = bitcast {}* %e.i to i8*
call void @llvm.lifetime.start.p0i8(i64 0, i8* nonnull %0)
; call core::result::unwrap_failed
call fastcc void @_ZN4core6result13unwrap_failed17h28e5bee20479b83aE({}* nonnull align 1 %e.i) #4
unreachable
}
; Function Attrs: argmemonly nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1
; Function Attrs: inaccessiblememonly nounwind willreturn
declare void @llvm.sideeffect() #2
; core::result::unwrap_failed
; Function Attrs: cold noinline noreturn nounwind
define internal fastcc void @_ZN4core6result13unwrap_failed17h28e5bee20479b83aE({}* nocapture nonnull readnone align 1 %0) unnamed_addr #3 {
; call core::panicking::panic_fmt
tail call fastcc void @_ZN4core9panicking9panic_fmt17hbd2067f8be3a4464E()
unreachable
}
; core::panicking::panic_fmt
; Function Attrs: cold noinline noreturn nounwind
define internal fastcc void @_ZN4core9panicking9panic_fmt17hbd2067f8be3a4464E() unnamed_addr #3 {
br label %bb1.i
bb1.i: ; preds = %bb1.i, %0
tail call void @llvm.sideeffect() #4
br label %bb1.i
}
attributes #0 = { minsize noreturn nounwind optsize "frame-pointer"="all" "target-cpu"="generic" }
attributes #1 = { argmemonly nounwind willreturn }
attributes #2 = { inaccessiblememonly nounwind willreturn }
attributes #3 = { cold noinline noreturn nounwind "frame-pointer"="all" "target-cpu"="generic" }
attributes #4 = { nounwind }
Here we see that `bb2` has become `bb1.i` and now calls a side-effect function, stopping optimisation.
It looks like this is the same theory discussed in the Zulip thread.
For embedded developers finding this issue: if you're using a halt/abort panic handler anyway, building with `-Zbuild-std=core -Zbuild-std-features=panic_immediate_abort` will allow Rust to optimize away the formatting code.
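For a persistent setup, the same workaround can be expressed in `.cargo/config.toml` instead of on the command line (a sketch; these `[unstable]` keys require a nightly toolchain):

```toml
[unstable]
build-std = ["core"]
build-std-features = ["panic_immediate_abort"]
```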
Needed toolchains and targets to run the examples (nightly instead of 1.49.0 also works; 1.49.0 is the first release with the regression):
Code
Compiled with:
I expected all functions related to panics to be optimized away and the binary size to be relatively small:
Output with toolchain 1.48.0, everything is as expected:
The same output when compiled with 1.49.0:
The binary size blew up by ~10kb. For my embedded program, this increase is too large for my target, and I cannot run the program anymore.
When I inline the code from `unwrap()` and call `panic!()` manually, the binary size again shrinks drastically, though since 1.49.0 the symbols list still has a `panic_fmt` entry that is not optimized away (previous toolchains produce a minimal binary with no panic-related code in the binary, as expected):

Version it worked on
Toolchains below 1.49.0
Version with regression
Toolchains since 1.49.0, including today's nightly.