Closed cbiffle closed 3 years ago
We are having problems with Rust 1.52 on thumbv6m-none-eabi, while there were no problems on 1.51. This does indeed coincide with the LLVM 12 upgrade. I'm not sure if it's the same problem, but if it is, we can add the following information:
We are usually compiling with:
[profile.release]
codegen-units = 1
lto = true
debug = false
opt-level = "s"
But we tried without specifying the opt-level and it didn't make any difference. The problem depends on the code: we have another branch that behaves just fine.
Unfortunately I can't share the code to reproduce.
@luqmana suggested adding -C llvm-args=--enable-machine-outliner=never to RUSTFLAGS, and that fixes our software.
CC: @yroux
Thanks for the heads-up and analysis.
I confirm that the issue was introduced with LLVM-12 due to the latest developments made on the Machine Outliner. Note that it is only enabled at the -Oz optimization level for 32-bit ARM M-profile and AArch64 targets, unless the --enable-machine-outliner flag is used. So, other targets should be fine and the suggested flag to disable machine outlining is a proper workaround.
To give you a bit more context here, Machine Outlining is a code size optimization which, in a nutshell, is the reverse of inlining (it replaces repeated sequences of instructions by function calls). In our case here, since the extracted piece of code contains a call, the link register needs to be saved on entry to the block and restored to be able to jump back to the call site. The offsets of the instructions which use the stack are changed accordingly to reflect this change, but it doesn't take into account that a stack pointer was saved into a register (r4) and used here as well.
I'll fix the issue in LLVM and let you know the status.
In our case here, since the extracted piece of code contains a call, the link register needs to be saved on entry to the block and restored to be able to jump back to the call site. The offsets of the instructions which use the stack are changed accordingly to reflect this change, but it doesn't take into account that a stack pointer was saved into a register (r4) and used here as well.
@yroux, I'm not sure this is correct. The stack-pointer-relative address saved in r4 is the address of the struct allocated in the stack frame at S + 28 (instruction at 200e4 in the second trace). The instructions starting at 211ea are filling that struct, but they are doing so using immediate 28 offsets, which would be the correct instructions in the absence of outlining, but which target the wrong address post-outlining. This suggests to me that they were not patched by the outliner.
You know this algorithm better than I do, of course, so let me know if I've missed something.
@cbiffle no sorry, you are right, I read it too quickly and I shouldn't work that late ;-) I'll look at the patching logic tomorrow
Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.
@rustbot label -I-prioritize +P-critical +T-compiler
@triagebot ping llvm
Hey LLVM ICE-breakers! This bug has been identified as a good "LLVM ICE-breaking candidate". In case it's useful, here are some instructions for tackling these sorts of bugs. Maybe take a look? Thanks! <3
cc @camelid @comex @cuviper @DutchGhost @hdhoang @henryboisdequin @heyrutvik @higuoxing @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique
A couple points: IIRC outliner is a pretty recent addition to LLVM. Also, to work on this it is going to be important to have some code that reproduces this (I'm not seeing any, please tell me if I missed anything)
Initial Machine Outliner support for ARM was added in LLVM-11, but it was improved to handle more cases (such as the ld/st stack instructions involved in this issue) and enabled at -Oz for M-profile targets in the LLVM-12 release.
I managed to reproduce the issue on a reduced LLVM MIR test, and I'm working on a fix.
Issue reported in the LLVM Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50481
Fix submitted: https://reviews.llvm.org/D103167
I hope to have it as part of LLVM 12.0.1, which is currently in RC1 state.
Fix committed into mainline as: https://reviews.llvm.org/rG6c78dbd4ca1f
Thanks @yroux! Will you be driving getting it into 12.0.1? I believe the deadline for getting fixes in is Friday.
EDIT: Ah, I see @nikic marked it as a release-12.0.1 blocker on the bug. Thanks!
We are seeing a subtle occasional miscompilation on ARM-M using nightly-2021-04-23 in rust-toolchain. It is difficult to elicit and reproduce, since subtle changes to the layout of the code will cause the compiler to make decisions that either do or do not trigger the bug. It appears to have something to do with stack frame maintenance in outlined functions. We are definitely observing it on thumbv8m.main-none-eabihf, but it's subtle enough that we may also be getting it on thumbv7em-none-eabihf and just haven't noticed it yet.
As of somewhat recently (late April?) output at opt-level = "z" has started including outlined functions that look like this (actual example):
Now, note that the instructions at 0x211e6 and 0x211fe are setting up and tearing down a temporary stack frame, respectively. This will become important in a bit.
It appears that the stack frame offsets used in instructions while this temporary stack frame exists are not being updated to reflect its existence. Stack variables updated within the outlined function above are being deposited 8 bytes off where they should be.
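To make the suspected failure mode concrete, here is a toy Rust model of it, not the actual LLVM logic. It assumes the temporary frame is exactly 8 bytes (inferred from the 8-byte displacement described above) and uses the S + 28 struct offset from the traces; the `Stack` type, `outlined_store`, and the frame and memory sizes are hypothetical names and values chosen for illustration only.

```rust
/// Toy model of a downward-growing stack (hypothetical, for illustration).
struct Stack {
    mem: Vec<u8>,
    sp: usize, // index into `mem`; a lower index models a lower address
}

impl Stack {
    fn new(size: usize) -> Self {
        Stack { mem: vec![0; size], sp: size }
    }

    /// Allocate `bytes` on the stack by moving SP down.
    fn push_frame(&mut self, bytes: usize) {
        self.sp -= bytes;
    }

    /// Release `bytes` from the stack by moving SP up.
    fn pop_frame(&mut self, bytes: usize) {
        self.sp += bytes;
    }

    /// SP-relative store; returns the absolute address that was written.
    fn store(&mut self, offset: usize, value: u8) -> usize {
        let addr = self.sp + offset;
        self.mem[addr] = value;
        addr
    }
}

/// Offset of the struct from the caller's SP (S + 28 in the traces).
const STRUCT_OFFSET: usize = 28;
/// Assumed size of the temporary frame set up by the outlined function.
const TEMP_FRAME: usize = 8;

/// Models an outlined function that creates a temporary frame, performs one
/// SP-relative store, and tears the frame down again. With `patched` set,
/// the offset is adjusted for the temporary frame; without it, the original
/// pre-outlining offset is used unchanged.
fn outlined_store(stack: &mut Stack, patched: bool) -> usize {
    stack.push_frame(TEMP_FRAME);
    let offset = if patched {
        STRUCT_OFFSET + TEMP_FRAME
    } else {
        STRUCT_OFFSET
    };
    let addr = stack.store(offset, 0xAA);
    stack.pop_frame(TEMP_FRAME);
    addr
}

fn main() {
    let mut stack = Stack::new(128);
    stack.push_frame(64); // caller's frame; SP now equals S
    let s = stack.sp;

    let good = outlined_store(&mut stack, true);
    let bad = outlined_store(&mut stack, false);

    println!("intended address: S + {}", STRUCT_OFFSET);
    println!("patched store:    S + {}", good - s);
    println!("unpatched store:  S + {}", bad - s);
}
```

In this model the unpatched store lands at S + 20 rather than S + 28, which is the same 8-byte displacement visible in the non-working trace.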
I do not currently have a compact repro case, and the code in question has not yet been published (though I could arrange to publish it if it would help, we intend to open source it). Here are two execution traces of programs showing correct behavior vs corrupt behavior. Both traces set up arguments to a syscall, which uses struct return and deposits a struct onto the stack; the routines then shuffle the results around before calling a library function. It is during the shuffling that things go awry.
In this working trace I have called the struct return buffer in the stack frame R and another related-but-separate buffer B. I've omitted instructions that don't contribute by control flow or value-dominating the registers at the end. S refers to the value of the stack pointer on entry to the trace.
Now, here is the non-working trace with the same sort of annotations. Note that while the function at the end is still called with one argument R (stack frame plus 28), the actual struct being passed is deposited starting 8 bytes lower at stack frame plus 20:
Additional notes:
rust-toolchain from nightly-2020-12-29 to nightly-2021-04-23, so the behavior was introduced somewhere between those points. (@luqmana points out that this likely includes the LLVM 11-12 transition.)
opt-level = "z" but this may or may not be specific to that opt level.
Meta
rustc --version --verbose: