rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.14k stars 12.69k forks source link

Miscompilation on ARM-M with nightly-2021-04-23 #85351

Closed cbiffle closed 3 years ago

cbiffle commented 3 years ago

We are seeing a subtle occasional miscompilation on ARM-M using nightly-2021-04-23 in rust-toolchain. It is difficult to elicit and reproduce, since subtle changes to the layout of the code will cause the compiler to make decisions that either do or do not trigger the bug. It appears to have something to do with stack frame maintenance in outlined functions. We are definitely observing it on thumbv8m.main-none-eabihf, but it's subtle enough that we may also be getting it on thumbv7em-none-eabihf and just haven't noticed it yet.

As of somewhat recently (late April?) output at opt-level = "z" has started including outlined functions that look like this (actual example):

000211e6 <OUTLINED_FUNCTION_2>:
   211e6:   f84d ed08   str.w   lr, [sp, #-8]!
   211ea:   e9cd 5007   strd    r5, r0, [sp, #28]
   211ee:   4620        mov r0, r4
   211f0:   f8ad 6034   strh.w  r6, [sp, #52]   ; 0x34
   211f4:   e9cd 1209   strd    r1, r2, [sp, #36]   ; 0x24
   211f8:   f7fe ff33   bl  20062 <_ZN7userlib2hl7Message5fixed17h5f2a9abf3d25035aE>
   211fc:   2800        cmp r0, #0
   211fe:   f85d eb08   ldr.w   lr, [sp], #8
   21202:   4770        bx  lr

Now, note that the instructions at 0x211e6 and 0x211fe are setting up and tearing down a temporary stack frame, respectively. This will become important in a bit.

It appears that the stack frame offsets used in instructions while this temporary stack frame exists are not being updated to reflect its existence. Stack variables updated within the outlined function above are being deposited 8 bytes off where they should be.

I do not currently have a compact repro case, and the code in question has not yet been published (though I could arrange to publish it if it would help, we intend to open source it). Here are two execution traces of programs showing correct behavior vs corrupt behavior. Both traces set up arguments to a syscall, which uses struct return and deposits a struct onto the stack; the routines then shuffle the results around before calling a library function. It is during the shuffling that things go awry.

In this working trace I have called the struct return buffer in the stack frame R and another related-but-separate buffer B. I've omitted instructions that don't contribute by control flow or value-dominating the registers at the end. S refers to the value of the stack pointer on entry to the trace.

   2006e:   4606        mov r6, r0  ; r6 = B
   20070:   a801        add r0, sp, #4  ; r0 = R = S + 4
   20072:   460d        mov r5, r1
   20074:   9000        str r0, [sp, #0]
   20076:   4630        mov r0, r6
   20078:   2102        movs    r1, #2
   2007a:   2200        movs    r2, #0
   2007c:   2300        movs    r3, #0
   2007e:   f001 f8bc   bl  211fa <sys_recv_stub>
   20082:   2800        cmp r0, #0
    ...
   2009c:   9803        ldr r0, [sp, #12] ; r0 = [S + 12] = [R + 8]
    ...
   200a2:   e9dd 1204   ldrd    r1, r2, [sp, #16]
    ; r1 = [S + 16] = [R + 12]
    ; r2 = [S + 20] = [R + 16]
    ...
   200b0:   f001 f89c   bl  211ec <OUTLINED_FUNCTION_1>
    ; (following call)
    ...
   211f0:   e9cd 1203   strd    r1, r2, [sp, #12]
    ; [S + 12] = r1 = [R + 12]
    ; [S + 16] = r2 = [R + 16]
   211f4:   e9cd 6001   strd    r6, r0, [sp, #4]
    ; [S + 4] = r6 = B
    ; [S + 8] = [R + 8]
   211f8:   4770        bx  lr
    ; (returns)
   200b4:   a801        add r0, sp, #4  ; r0 = S + 4 = R
   200b6:   f000 f875   bl  201a4 <_ZN7userlib2hl7Message5fixed17h5f2a9abf3d25035aE>
    ; function invoked with r0 = R = S + 4
    ; words [S+4], [S+8], [S+12], [S+16] initialized
    ; everything's good

Now, here is the non-working trace with the same sort of annotations. Note that while the function at the end is still called with one argument R (stack frame plus 28), the actual struct being passed is deposited starting 8 bytes lower at stack frame plus 20:

   200e4:   ac07        add r4, sp, #28 ; r4 = S + 28 = R
    ...
   20106:   ad06        add r5, sp, #24 ; r5 = S + 24 = B
    ...
   2010a:   4628        mov r0, r5  ; r0 = S + 24
   2010c:   2102        movs    r1, #2
   2010e:   2200        movs    r2, #0
   20110:   2300        movs    r3, #0
   20112:   9400        str r4, [sp, #0]  ; stack arg = r4 = S + 28 = R
   20114:   f001 f876   bl  21204 <sys_recv_stub>
   20118:   2800        cmp r0, #0
    ...
   20136:   e9dd 120a   ldrd    r1, r2, [sp, #40]
    ; r1 = [S + 40] = [R + 12]
    ; r2 = [S + 44] = [R + 16]
    ...
   20144:   f001 f84f   bl  211e6 <OUTLINED_FUNCTION_2>
    ; (following call)
   211e6:   f84d ed08   str.w   lr, [sp, #-8]!  ; sp = S - 8 <---- stack frame adjust
   211ea:   e9cd 5007   strd    r5, r0, [sp, #28]
    ; [S + 20] = B
    ; [S + 24] = r0 = (known to be zero from CFG, omitted)
   211ee:   4620        mov r0, r4  ; r0 = r4 = R, set above, before call
    ...
   211f4:   e9cd 1209   strd    r1, r2, [sp, #36]
    ; [S + 28] = r1 = [R + 12]
    ; [S + 32] = r2 = [R + 16]
   211f8:   f7fe ff33   bl  20062 <_ZN7userlib2hl7Message5fixed17h5f2a9abf3d25035aE>
    ; function invoked with r0 = S + 28
    ; actual struct written at: S+20.

Additional notes:

Meta

rustc --version --verbose:

rustc 1.53.0-nightly (7f4afdf02 2021-04-22)
binary: rustc
commit-hash: 7f4afdf0255600306bf67432da722c7b5d2cbf82
commit-date: 2021-04-22
host: x86_64-unknown-linux-gnu
release: 1.53.0-nightly
LLVM version: 12.0.0
azerupi commented 3 years ago

We are having problems with Rust 1.52 on thumbv6m-none-eabi while there were no problems on 1.51. This does indeed coincide with the LLVM 12 upgrade. I'm not sure if it's the same problem, but if it is we can add the following information:

We are usually compiling with

[profile.release]
codegen-units = 1
lto = true
debug = false
opt-level = "s"

But we tried without specifying the opt-level and it didn't make any difference. The problem is dependent on the code, we have another branch that behaves just fine.

Unfortunately I can't share the code to reproduce.

cbiffle commented 3 years ago

@luqmana suggested adding -C llvm-args=--enable-machine-outliner=never to RUSTFLAGS and that fixes our software.

CC: @yroux

yroux commented 3 years ago

Thanks for the heads-up and analysis.

I confirm that the issue was introduced with LLVM-12 due to the last developments made on the Machine Outliner. Notice that it is only enabled under -Oz optimization level for 32-bit ARM M-profile and AArch64 targets, unless the --enable-machine-outliner flag is used. So, other targets should be fine and the suggested flag to disable machine outlining is a proper workaround.

To give you a bit more context here, Machine Outlining is a code size optimization which, in a nutshell, is the reverse of inlining (it replaces repeated sequences of instructions by function calls). In our case here, since the extracted peace of code contains a call, the link register needs to be saved on the entry of the block and restored to be able jump back to the call site, the offsets of the instruction which are using the stack are changed accordingly to reflect thios change, but it doesn't take into account that a stack pointer was saved into a register (r4) and used here as well.

I'll fix the issue in LLVM and let you know the status.

cbiffle commented 3 years ago

In our case here, since the extracted peace of code contains a call, the link register needs to be saved on the entry of the block and restored to be able jump back to the call site, the offsets of the instruction which are using the stack are changed accordingly to reflect thios change, but it doesn't take into account that a stack pointer was saved into a register (r4) and used here as well.

@yroux, I'm not sure this is correct. The stack pointer relative address saved in r4 is the address of the struct allocated in the stack frame at S + 28 (instruction at 200e4 in the second trace). The instructions starting at 211ea are filling that struct, but they are doing so using immediate 28 offsets, which would be the correct instructions in the absence of outlining, but which target the wrong address post-outlining. This suggests to me that they were not patched by the outliner.

You know this algorithm better than I do, of course, so let me know if I've missed something.

yroux commented 3 years ago

@cbiffle no sorry, you are right, I read it too quickly and I shouldn't work that late ;-) I'll look at the patching logic tomorrow

apiraino commented 3 years ago

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.

@rustbot label -I-prioritize +P-critical +T-compiler

pnkfelix commented 3 years ago

@triagebot ping llvm

rustbot commented 3 years ago

Hey LLVM ICE-breakers! This bug has been identified as a good "LLVM ICE-breaking candidate". In case it's useful, here are some instructions for tackling these sorts of bugs. Maybe take a look? Thanks! <3

cc @camelid @comex @cuviper @DutchGhost @hdhoang @henryboisdequin @heyrutvik @higuoxing @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique

Mark-Simulacrum commented 3 years ago

@rustbot ping llvm

rustbot commented 3 years ago

Hey LLVM ICE-breakers! This bug has been identified as a good "LLVM ICE-breaking candidate". In case it's useful, here are some instructions for tackling these sorts of bugs. Maybe take a look? Thanks! <3

cc @camelid @comex @cuviper @DutchGhost @hdhoang @henryboisdequin @heyrutvik @higuoxing @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique

nagisa commented 3 years ago

A couple points: IIRC outliner is a pretty recent addition to LLVM. Also, to work on this it is going to be important to have some code that reproduces this (I'm not seeing any, please tell me if I missed anything)

yroux commented 3 years ago

Machine Outliner initial support for ARM was added into LLVM-11, but it was improved to handled more cases (such as ld/st stack instructions involved in this issue) and enabled into -Oz for M-profile targets in LLVM-12 release.

I managed to reproduce the issue on a reduce LLVM MIR test, and I'm working on a fix.

yroux commented 3 years ago

Issue reported into llvm bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50481 Fix submitted: https://reviews.llvm.org/D103167

I hope to have it part of llvm 12.0.1 which is currently in RC1 state

yroux commented 3 years ago

Fix commited into mainline as: https://reviews.llvm.org/rG6c78dbd4ca1f

luqmana commented 3 years ago

Thanks @yroux! Will you be driving getting it into 12.0.1? I believe the deadline for getting fixes in is Friday.

EDIT: Ah, I see @nikic marked it as a release-12.0.1 blocker on the bug. Thanks!