llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.55k stars 11.8k forks source link

clang corrupts rsp when used as output constraint (Linux kernel >= 4.6 inline asm) #33260

Open m-gupta opened 7 years ago

m-gupta commented 7 years ago
Bugzilla Link 33913
Version trunk
OS Linux
CC @KernelAddress,@efriedma-quic,@ramosian-glider,@rengolin,@rnk,@ZviRackover

Extended Description

Reported by mka@chromium.org. chromium bug: https://bugs.chromium.org/p/chromium/issues/detail?id=737659

The following upstream kernel commit intends to forces a stack frame before inline assembly code if it doesn't already exist:

commit f05058c4d652b619adfda6c78d8f5b341169c264 Author: Chris J Arges chris.j.arges@canonical.com Date: Thu Jan 21 16:49:25 2016 -0600

x86/uaccess: Add stack frame output operand in get_user() inline asm

Numerous 'call without frame pointer save/setup' warnings are introduced
by stacktool because of functions using the get_user() macro. Bad stack
traces could occur due to lack of or misplacement of stack frame setup
code.

This patch forces a stack frame to be created before the inline asm code
if CONFIG_FRAME_POINTER is enabled by listing the stack pointer as an
output operand for the get_user() inline assembly statement.

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index a4a30e4b2d34..9bbb3b2d0372 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -179,10 +179,11 @@ typeof(builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL)) ({ \ int ret_gu; \ register inttype(*(ptr)) val_gu asm("%"_ASM_DX); \

This inline asm causes double fault when compiled with clang.

Analysis by Josh Poimboeuf:

Here's the reason for the double fault. First it puts zero on the stack at offset -0x58:

ffffffff81367616: 31 c0 xor %eax,%eax ffffffff81367618: 48 89 45 c8 mov %rax,-0x38(%rbp) ffffffff8136761c: 45 31 ff xor %r15d,%r15d ffffffff8136761f: 48 89 45 a8 mov %rax,-0x58(%rbp)

Then, later, it copies that zeroed word from the stack to RSP:

ffffffff81367874: 48 8b 65 a8 mov -0x58(%rbp),%rsp

Then it double faults because the call instruction tries to write RIP on the stack, but RSP is zero:

ffffffff81367878: e8 73 26 f1 ff callq ffffffff81279ef0 <__get_user_4>

Then clang tries to put RSP's value on the stack, at the same stack slot where the original zero was stored (though it never reaches this point):

ffffffff8136787d: 49 89 d4 mov %rdx,%r12 ffffffff81367880: 48 89 65 a8 mov %rsp,-0x58(%rbp)

The panic is consistent with the above. RIP points to the call instruction, RSP is zero:

[ 3.798722] PANIC: double fault, error_code: 0x0 [ 3.807387] CPU: 1 PID: 605 Comm: frecon Not tainted 4.12.0-00023-g711d82c128ff #​107 [ 3.816040] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016 [ 3.824792] task: ffff880075b92f00 task.stack: ffffc90000d6c000 [ 3.829599] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,data=ordered [ 3.839092] RIP: 0010:drm_mode_setcrtc+0x328/0x51f [ 3.844443] RSP: 0018:0000000000000000 EFLAGS: 00010206 [ 3.850280] RAX: 0000559e707c4d60 RBX: 0000000000000000 RCX: 0000000000000008 [ 3.858253] RDX: 0000000000000001 RSI: ffffc90000d6fcc8 RDI: ffffffff81367805 [ 3.866225] RBP: ffffc90000d6fd90 R08: 00000000014000c0 R09: 0000000000000308 [ 3.874199] R10: 0000000000000300 R11: 0000000000000556 R12: 0000000000000000 [ 3.882163] R13: ffff880077a25000 R14: ffffc90000d6fdd0 R15: 0000000000000000 [ 3.890136] FS: 00007fb2dbd62740(0000) GS:ffff88007ad00000(0000) knlGS:0000000000000000 [ 3.899177] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3.905595] CR2: fffffffffffffff8 CR3: 000000007596b000 CR4: 00000000001006e0 [ 3.913568] Call Trace: [ 3.916296] Code: b8 48 8d b5 38 ff ff ff 41 8b 56 08 39 d3 0f 83 85 00 00 00 4c 63 fb 4a 83 24 f8 00 4a 8d 04 bd 00 00 00 00 49 03 06 48 8b 65 a8 73 26 f1 ff 49 89 d4 48 89 65 a8 85 c0 0f 85 a0 00 00 00 ba [ 3.937440] Kernel panic - not syncing: Machine halted. [ 3.943268] CPU: 1 PID: 605 Comm: frecon Not tainted 4.12.0-00023-g711d82c128ff #​107 [ 3.951921] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016 [ 3.960671] Call Trace: [ 3.963398] <#DF> [ 3.965637] __dump_stack+0x19/0x1b [ 3.969531] dump_stack+0x42/0x60

clang is obviously getting confused by the RSP output constraint. I think it tries to take the constraint literally, since it takes RSP as an output from the inline asm and stores it on the stack. However, that behavior doesn't really make sense for a "register" variable. It also doesn't explain why it's zeroing the register out first.

Link with the discussion: https://patchwork.kernel.org/patch/9837437/

More info: there are two separate issues here.

1) The first issue is whether it's supported behavior to specify RSP as an output constraint in order to force GCC to create a stack frame. As far as I know, this is a quirk of GCC, and not really considered defined behavior.

 However, the idea was suggested by some GCC developers:

   https://gcc.gnu.org/ml/gcc/2015-07/msg00079.html

 So at least it seems to be endorsed by GCC to some degree.  If you
 need details on why it works, that thread has the details.

2) The second issue is whether clang should corrupt RSP. I don't see a reason for clang to do that. IMO, when using a local register variable as an input or output to inline asm, the compiler should leave the contents of the register alone.

 FWIW, my reading of the GCC manual seems to support that:

   https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
ramosian-glider commented 7 years ago

Do we have a short repro for this problem that doesn't require building the whole kernel?

m-gupta commented 7 years ago

Josh uploaded a new patch for this (https://lkml.org/lkml/2017/8/31/513). But there are some questions raised in particular by Linus Torvalds (https://lkml.org/lkml/2017/8/31/627):

On Thu, Aug 31, 2017 at 09:11:54AM -0700, Linus Torvalds wrote:

On the whole, I'm not entirely sure this is the right approach. I think we should

(a) approach clang about their obvious bug (a compiler that clobbers %rsp because we mark it as in/out is clearly buggy)

(b) ask gcc people if there's some other alternative that would work with clang as-is rather than the "mark %rsp register as clobbered"

I couldn't actually find the %rsp trick in any docs, I assume it came from discussions with gcc developers directly. Maybe there is something else we could do that doesn't upset clang?

Perhaps we can mark the frame pointer as an input, for example? Inputs also have the advantage that appending to the input list doesn't change the argument numbering, so we don't need to worry about numbered arguments (not that I mind the naming of arguments, but I kind of hate having to do it as part of this series).

Hmm?

          Linus
rengolin commented 7 years ago

After doing some testing, I don't think this approach is going to work after all. In addition to forcing the stack frame, it also causes GCC to add an unnecessary extra instruction to the epilogue of each affected function:

Right, that's not good either. :(

Josh is currently working on a more intrusive kernel patch that's likely to solve the problem: https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/ ?h=ASM_CALL

Looks hairy, but mostly mechanical.

ramosian-glider commented 7 years ago

My reading from that thread is that both clang and gcc treat the __sp variable different and each has its own benefits/problems. Since this is undefined and largely undocumented behaviour, I find it hard to believe either side will be convinced to change. Agreed.

However, there is one hint in that thread that may bring the final solution. Just add SP directly to the clobber list. It should work on both compilers and have the intended effect without additional movs.

Quoting https://lkml.org/lkml/2017/7/19/1144:

"""

IIRC, clobbering SP does at least force the stack frame on GCC, though I need to double check that. I can try to work up an official patch in the next week or so (need to do some testing first).

Sounds great.

Thanks again for looking into this and coming up with a solution!

After doing some testing, I don't think this approach is going to work after all. In addition to forcing the stack frame, it also causes GCC to add an unnecessary extra instruction to the epilogue of each affected function:

lea -0x10(%rbp),%rsp """ , so a patch that clobbers SP is unlikely to be accepted upstream (although it makes Clang build work :))

Josh is currently working on a more intrusive kernel patch that's likely to solve the problem: https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=ASM_CALL

rengolin commented 7 years ago

My reading from that thread is that both clang and gcc treat the __sp variable different and each has its own benefits/problems. Since this is undefined and largely undocumented behaviour, I find it hard to believe either side will be convinced to change.

However, there is one hint in that thread that may bring the final solution. Just add SP directly to the clobber list. It should work on both compilers and have the intended effect without additional movs.

ramosian-glider commented 7 years ago

Actually, the linked thread (https://lkml.org/lkml/2017/7/12/555) contains a deeper analysis by Josh Poimboeuf who also notes that simply making __sp a global variable leads to a kernel .text size regression under GCC.

ramosian-glider commented 7 years ago

(Sorry, accidentally sent a truncated message.)

According to what Renato wrote here: https://lists.linuxfoundation.org/pipermail/llvmlinux/2014-May/000946.html, GCC doesn't seem to always handle local register variables correctly either (I've just checked this is also true for x86_64), e.g. it may drop a store to such a variable.

The easiest way to fix the crashes is to move __sp to the global scope:

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index a969ae6..6adc0a7 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -174,11 +174,13 @@ typeof(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))

ramosian-glider commented 7 years ago

According to what

efriedma-quic commented 7 years ago

Consider the following:

void a() { register int a asm("sp") = 0; asm volatile("nop":"+r"(a)); }

In this case, both gcc and clang zero out "sp".

If you don't initialize the variable, you're basically asking the compiler to put uninitialized data into rsp. If you're lucky, the compiler realizes that putting uninitialized data into rsp is a no-op, and therefore does nothing... but if you're unlucky, the compiler shoves some other unrelated value into rsp, and it explodes (which is what is happening here).

I think the right approach here is to propose some well-defined mechanism for getting the result you want... and then maybe add a hack to clang to map this particular construct to the same mechanism.