Open m-gupta opened 7 years ago
Do we have a short repro for this problem that doesn't require building the whole kernel?
Josh uploaded a new patch for this (https://lkml.org/lkml/2017/8/31/513), but it drew some questions, in particular from Linus Torvalds (https://lkml.org/lkml/2017/8/31/627):
On Thu, Aug 31, 2017 at 09:11:54AM -0700, Linus Torvalds wrote:
On the whole, I'm not entirely sure this is the right approach. I think we should
(a) approach clang about their obvious bug (a compiler that clobbers %rsp because we mark it as in/out is clearly buggy)
(b) ask gcc people if there's some other alternative that would work with clang as-is rather than the "mark %rsp register as clobbered"
I couldn't actually find the %rsp trick in any docs, I assume it came from discussions with gcc developers directly. Maybe there is something else we could do that doesn't upset clang?
Perhaps we can mark the frame pointer as an input, for example? Inputs also have the advantage that appending to the input list doesn't change the argument numbering, so we don't need to worry about numbered arguments (not that I mind the naming of arguments, but I kind of hate having to do it as part of this series).
Hmm?
Linus
After doing some testing, I don't think this approach is going to work after all. In addition to forcing the stack frame, it also causes GCC to add an unnecessary extra instruction to the epilogue of each affected function:
Right, that's not good either. :(
Josh is currently working on a more intrusive kernel patch that's likely to solve the problem: https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=ASM_CALL
Looks hairy, but mostly mechanical.
My reading from that thread is that clang and gcc treat the __sp variable differently, and each approach has its own benefits and problems. Since this is undefined and largely undocumented behaviour, I find it hard to believe either side will be convinced to change.
Agreed.
However, there is one hint in that thread that may bring the final solution. Just add SP directly to the clobber list. It should work on both compilers and have the intended effect without additional movs.
Quoting https://lkml.org/lkml/2017/7/19/1144:
"""
IIRC, clobbering SP does at least force the stack frame on GCC, though I need to double check that. I can try to work up an official patch in the next week or so (need to do some testing first).
Sounds great.
Thanks again for looking into this and coming up with a solution!
After doing some testing, I don't think this approach is going to work after all. In addition to forcing the stack frame, it also causes GCC to add an unnecessary extra instruction to the epilogue of each affected function:
lea -0x10(%rbp),%rsp
"""
So a patch that clobbers SP is unlikely to be accepted upstream (although it makes the Clang build work :))
Josh is currently working on a more intrusive kernel patch that's likely to solve the problem: https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=ASM_CALL
My reading from that thread is that clang and gcc treat the __sp variable differently, and each approach has its own benefits and problems. Since this is undefined and largely undocumented behaviour, I find it hard to believe either side will be convinced to change.
However, there is one hint in that thread that may bring the final solution. Just add SP directly to the clobber list. It should work on both compilers and have the intended effect without additional movs.
Actually, the linked thread (https://lkml.org/lkml/2017/7/12/555) contains a deeper analysis by Josh Poimboeuf, who also notes that simply making __sp a global variable leads to a kernel .text size regression under GCC.
(Sorry, accidentally sent a truncated message.)
According to what Renato wrote here: https://lists.linuxfoundation.org/pipermail/llvmlinux/2014-May/000946.html, GCC doesn't seem to always handle local register variables correctly either (I've just checked this is also true for x86_64), e.g. it may drop a store to such a variable.
The easiest way to fix the crashes is to move __sp to the global scope:
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a969ae6..6adc0a7 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -174,11 +174,13 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
 ({ \
 int __ret_gu; \
 register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
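For illustration, the "move __sp to the global scope" idea can be sketched outside the kernel roughly as follows. This is a minimal sketch, assuming an x86_64 target and GNU C extensions; the names sp_global and add_one_after_asm are hypothetical and are not the identifiers used in the patch. On other targets the sketch falls back to a plain compiler barrier.

```c
/* Sketch only: names are hypothetical, not taken from the kernel patch. */
#if defined(__x86_64__)
/* File-scope register variable bound to the stack pointer (GNU C extension). */
register unsigned long sp_global __asm__("rsp");
#endif

int add_one_after_asm(int x)
{
#if defined(__x86_64__)
    /*
     * The stack pointer is listed as an in/out operand. The asm body is
     * empty, so the register value itself is left unchanged.
     */
    __asm__ volatile("" : "+r"(sp_global));
#else
    /* Assumption: on other targets, use a plain compiler barrier instead. */
    __asm__ volatile("" : : : "memory");
#endif
    return x + 1;
}
```

Because the variable lives at file scope, there is no uninitialized local for the compiler to copy into the register before the asm statement.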
According to what
Consider the following:
void a() { register int a asm("sp") = 0; asm volatile("nop":"+r"(a)); }
In this case, both gcc and clang zero out "sp".
If you don't initialize the variable, you're basically asking the compiler to put uninitialized data into rsp. If you're lucky, the compiler realizes that putting uninitialized data into rsp is a no-op, and therefore does nothing... but if you're unlucky, the compiler shoves some other unrelated value into rsp, and it explodes (which is what is happening here).
I think the right approach here is to propose some well-defined mechanism for getting the result you want... and then maybe add a hack to clang to map this particular construct to the same mechanism.
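As one possible direction, the frame-pointer-as-input idea quoted earlier in this thread could look roughly like the sketch below. It assumes GCC or Clang (for `__builtin_frame_address` and extended asm); the function name frame_address_as_input is hypothetical, and nothing here claims to reproduce the frame-forcing effect the kernel wants before a call instruction.

```c
#include <stdint.h>

/* Sketch only: hypothetical helper, not kernel code. */
uintptr_t frame_address_as_input(void)
{
    /* GCC/Clang builtin: address of the current function's frame. */
    uintptr_t fp = (uintptr_t)__builtin_frame_address(0);

    /*
     * Input-only operand: the compiler must have fp available in a
     * register here, but the asm statement cannot write back to it,
     * so no uninitialized value can end up in a fixed register.
     */
    __asm__ volatile("" : : "r"(fp) : "memory");

    return fp;
}
```

Since the operand is an input, adding it to an existing asm statement does not renumber the output operands, which is the advantage mentioned in the quoted mail.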
Extended Description
Reported by mka@chromium.org. chromium bug: https://bugs.chromium.org/p/chromium/issues/detail?id=737659
The following upstream kernel commit intends to force a stack frame before inline assembly code if it doesn't already exist:
commit f05058c4d652b619adfda6c78d8f5b341169c264
Author: Chris J Arges <chris.j.arges@canonical.com>
Date: Thu Jan 21 16:49:25 2016 -0600

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a4a30e4b2d34..9bbb3b2d0372 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -179,10 +179,11 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
 ({ \
 int __ret_gu; \
 register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
This inline asm causes a double fault when compiled with clang.
Analysis by Josh Poimboeuf:
Here's the reason for the double fault. First it puts zero on the stack at offset -0x58:
Then, later, it copies that zeroed word from the stack to RSP:
Then it double faults because the call instruction tries to write RIP on the stack, but RSP is zero:
Then clang tries to put RSP's value on the stack, at the same stack slot where the original zero was stored (though it never reaches this point):
The panic is consistent with the above. RIP points to the call instruction, RSP is zero:
clang is obviously getting confused by the RSP output constraint. I think it tries to take the constraint literally, since it takes RSP as an output from the inline asm and stores it on the stack. However, that behavior doesn't really make sense for a "register" variable. It also doesn't explain why it's zeroing the register out first.
Link to the discussion: https://patchwork.kernel.org/patch/9837437/
More info: there are two separate issues here.
1) The first issue is whether it's supported behavior to specify RSP as an output constraint in order to force GCC to create a stack frame. As far as I know, this is a quirk of GCC, and not really considered defined behavior.
2) The second issue is whether clang should corrupt RSP. I don't see a reason for clang to do that. IMO, when using a local register variable as an input or output to inline asm, the compiler should leave the contents of the register alone.
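For contrast with the RSP case, here is a small sketch of a local register variable used purely as an asm operand and initialized before use, similar in spirit to the `__val_gu asm("%"_ASM_DX)` declaration in the diff. The register names chosen per target and the function name roundtrip_through_register are assumptions made for this example; on targets not listed, it falls back to an ordinary variable.

```c
/* Sketch only: register choice per target is an assumption for this example. */
#if defined(__x86_64__)
#define SCRATCH_REG "rdx"
#elif defined(__aarch64__)
#define SCRATCH_REG "x9"
#endif

long roundtrip_through_register(long value)
{
#ifdef SCRATCH_REG
    /* Local register variable, initialized before the asm uses it. */
    register long v __asm__(SCRATCH_REG) = value;
#else
    /* Fallback: let the compiler pick any register. */
    long v = value;
#endif
    /* In/out operand; the empty asm body does not modify the register. */
    __asm__ volatile("" : "+r"(v));
    return v;
}
```

Here the value the compiler places in the register is well defined, so there is nothing unexpected for it to load before the asm statement, unlike the uninitialized __sp case.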