Open m13253 opened 3 years ago
Note that LLD already pads functions with int3
specifically to block falling through into functions (or attackers jumping into the padding and falling through):
https://godbolt.org/z/Y46YKfqTc
Gold uses NOPs, which I think should be fixed.
BFD doesn't pad the functions at all.
LLVM also doesn't pad the functions when you remove -ffunction-sections
.
I'd suggest LLVM at least do something similar to LLD and insist on at least one byte of padding between functions and emitting that as int3
if not using -ffunction-sections
. Or maybe -ffunction-sections
should just be more of the default.
Two opinions of mine:
(1) Adding ud2 is not that expensive.
According to Rust: https://github.com/rust-lang/rust/pull/45920
They deliberately add an ud2 for all unreachable places to prevent "falling through" into whatever code next in memory, and found the increase in code size "ranged from 0.0% to 0.1%."
Yes, that's roughly the plan I have in mind.
(2) Instead of adding ud2, can we emulate an infinite loop as Clang 12 does?
From the engineering aspect, I would consider reducing the affected radius of a UB as small as possible, to help debugging.
It's not generally that simple - deciding where/how to "recover" from UB would be pretty difficult.
The Clang-advised way to deal with this would be to compile with -fsanitize=undefined
example.cpp:4:29: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.cpp:4:29 in
(not sure if there's a slight bug in the error message there (the "in" being at the end) or just a quirk of godbolt or the like)
Emulating an infinite loop even though the compiler knows it is a UB might help debugging -- much better than confusing the debugger with an unbalanced stack and a flying %rip register.
Yeah - the ud2's more likely/easier to explain which situations it applies to and which ones it doesn't, etc.
Again, since nobody answered yet, why this warning is not turned on by default as GCC does?
Because Clang doesn't have such a warning - in part because we try pretty hard not to have warnings that are powered by the optimizers (because doing so makes the warnings unreliable - they don't appear in some build modes, or might appear/disappear based on spooky-action-at-a-distance when other code changes and tips optimizations in one way or another).
Dynamic tools like the sanitizers provide more reliable & explanatory failures for this sort of situation.
Two opinions of mine:
(1) Adding ud2 is not that expensive.
According to Rust: https://github.com/rust-lang/rust/pull/45920
They deliberately add an ud2 for all unreachable places to prevent "falling through" into whatever code next in memory, and found the increase in code size "ranged from 0.0% to 0.1%."
(2) Instead of adding ud2, can we emulate an infinite loop as Clang 12 does?
From the engineering aspect, I would consider reducing the affected radius of a UB as small as possible, to help debugging.
Emulating an infinite loop even though the compiler knows it is a UB might help debugging -- much better than confusing the debugger with an unbalanced stack and a flying %rip register.
Again, since nobody answered yet, why this warning is not turned on by default as GCC does?
All sorts of UB manifests in lots of security issues, right? Buffer overruns and the like (I guess this is a buffer overrun, of sorts).
I'm just thinking this particular one is handing someone a gadget. A buffer overrun is a user-code error, really, that is typically not something the compiler can detect (or not easily); this here is something the compiler is doing on purpose to the execution path.
The zero-length function is just a special case of this more general problem.
But probably not worth debating the fine points here.
If I add a puts() to f1(), like so:
void f1(void) { puts("in f1\n"); for (int i = 0; i >= 0; i++) { // ub } }
then the codegen becomes very sad:
00000000004004d0 <_Z2f1v>: 4004d0: 50 push %rax 4004d1: bf 94 05 40 00 mov $0x400594,%edi 4004d6: e8 f5 fe ff ff callq 4003d0 puts@plt 4004db: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
00000000004004e0 <_Z2f2v>: 4004e0: 50 push %rax 4004e1: bf 9b 05 40 00 mov $0x40059b,%edi 4004e6: e8 e5 fe ff ff callq 4003d0 puts@plt 4004eb: 58 pop %rax 4004ec: c3 retq
4004ed: 0f 1f 00 nopl (%rax)Note f1() does a push with no pop, then falls into f2(), which means the retq returns to.... whatever was in %rax.
Yeah, falling into another function on UB has been the way this works (except for the targets that opt out with TrapUnreachable) for a while now.
This means UB is a potential safety/security problem, and we really should do something about it.
All sorts of UB manifests in lots of security issues, right? Buffer overruns and the like (I guess this is a buffer overrun, of sorts).
So I'm not sure that's extra/new motivation.
My take on it is with regards to
If I add a puts() to f1(), like so:
void f1(void) { puts("in f1\n"); for (int i = 0; i >= 0; i++) { // ub } }
then the codegen becomes very sad:
00000000004004d0 <_Z2f1v>: 4004d0: 50 push %rax 4004d1: bf 94 05 40 00 mov $0x400594,%edi 4004d6: e8 f5 fe ff ff callq 4003d0 puts@plt 4004db: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
00000000004004e0 <_Z2f2v>:
4004e0: 50 push %rax
4004e1: bf 9b 05 40 00 mov $0x40059b,%edi
4004e6: e8 e5 fe ff ff callq 4003d0 puts@plt
4004eb: 58 pop %rax
4004ec: c3 retq
4004ed: 0f 1f 00 nopl (%rax)
Note f1() does a push with no pop, then falls into f2(), which means the retq returns to.... whatever was in %rax.
This means UB is a potential safety/security problem, and we really should do something about it.
First, when I try this case locally, f1() ends up being a retq instead of optimized away entirely.
Pilot error. Sorry for the noise. If I do it right, f1() is empty.
Ah, I was mis-remembering. It's a TM flag and we arranged to set it for PS4. Looks like each target (that cared) would have to be updated.
Second, IIRC, "trap on unreachable" is an X86 target thing, and not an IR thing. For PS4 that's all we care about, but if it turns into a more generic issue then we'll need some other mechanism.
Oh, thanks for pointing that out - good to know/think about.
Since Apple uses it for MachO I was thinking "oh, what do they do on their other targets" - and it seems they maybe do enable it for ARM and AArch64 (& it's also enabled for WebAssembly). And maybe even the new old M68K backend...
$ grep -ir trap.*unreachable llvm/lib
llvm/lib/CodeGen/SelectionDAG/FastISel.cpp: if (TM.Options.TrapUnreachable)
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp: if (!DAG.getTarget().Options.TrapUnreachable)
llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp: if (DAG.getTarget().Options.TrapUnreachable)
llvm/lib/CodeGen/LLVMTargetMachine.cpp:static cl::opt
Second, IIRC, "trap on unreachable" is an X86 target thing, and not an IR thing. For PS4 that's all we care about, but if it turns into a more generic issue then we'll need some other mechanism.
Oh, thanks for pointing that out - good to know/think about.
My hope was to not go as far as trap-on-unreachable, which can be pretty clearly pessimizing (eg: "void f1() { unreachable; } void f2() { if (x) { f1(); } }" will generate code, test the condition, etc, when the whole condition/block should be optimized away instead)
Hmm, nope, that was wrong - seems trap-on-unreachable doesn't pessimize the middle end (unreachable blocks can still be removed by SimplifyCFG, etc) - but is only tested in codegen/targets.
Perhaps the MachO behavior (trap on unreachable, but don't trap after a noreturn call) might be worth generalizing. I'll see about measuring the impact of that to see what it'd be like to use it by default. (note that the MachO behavior still leaves the possibility of falling off the end if you violate the noreturn contract and return from such a function)
Two things:
First, when I try this case locally, f1() ends up being a retq instead of optimized away entirely. Using clang trunk from last night on linux, and the same options (-O1) as the godbolt link. That would be recognizing the loop as empty, no uses of the iteration var, so the entire thing can be optimized away. No UB detected, apparently. Not sure what the difference is.
Second, IIRC, "trap on unreachable" is an X86 target thing, and not an IR thing. For PS4 that's all we care about, but if it turns into a more generic issue then we'll need some other mechanism.
-mllvm -trap-unreachable will add a trap.
Some targets (MachO, Webassembly, PS4) will do this by default.
Hm, that doesn't pessimize the IR at all: https://godbolt.org/z/q49rq6 Only the assembly: https://godbolt.org/z/sceK9T Why isn't this the default then?
As you say, it pessimizes the assembly - how much pessimizing is worth the extra safety is unclear. I think there's some room for a bit more safety & I'm working on that - but my guess is we might still leave some cases where you could fall through.
According to Twitter user @ScottWolchok, this behavior is not conforming to the standard.
(§5.10/1): "Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address"
However, both f1 and f2 have the same address, yet they are different functions.
Yes, that particular aspect of this is one of the reasons I was interested in the issue when I came across it due to a debug info problem: https://lists.llvm.org/pipermail/llvm-dev/2020-July/thread.html#143722
Though technically we can fix the conformance issue while still having this kind of UB in some fairly accessible cases - LLVM has an attribute "addrsig" to denote which functions have significant addresses - so, constructors for instance (which can't have their address taken/compared) could still have the "run into another function" failure mode.
(& I expect we still have this failure mode in non-empty functions too, FWIW)
My hope was to not go as far as trap-on-unreachable, which can be pretty clearly pessimizing (eg: "void f1() { unreachable; } void f2() { if (x) { f1(); } }" will generate code, test the condition, etc, when the whole condition/block should be optimized away instead) to just the narrow case where we could add one extra byte of instructions and stop a function from falling off the end into another function where there's a good chance of that. (exactly what a "good chance" is is debatable - if it's after a call, do we do it because the call might return? what about if the call is never going to return (it might be attributed with noreturn, or it might just not return for this call site/these particular parameter values))
Yes, irrespective of anything else, giving f1 and f2 the same address is a miscompile. I think that's a somewhat different issue than the one originally filed, as that only covers the case where f1 ends up being empty.
According to Twitter user @ScottWolchok, this behavior is not conforming to the standard.
(§5.10/1): "Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address"
However, both f1 and f2 have the same address, yet they are different functions.
Why isn't this the default then?
It will increase code size a bit. If you have stack-protector turned on, there is likely to be a call to the (noreturn) check-fail function at the end of any function with a nontrivial amount of local storage, and any call to a noreturn function would need a trap after it. For PS4, we encountered it because the "return" address of the check-fail call could actually be the entry point of the next function, and the traceback/unwinder code was not subtracting 1 from the return address before symbolizing. It was easier to make the compiler insert the trap than to fix the unwinder.
-mllvm -trap-unreachable will add a trap.
Some targets (MachO, Webassembly, PS4) will do this by default.
Hm, that doesn't pessimize the IR at all: https://godbolt.org/z/q49rq6 Only the assembly: https://godbolt.org/z/sceK9T Why isn't this the default then?
-mllvm -trap-unreachable will add a trap.
Some targets (MachO, Webassembly, PS4) will do this by default.
FWIW, I have plans to at least put a ud2/trap at the end of functions that don't otherwise return or at least call (call might be non-returning, so wouldn't want to add an extra instruction if it won't ever be called - but does mean sometimes it could fallthrough to the following function).
But yeah, UB is UB, and this is within the realm of failures you can expect from LLVM's optimizations.
I would consider ud2 a reasonable solution. It's free of performance cost, and it saves lives.
I understand the compiler can do anything with a UB. But triggering a trap is always better than nuking the earth, isn't it?
By the way, GCC warns this UB by default (without -Wall or -Wextra) as long you don't give it -O0. Is there any technical / performance reason to put this check into -fsanitize=undefined instead of making it default?
Anyhow, a ud2 is good enough.
FWIW, I have plans to at least put a ud2/trap at the end of functions that don't otherwise return or at least call (call might be non-returning, so wouldn't want to add an extra instruction if it won't ever be called - but does mean sometimes it could fallthrough to the following function).
But yeah, UB is UB, and this is within the realm of failures you can expect from LLVM's optimizations.
This is indeed UB, and -fsanitize=undefined will tell you about it: https://godbolt.org/z/79q6s8 example.cpp:4:29: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.cpp:4:29 in
You probably want -fwrapv.
Recently we have received a lot of cases where infinite loop/unreachable code is deleted and fallthrough to the next function. Our consensus is the behavior here very strange and surprising, see comment from @AaronBallman https://github.com/llvm/llvm-project/issues/60637#issuecomment-1424557264. However this may be legal thanks to UB.
There was a proposal to insert a "ud2" here: https://github.com/llvm/llvm-project/issues/48943#issuecomment-981039958. And @dwblaikie mentioned his minds here: https://github.com/llvm/llvm-project/issues/60588#issuecomment-1421817714.
@shafik mentioned there is a paper intended to address this problem https://github.com/llvm/llvm-project/issues/60622#issuecomment-1424500145.
I'm neutral to optimize such cases if indeed UB (anyway you have -mllvm -trap-unreachable
to force the detailed code generation behavior less surprising) so I can accept a WONTFIX in this case. But mixing UB and other cases in the same path is intolerable, so I don't treat #60622 the same. For the latter we'd wait the paper. Note ubsan also has the same problem specific to this point.
Thanks @pbo-linaro! The commit introduced loop deletion was: https://github.com/llvm/llvm-project/commit/6c3129549374c0e81e28fd0a21e96f8087b63a78, mentioned in #60652
@llvm/issue-subscribers-clang-codegen
Just to reiterate a point raised in some of the linked bugs, one problem is that this can result in the UB function having the same address as another function, so that their function pointers compare equal, which is not allowed by the C/C++ standards. Such a program does not have UB as long as the offending function is never called, so the breakage of this bug isn't limited to UB programs.
Code like that might even exist. If you are processing function pointers and you need multiple sentinel values (so that NULL isn't sufficient), you could define a dummy function and use its address as a sentinel. And since the function will never be called, you could put any old garbage code inside it.
Extended Description
Comment:
Clang 13 simply does not generate any code for f1 after the undefined behavior point. So any call onto f1 will eventually ends up fell into f2.
Although the compiler can do anything with an undefined behavior, including simply crashing, infinite loop, playing some music, or nuke the earth without violating the C++ specification. I still hope this undefined behavior won't be that surprising.
This issue is not observed in C frontend, or Clang 12.
Godbolt link for your convenience: https://godbolt.org/z/r3nWrE
Source code:
Output: