llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.59k stars 11.81k forks source link

Signed integer overflow causes program to skip the epilogue and fall into another function #48943

Open m13253 opened 3 years ago

m13253 commented 3 years ago
Bugzilla Link 49599
Version trunk
OS All
CC @chandlerc,@DMG862,@dwblaikie,@DougGregor,@emaste,@LebedevRI,@pogo59,@zygoloid,@oToToT

Extended Description

Comment:

Clang 13 simply does not generate any code for f1 after the undefined behavior point. So any call onto f1 will eventually ends up fell into f2.

Although the compiler can do anything with an undefined behavior, including simply crashing, infinite loop, playing some music, or nuke the earth without violating the C++ specification. I still hope this undefined behavior won't be that surprising.

This issue is not observed in C frontend, or Clang 12.

Godbolt link for your convenience: https://godbolt.org/z/r3nWrE

Source code:

#include <stdio.h>

void f1(void) {
    for(int i = 0; i >= 0; i++) {
        // Undefined behavior
    }
}

void f2(void) {
    puts("Formatting /dev/sda1...");
    // system("mkfs -t btrfs -f /dev/sda1");
}

// Prevents inlining
void (*volatile p1)(void) = f1;
void (*volatile p2)(void) = f2;

int main(void) {
    puts(__VERSION__);
    p1();
    return 0;
}

Output:

Clang 13.0.0 (https://github.com/llvm/llvm-project.git fcdf7f6224610a51dc2ff47f2f1e3377329b64a7)
Formatting /dev/sda1...
chandlerc commented 3 years ago

Note that LLD already pads functions with int3 specifically to block falling through into functions (or attackers jumping into the padding and falling through):

https://godbolt.org/z/Y46YKfqTc

Gold uses NOPs, which I think should be fixed.

BFD doesn't pad the functions at all.

LLVM also doesn't pad the functions when you remove -ffunction-sections.

I'd suggest LLVM at least do something similar to LLD and insist on at least one byte of padding between functions and emitting that as int3 if not using -ffunction-sections. Or maybe -ffunction-sections should just be more of the default.

dwblaikie commented 3 years ago

Two opinions of mine:

(1) Adding ud2 is not that expensive.

According to Rust: https://github.com/rust-lang/rust/pull/45920

They deliberately add an ud2 for all unreachable places to prevent "falling through" into whatever code next in memory, and found the increase in code size "ranged from 0.0% to 0.1%."

Yes, that's roughly the plan I have in mind.

(2) Instead of adding ud2, can we emulate an infinite loop as Clang 12 does?

From the engineering aspect, I would consider reducing the affected radius of a UB as small as possible, to help debugging.

It's not generally that simple - deciding where/how to "recover" from UB would be pretty difficult.

The Clang-advised way to deal with this would be to compile with -fsanitize=undefined

https://godbolt.org/z/3aW69c

example.cpp:4:29: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.cpp:4:29 in

(not sure if there's a slight bug in the error message there (the "in" being at the end) or just a quirk of godbolt or the like)

Emulating an infinite loop even though the compiler knows it is a UB might help debugging -- much better than confusing the debugger with an unbalanced stack and a flying %rip register.

Yeah - the ud2's more likely/easier to explain which situations it applies to and which ones it doesn't, etc.

Again, since nobody answered yet, why this warning is not turned on by default as GCC does?

Because Clang doesn't have such a warning - in part because we try pretty hard not to have warnings that are powered by the optimizers (because doing so makes the warnings unreliable - they don't appear in some build modes, or might appear/disappear based on spooky-action-at-a-distance when other code changes and tips optimizations in one way or another).

Dynamic tools like the sanitizers provide more reliable & explanatory failures for this sort of situation.

m13253 commented 3 years ago

Two opinions of mine:

(1) Adding ud2 is not that expensive.

According to Rust: https://github.com/rust-lang/rust/pull/45920

They deliberately add an ud2 for all unreachable places to prevent "falling through" into whatever code next in memory, and found the increase in code size "ranged from 0.0% to 0.1%."

(2) Instead of adding ud2, can we emulate an infinite loop as Clang 12 does?

From the engineering aspect, I would consider reducing the affected radius of a UB as small as possible, to help debugging.

Emulating an infinite loop even though the compiler knows it is a UB might help debugging -- much better than confusing the debugger with an unbalanced stack and a flying %rip register.

Again, since nobody answered yet, why this warning is not turned on by default as GCC does?

pogo59 commented 3 years ago

All sorts of UB manifests in lots of security issues, right? Buffer overruns and the like (I guess this is a buffer overrun, of sorts).

I'm just thinking this particular one is handing someone a gadget. A buffer overrun is a user-code error, really, that is typically not something the compiler can detect (or not easily); this here is something the compiler is doing on purpose to the execution path.

The zero-length function is just a special case of this more general problem.

But probably not worth debating the fine points here.

dwblaikie commented 3 years ago

If I add a puts() to f1(), like so:

void f1(void) { puts("in f1\n"); for (int i = 0; i >= 0; i++) { // ub } }

then the codegen becomes very sad:

00000000004004d0 <_Z2f1v>: 4004d0: 50 push %rax 4004d1: bf 94 05 40 00 mov $0x400594,%edi 4004d6: e8 f5 fe ff ff callq 4003d0 puts@plt 4004db: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

00000000004004e0 <_Z2f2v>: 4004e0: 50 push %rax 4004e1: bf 9b 05 40 00 mov $0x40059b,%edi 4004e6: e8 e5 fe ff ff callq 4003d0 puts@plt 4004eb: 58 pop %rax 4004ec: c3 retq
4004ed: 0f 1f 00 nopl (%rax)

Note f1() does a push with no pop, then falls into f2(), which means the retq returns to.... whatever was in %rax.

Yeah, falling into another function on UB has been the way this works (except for the targets that opt out with TrapUnreachable) for a while now.

This means UB is a potential safety/security problem, and we really should do something about it.

All sorts of UB manifests in lots of security issues, right? Buffer overruns and the like (I guess this is a buffer overrun, of sorts).

So I'm not sure that's extra/new motivation.

My take on it is with regards to

pogo59 commented 3 years ago

If I add a puts() to f1(), like so:

void f1(void) { puts("in f1\n"); for (int i = 0; i >= 0; i++) { // ub } }

then the codegen becomes very sad:

00000000004004d0 <_Z2f1v>: 4004d0: 50 push %rax 4004d1: bf 94 05 40 00 mov $0x400594,%edi 4004d6: e8 f5 fe ff ff callq 4003d0 puts@plt 4004db: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

00000000004004e0 <_Z2f2v>: 4004e0: 50 push %rax 4004e1: bf 9b 05 40 00 mov $0x40059b,%edi 4004e6: e8 e5 fe ff ff callq 4003d0 puts@plt 4004eb: 58 pop %rax 4004ec: c3 retq
4004ed: 0f 1f 00 nopl (%rax)

Note f1() does a push with no pop, then falls into f2(), which means the retq returns to.... whatever was in %rax.

This means UB is a potential safety/security problem, and we really should do something about it.

pogo59 commented 3 years ago

First, when I try this case locally, f1() ends up being a retq instead of optimized away entirely.

Pilot error. Sorry for the noise. If I do it right, f1() is empty.

pogo59 commented 3 years ago

Ah, I was mis-remembering. It's a TM flag and we arranged to set it for PS4. Looks like each target (that cared) would have to be updated.

dwblaikie commented 3 years ago

Second, IIRC, "trap on unreachable" is an X86 target thing, and not an IR thing. For PS4 that's all we care about, but if it turns into a more generic issue then we'll need some other mechanism.

Oh, thanks for pointing that out - good to know/think about.

Since Apple uses it for MachO I was thinking "oh, what do they do on their other targets" - and it seems they maybe do enable it for ARM and AArch64 (& it's also enabled for WebAssembly). And maybe even the new old M68K backend...

$ grep -ir trap.*unreachable llvm/lib llvm/lib/CodeGen/SelectionDAG/FastISel.cpp: if (TM.Options.TrapUnreachable) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp: if (!DAG.getTarget().Options.TrapUnreachable) llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp: if (DAG.getTarget().Options.TrapUnreachable) llvm/lib/CodeGen/LLVMTargetMachine.cpp:static cl::opt EnableTrapUnreachable("trap-unreachable", llvm/lib/CodeGen/LLVMTargetMachine.cpp: cl::desc("Enable generating trap for unreachable")); llvm/lib/CodeGen/LLVMTargetMachine.cpp: if (EnableTrapUnreachable) llvm/lib/CodeGen/LLVMTargetMachine.cpp: this->Options.TrapUnreachable = true; llvm/lib/Target/AArch64/AArch64TargetMachine.cpp: this->Options.TrapUnreachable = true; llvm/lib/Target/AArch64/AArch64TargetMachine.cpp: this->Options.TrapUnreachable = true; llvm/lib/Target/ARM/ARMTargetMachine.cpp: this->Options.TrapUnreachable = true; llvm/lib/Target/X86/X86TargetMachine.cpp: // the calling function, and TrapUnreachable is an easy way to get that. llvm/lib/Target/X86/X86TargetMachine.cpp: // FIXME/Check here - 'trap unreachable' doesn't catch all cases of empty llvm/lib/Target/X86/X86TargetMachine.cpp: this->Options.TrapUnreachable = true; llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td:defm UNREACHABLE : NRI<(outs), (ins), [(trap)], "unreachable", 0x00>; llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td:// terminator, lowering debugtrap to UNREACHABLE can create an invalid llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td:defm DEBUG_UNREACHABLE : NRI<(outs), (ins), [(debugtrap)], "unreachable", 0x00>; llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp: // Trap lowers to wasm unreachable llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp: this->Options.TrapUnreachable = true; llvm/lib/Target/M68k/M68kISelLowering.cpp: if (CLI.DoesNotReturn && !getTargetMachine().Options.TrapUnreachable) { llvm/lib/Transforms/Utils/Local.cpp: // Don't insert a call to llvm.trap right before the unreachable. llvm/lib/Transforms/Utils/Local.cpp: // Don't insert a call to llvm.trap right before the unreachable.

dwblaikie commented 3 years ago

Second, IIRC, "trap on unreachable" is an X86 target thing, and not an IR thing. For PS4 that's all we care about, but if it turns into a more generic issue then we'll need some other mechanism.

Oh, thanks for pointing that out - good to know/think about.

dwblaikie commented 3 years ago

My hope was to not go as far as trap-on-unreachable, which can be pretty clearly pessimizing (eg: "void f1() { unreachable; } void f2() { if (x) { f1(); } }" will generate code, test the condition, etc, when the whole condition/block should be optimized away instead)

Hmm, nope, that was wrong - seems trap-on-unreachable doesn't pessimize the middle end (unreachable blocks can still be removed by SimplifyCFG, etc) - but is only tested in codegen/targets.

Perhaps the MachO behavior (trap on unreachable, but don't trap after a noreturn call) might be worth generalizing. I'll see about measuring the impact of that to see what it'd be like to use it by default. (note that the MachO behavior still leaves the possibility of falling off the end if you violate the noreturn contract and return from such a function)

pogo59 commented 3 years ago

Two things:

First, when I try this case locally, f1() ends up being a retq instead of optimized away entirely. Using clang trunk from last night on linux, and the same options (-O1) as the godbolt link. That would be recognizing the loop as empty, no uses of the iteration var, so the entire thing can be optimized away. No UB detected, apparently. Not sure what the difference is.

Second, IIRC, "trap on unreachable" is an X86 target thing, and not an IR thing. For PS4 that's all we care about, but if it turns into a more generic issue then we'll need some other mechanism.

dwblaikie commented 3 years ago

-mllvm -trap-unreachable will add a trap.

Some targets (MachO, Webassembly, PS4) will do this by default.

Hm, that doesn't pessimize the IR at all: https://godbolt.org/z/q49rq6 Only the assembly: https://godbolt.org/z/sceK9T Why isn't this the default then?

As you say, it pessimizes the assembly - how much pessimizing is worth the extra safety is unclear. I think there's some room for a bit more safety & I'm working on that - but my guess is we might still leave some cases where you could fall through.

dwblaikie commented 3 years ago

According to Twitter user @​ScottWolchok, this behavior is not conforming to the standard.

(§5.10/1): "Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address"

However, both f1 and f2 have the same address, yet they are different functions.

Yes, that particular aspect of this is one of the reasons I was interested in the issue when I came across it due to a debug info problem: https://lists.llvm.org/pipermail/llvm-dev/2020-July/thread.html#143722

Though technically we can fix the conformance issue while still having this kind of UB in some fairly accessible cases - LLVM has an attribute "addrsig" to denote which functions have significant addresses - so, constructors for instance (which can't have their address taken/compared) could still have the "run into another function" failure mode.

(& I expect we still have this failure mode in non-empty functions too, FWIW)

My hope was to not go as far as trap-on-unreachable, which can be pretty clearly pessimizing (eg: "void f1() { unreachable; } void f2() { if (x) { f1(); } }" will generate code, test the condition, etc, when the whole condition/block should be optimized away instead) to just the narrow case where we could add one extra byte of instructions and stop a function from falling off the end into another function where there's a good chance of that. (exactly what a "good chance" is is debatable - if it's after a call, do we do it because the call might return? what about if the call is never going to return (it might be attributed with noreturn, or it might just not return for this call site/these particular parameter values))

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 3 years ago

Yes, irrespective of anything else, giving f1 and f2 the same address is a miscompile. I think that's a somewhat different issue than the one originally filed, as that only covers the case where f1 ends up being empty.

m13253 commented 3 years ago

According to Twitter user @​ScottWolchok, this behavior is not conforming to the standard.

(§5.10/1): "Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address"

However, both f1 and f2 have the same address, yet they are different functions.

pogo59 commented 3 years ago

Why isn't this the default then?

It will increase code size a bit. If you have stack-protector turned on, there is likely to be a call to the (noreturn) check-fail function at the end of any function with a nontrivial amount of local storage, and any call to a noreturn function would need a trap after it. For PS4, we encountered it because the "return" address of the check-fail call could actually be the entry point of the next function, and the traceback/unwinder code was not subtracting 1 from the return address before symbolizing. It was easier to make the compiler insert the trap than to fix the unwinder.

LebedevRI commented 3 years ago

-mllvm -trap-unreachable will add a trap.

Some targets (MachO, Webassembly, PS4) will do this by default.

Hm, that doesn't pessimize the IR at all: https://godbolt.org/z/q49rq6 Only the assembly: https://godbolt.org/z/sceK9T Why isn't this the default then?

4c958a03-4ad7-407c-a85b-320d9a7221b0 commented 3 years ago

-mllvm -trap-unreachable will add a trap.

Some targets (MachO, Webassembly, PS4) will do this by default.

m13253 commented 3 years ago

FWIW, I have plans to at least put a ud2/trap at the end of functions that don't otherwise return or at least call (call might be non-returning, so wouldn't want to add an extra instruction if it won't ever be called - but does mean sometimes it could fallthrough to the following function).

But yeah, UB is UB, and this is within the realm of failures you can expect from LLVM's optimizations.

I would consider ud2 a reasonable solution. It's free of performance cost, and it saves lives.

I understand the compiler can do anything with a UB. But triggering a trap is always better than nuking the earth, isn't it?

By the way, GCC warns this UB by default (without -Wall or -Wextra) as long you don't give it -O0. Is there any technical / performance reason to put this check into -fsanitize=undefined instead of making it default?

Anyhow, a ud2 is good enough.

dwblaikie commented 3 years ago

FWIW, I have plans to at least put a ud2/trap at the end of functions that don't otherwise return or at least call (call might be non-returning, so wouldn't want to add an extra instruction if it won't ever be called - but does mean sometimes it could fallthrough to the following function).

But yeah, UB is UB, and this is within the realm of failures you can expect from LLVM's optimizations.

LebedevRI commented 3 years ago

This is indeed UB, and -fsanitize=undefined will tell you about it: https://godbolt.org/z/79q6s8 example.cpp:4:29: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.cpp:4:29 in

You probably want -fwrapv.

inclyc commented 1 year ago

Recently we have received a lot of cases where infinite loop/unreachable code is deleted and fallthrough to the next function. Our consensus is the behavior here very strange and surprising, see comment from @AaronBallman https://github.com/llvm/llvm-project/issues/60637#issuecomment-1424557264. However this may be legal thanks to UB.

There was a proposal to insert a "ud2" here: https://github.com/llvm/llvm-project/issues/48943#issuecomment-981039958. And @dwblaikie mentioned his minds here: https://github.com/llvm/llvm-project/issues/60588#issuecomment-1421817714.

@shafik mentioned there is a paper intended to address this problem https://github.com/llvm/llvm-project/issues/60622#issuecomment-1424500145.

FrankHB commented 1 year ago

I'm neutral to optimize such cases if indeed UB (anyway you have -mllvm -trap-unreachable to force the detailed code generation behavior less surprising) so I can accept a WONTFIX in this case. But mixing UB and other cases in the same path is intolerable, so I don't treat #60622 the same. For the latter we'd wait the paper. Note ubsan also has the same problem specific to this point.

inclyc commented 1 year ago

Thanks @pbo-linaro! The commit introduced loop deletion was: https://github.com/llvm/llvm-project/commit/6c3129549374c0e81e28fd0a21e96f8087b63a78, mentioned in #60652

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-codegen

neldredge commented 1 year ago

Just to reiterate a point raised in some of the linked bugs, one problem is that this can result in the UB function having the same address as another function, so that their function pointers compare equal, which is not allowed by the C/C++ standards. Such a program does not have UB as long as the offending function is never called, so the breakage of this bug isn't limited to UB programs.

Code like that might even exist. If you are processing function pointers and you need multiple sentinel values (so that NULL isn't sufficient), you could define a dummy function and use its address as a sentinel. And since the function will never be called, you could put any old garbage code inside it.

shafik commented 10 months ago

Also see: https://discourse.llvm.org/t/can-we-keep-must-progress-optimizations-around-infinite-loop-in-c-while-avoiding-some-surprising-behavior/69205/5