Unreachable code paths can fall through into unrelated functions

SpexGuy commented 3 years ago

Undefined behavior is a powerful and important optimization tool which zig embraces. We take this so far as to have the standard library assert failure indicate that failure is unreachable, and a program which triggers failure is invalid. The language builds UB into everyday use, and the knowledge can be valuable for promoting optimization. However, for this to be a reasonable decision, the potential downsides have to be bounded. Many proponents of UB often say things like "the idea that UB can delete your hard drive is ridiculous, UB can only do things within the realm of the original code". Zig needs to ensure that this remains true, and work to decrease the likelyhood that optimization from UB has consequences that outweigh its usefulness.

So let's look at an example where this currently goes awry:

const assert = @import("std").debug.assert;

// ----------------------------------------

export fn adoptDoggo() callconv(.C) void {
    var doggo_type: u32 = getDoggoTypeToAdopt();
    assert(doggo_type == 0);
    reallyAdoptDoggo(doggo_type);
}

fn getDoggoTypeToAdopt() u32 {
    return 1;
}

extern fn reallyAdoptDoggo(doggo_type: u32) callconv(.C) void;

// -----------------------------------------

export fn launchNukes() callconv(.C) void {
    reallyLaunchNukes();
}

extern fn reallyLaunchNukes() callconv(.C) void;

Here's a library written in Zig, for use with C. It exposes two API entry points: adoptDoggo(), which performs some parameter substitution and validation and calls into an implementation, and launchNukes() which is a similar wrapper around an implementation. This code has a bug, which in most languages would be pretty benign. getDoggoTypeToAdopt() returns 1, but adoptDoggo() asserts that this value is zero. Because Zig's assert triggers unreachable, this code path is marked as unreachable and optimized away. The code this generates under release-fast optimization looks like this:

adoptDoggo:
launchNukes:
    jmp     reallyLaunchNukes@PLT

So calling adoptDoggo now tail calls reallyLaunchNukes, which seems like a problem. The code I would have expected this to generate is:

adoptDoggo:
    ret

launchNukes:
    jmp     reallyLaunchNukes@PLT

The fact that unreachable can be directed to any other basic block is an important part of its optimizability. However, I don't think it's useful for that property to extend beyond the current function. In this case, this optimization saves one byte of code and has potentially catastrophic consequences. This optimization can never save more than exactly one byte of code, because all unreachable entry points could share the same ret.

I'm not sure whether the fault here lies in Zig or LLVM. The LLVM optimizer is behaving correctly based on the information emitted by the zig compiler, this is technically a valid transformation under its rules. However you could argue that those rules should maybe be changed because of this. I can't think of any situation where allowing entry points to fall through to other entry points would ever be a useful optimization, but I also don't have a proof that there are no such cases. In any case, I feel that Zig needs a solution to this problem, whether or not it's important to LLVM.

LemonBoy commented 3 years ago

The problem is that Zig's unreachable doesn't translate well to LLVM's unreachable, the latter is defined as:

The ‘unreachable’ instruction has no defined semantics. This instruction is used to inform the optimizer that a particular portion of the code is not reachable. This can be used to indicate that the code after a no-return function cannot be reached, and other facts.

The optimizer is taking the whole function body away but cannot strip the adoptDoggo symbol away as it's exported, the code it generates is perfectly legal but not what we want. Zig's unreachable means "this part of code is not meant to be reachable, if we get here there's something really wrong", that's closer to llvm.trap or a tail-call to any other noreturn function: this serves the original purpose of killing the control flow for the optimizer and ensures that nothing bad happens if that code path is ever executed.

On the other hand it's assert that's ill-defined for high-optimization levels. While other languages ignore the expression altogether (leading to subtle bugs if it has side-effects) when optimizing, other enforce them no matter the optimization level (eg. Rust). We take a third approach where assertions are pretty much useless in release mode, we're telling the backend that the expression is always true and if this condition doesn't hold, as in this example, miscompilations are bound to happen.

marler8997 commented 3 years ago

How about using @panic instead of unreachable?

agathazeren commented 3 years ago

On the other hand it's assert that's ill-defined for high-optimization levels. While other languages ignore the expression altogether (leading to subtle bugs if it has side-effects) when optimizing, other enforce them no matter the optimization level (eg. Rust). We take a third approach where assertions are pretty much useless in release mode, we're telling the backend that the expression is always true and if this condition doesn't hold, as in this example, miscompilations are bound to happen.

My take on this issue is that the problem is the naming of the assert function. As assert is the most common name for this function, people will naturally reach for it when the want to check a condition. However, the behavior that most people want for assertions is to panic in all release modes. (Even if this is not what people expect, it should probably be the default, as it is likely to cause the fewest and least severe issues.) The disconnect between the default name and the default behavior can lead to a number of issues. I would propose that assert always panics, something like guarantee uses unreachable, and something like debugAssert is a no-op in unchecked and panic in safe modes.

Also, I'm pretty sure that preventing unreachables from falling through to another function reduces to the halting problem, like this:

fn hard(arg: usize) void {
    assert(arg == 1);

    var tm = TuringMachine.init(program);
    while(tm.canContinue()) tm.step();
}

If the turing machine completes, then the unreachable will fallthrough, but it will not if the turing machine does not complete.

presentfactory commented 11 months ago

"Zig needs to ensure that this remains true, and work to decrease the likelyhood that optimization from UB has consequences that outweigh its usefulness."

I do not see why this needs to remain true, no one expects this to be the case so this whole case is built just on a flawed assumption. Undefined Behavior can and should be capable of literally anything, that is how it has always been defined in various languages, there are no bounds to the devastation it may cause. Adding a ret like this adds bloat to the executable which the user did not request. Yes it is safer, but Zig is not a safe language, and a proper safe language wouldn't even allow UB like this to begin with.

Returning is also not as safe as you might imagine either. In many cases it may be preferable to simply continuing execution, but in a function which expects data to be written to pointers or something returning early will still cause undefined behavior as things start working off of uninitialized memory. There's really no way to "bound" undefined behavior so it's not really worth trying to.

nektro commented 11 months ago

agreed, if a user does not want to take on such a risk on the correctness of their program they should use ReleaseSafe instead

ziglang / zig

Unreachable code paths can fall through into unrelated functions #8901