llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.98k stars 11.95k forks source link

incorrect transformation around icmp: unclear semantics of "lifetime" intrinsics leads to miscompilation #45725

Open RalfJung opened 4 years ago

RalfJung commented 4 years ago
Bugzilla Link 46380
Version trunk
OS Linux
CC @comex,@efriedma-quic,@hfinkel,@jdoerfert,@dobbelaj-snps,@aqjune,@LebedevRI,@Meinersbur,@nunoplopes,@rotateright

Extended Description

(Bugzilla made me pick a component, so I made a wild guess. I do not have the slightest idea which of these internal LLVM components is responsible here. Would be nice if I could select "unknown"...)

In llvm/llvm-project#33896 #c99, Eli Friedman wrote

icmp is defined to just take the raw pointer bits as an integer. If some transform isn't consistent with this, please file a bug.

I think Juneyoung found a transformation that is indeed not consistent with this, which I adjusted as follows (https://godbolt.org/z/XYQ7Vx):

define i1 @compare(i32* %p, i32* %q) {
  %c = icmp eq i32* %p, %q
  ret i1 %c
}
define void @src() {
  %p = alloca i32
  %q = alloca i32
  call void @llvm.lifetime.start.p0i32(i64 1, i32* %p)
  call void @llvm.lifetime.end.p0i32(i64 1, i32* %p)
  call void @llvm.lifetime.start.p0i32(i64 1, i32* %q)
  %c = call i1 @compare(i32* %p, i32* %q)
  br i1 %c, label %A, label %B
A: ; compare() == true
  call void @f(i1 true)
  %c2 = icmp eq i32* %p, %q
  call void @f(i1 %c2)
  br label %EXIT
B: ; compare() == false
  call void @f(i1 false)
  %c3 = icmp eq i32* %p, %q
  call void @f(i1 %c3)
  br label %EXIT
EXIT:
  call void @llvm.lifetime.end.p0i32(i64 1, i32* %q)
  ret void
}

The function "src" compares "p" and "q" twice, once inside "compare". It calls "f" twice with the two results of the comparison. The first comparison is passed via indirect information flow, i.e., the equivalent of "if p == q { f(true) } else { f(false) }" in Rust. "p" and "q" could be equal or not, so this function has two possible executions: either "f" gets called twice with "true" as argument, or it gets called twice with "false" as argument.

The transformed program (with "opt -instsimplify") is

define i1 @compare(i32* %p, i32* %q) {
  %c = icmp eq i32* %p, %q
  ret i1 %c
}
define void @src() {
  %p = alloca i32, align 4
  %q = alloca i32, align 4
  call void @llvm.lifetime.start.p0i32(i64 1, i32* %p)
  call void @llvm.lifetime.end.p0i32(i64 1, i32* %p)
  call void @llvm.lifetime.start.p0i32(i64 1, i32* %q)
  %c = call i1 @compare(i32* %p, i32* %q)
  br i1 %c, label %A, label %B
A:                                                ; preds = %0
  call void @f(i1 true)
  call void @f(i1 false)
  br label %EXIT
B:                                                ; preds = %0
  call void @f(i1 false)
  call void @f(i1 false)
  br label %EXIT
EXIT:                                             ; preds = %B, %A
  call void @llvm.lifetime.end.p0i32(i64 1, i32* %q)
  ret void
}

Notice how in block A, "f" gets called with two different values, which should be impossible because the original program only calls f with two times the same value.

dwblaikie commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#52570

Meinersbur commented 4 years ago

@​James raddr1 and raddr2 are both (long) integers, which should have fixed values at the point of comparison.

I don't think they are poison values, which would mean they are initially non-poison, but become poison when leaving the scop.

llvmbot commented 4 years ago

For Eli F example:

void z(long*);
int a() { 
    long raddr1, raddr2;
    {
       long r; z(&r); raddr1 = (long)&r;
    }
    {
       long r; z(&r); raddr2 = (long)&r;
    }
    return raddr1 == raddr2;
}

At the "return raddr1 == raddr2;" isn't both raddr1 and raddr2 undefined because r is out of scope.

So, comparison of two "undefined" is probably undefined also. So, shouldn't compiling this instead give a compiler warning, instead of an alias analysis problem?

RalfJung commented 4 years ago

All right, opened llvm/llvm-bugzilla-archive#47090 .

efriedma-quic commented 4 years ago

A Rust user found another case where comparison with icmp is non-deterministic: https://play.rust-lang.org/ ?version=stable&mode=release&edition=2018&gist=8affd2a65b02a35e16dbab4682cb88 86.

What this does is it take pointers to two functions that are marked as "unnamed", casts those pointers to integers, and then compares them twice -- once after passing them through a "black box" noinline function, and once immediately. The compiler seemingly optimized the latter comparison to "false", but later makes the two function's addresses identical.

If we describe the semantics of this program at the LLVM IR level, then I think we have to either accept that "icmp" can produce different results for the same input (it is non-deterministic), or the program has to have UB. But it's not UB to compare "unnamed function" pointers, right? My expectation was that unnamed function can end up having the same or different addresses depending on optimizer decisions etc, but whether it is one or the other has to be consistent throughout a single execution of the program.

You're right, whether two addresses are the same or different has to be chosen consistently. This is also why the merging transform on unnamed_addr is "one-way": you can merge two globals into one global, but you can't split one global into two. (I remember discussing this at some point for constants, but I can't seem to find it.)

This doesn't really come up in C: the only unnamed_addr globals with a user-accessible address are strings, and we fold identical strings in the frontend.

This is only loosely related to the alloca issue; please file a separate bug.

RalfJung commented 4 years ago

A Rust user found another case where comparison with icmp is non-deterministic: https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=8affd2a65b02a35e16dbab4682cb8886.

What this does is it take pointers to two functions that are marked as "unnamed", casts those pointers to integers, and then compares them twice -- once after passing them through a "black box" noinline function, and once immediately. The compiler seemingly optimized the latter comparison to "false", but later makes the two function's addresses identical.

If we describe the semantics of this program at the LLVM IR level, then I think we have to either accept that "icmp" can produce different results for the same input (it is non-deterministic), or the program has to have UB. But it's not UB to compare "unnamed function" pointers, right? My expectation was that unnamed function can end up having the same or different addresses depending on optimizer decisions etc, but whether it is one or the other has to be consistent throughout a single execution of the program.

nunoplopes commented 4 years ago

Hmm, interesting. Alive2 actually treats comparsions differently if there are lifetime intrinsics: https://alive2.llvm.org/ce/z/xWcu-2 .

In theory it doesn't (modulo bugs, of course).

In this case, since the 2nd alloca is only allocated after the first one is deallocated, the comparison may return either true or false non-deterministically (depending on where the 2nd alloca ends up). If you check, Alive2 is ok with optimizing that function to either return true or false.

If there are no lifetime intrinsics, then it's guaranteed that the two allocas must be in disjoint memory, hence the pointer comparison can only be folded to false.

efriedma-quic commented 4 years ago

Hmm, interesting. Alive2 actually treats comparsions differently if there are lifetime intrinsics: https://alive2.llvm.org/ce/z/xWcu-2 .

RalfJung commented 4 years ago

FYI: In Alive2, the semantics of alloca depends on whether it has lifetime intrisic uses at parsing time. :)

That is funny, this is exactly what Miri also does.^^

If alloca has any uses with lifetime intrinsics, it is truly allocated at lifetime.start. Before that, it is assigned a block id, but it does not occupy the space and accessing it is UB.

Interesting, Miri does not even reserve a block ID. Each new lifetime.start generates a fresh block ID. This models that we want pointers to be invalidated when the lifetime of a local ends, even if the same local becomes live again later. (Compilation from that to Alive2's semantics seems fine, it just restricts allocator choice.)

aqjune commented 4 years ago

For more details which might be related with the github link:

aqjune commented 4 years ago

FYI: In Alive2, the semantics of alloca depends on whether it has lifetime intrisic uses at parsing time. :) If alloca has any uses with lifetime intrinsics, it is truly allocated at lifetime.start. Before that, it is assigned a block id, but it does not occupy the space and accessing it is UB. If alloca is not used by any lifetime intrinsics, it is immediately allocated at the definition. This has worked pretty well so far while testing with LLVM unit tests and running -O3 from a few agglomerated C single programs.

RalfJung commented 4 years ago

(There have been posts but no notifications.)

RalfJung commented 4 years ago

(More email errors.)

RalfJung commented 4 years ago

(I got a sendmail error from the web frontend, re-triggering notifications.)

RalfJung commented 4 years ago

To be clear, I only mean allocas which are live at the same time; obviously later function calls can reuse the same stack.

These are not live at the same time though? I do not recall the exact wording, but as far as I know C does not require local variables to be live for the entire function; when they are declared in a sub-scope like in your example, they can be deallocated earlier.

So given that LLVM encodes these source-level scopes into its IR using lifetime.start/end, it seems odd to say that allocas are "freed implicitly when the function returns". It seems more useful, and closer to C, to instead say they are "freed implicitly when their lifetime ends". This also reflects what stack coloring is doing.

I think it's implied by "allocating" memory; two allocations can't overlap.
This roughly matches the semantics of the languages we're trying to model.

In my mental model, an "alloca" is only really allocated once it becomes live (lifetime.start), and it gets deallocated when its live range ends. LLVM does not have "scopes"/"blocks" like C or many other source languages do, so what should those languages do when they want stack variables to be allocated only when their scope begins, and deallocated when their scope ends (which I think is common)? They add lifetime.start/end.

(There is a special case for alloca that don't have lifetime intrinsics associated with those; those are indeed live until the function returns. This does create some nasty corner cases, which has been the subject of discussion in Rust for quite a while, but progress has been hard as it is not very clear what exactly LLVM's requirements are. See e.g. https://github.com/rust-lang/rust/issues/42371.)

efriedma-quic commented 4 years ago

Oh, that is surprising. Why would LLVM explicitly say that about alloca and not about malloc?

malloc'ed memory is allocated until you call free(), according to the C standard. This is a similar thing, except that it's freed implicitly when the function returns. To be clear, I only mean allocas which are live at the same time; obviously later function calls can reuse the same stack.

(For dynamic allocas, there's also stacksave/stackrestore; a stackrestore also frees memory. But those are pretty rare in most code.)

according to the LangRef, two allocas can't have the same address. All I could find in the LangRef is Allocating zero bytes is legal, but the returned pointer may not be unique. But this doesn't say that otherwise, it is unique.

I think it's implied by "allocating" memory; two allocations can't overlap. This roughly matches the semantics of the languages we're trying to model.

I wonder if it would be possible to adjust lifetime.end semantics to say that this may (or may not) free the memory for that alloca, to permit reusing addresses.

Maybe. It would be pretty inconvenient to reason about, since the uses of the memory aren't tied to the actual allocation. And it would be difficult to describe the rules in an intuitive way. But I can't think of any showstoppers.

RalfJung commented 4 years ago

I think the issue you're talking about is something like the following?

Not sure if "you" is me here, but I agree that is a similar example -- z could observe the two addresses and so we have a contradiction, like in my example.

(The problem is just the contradiction.)

Agreed: whether or not the addresses are equal is non-deterministic. That's why I said my example has two possible executions. It's just, "f(true); f(false);" is not one of them.

raddr1 and raddr2 are integers, not pointers; the rule in question doesn't apply.

The rule in question is a C++ rule, so it applies to Eli's example, but AFAIK LLVM has no such rule and thus it does not apply to my example. Is that not right?

If you're talking about the semantics of LLVM, according to the LangRef, two allocas can't have the same address. The rules for alloca specify that the memory is not freed until the function returns. And the rules for llvm.lifetime.start/end don't say anything about the address, only that accessing the memory is undefined outside the lifetime. Therefore, it's illegal for stack coloring to allocate two variables whose address escapes at the same address.

Oh, that is surprising. Why would LLVM explicitly say that about alloca and not about malloc? And what about popping and then pushing the same stack frame, so that a stack slot ends up at the same physical address -- that would also be two allocas with the same address (just different in provenance, similar to malloc reusing memory), why is that not a problem?

I thought the entire point of these lifetime annotations was to let LLVM re-use the same stack slots for different variables, and this rule seems to make that unnecessarily hard.

according to the LangRef, two allocas can't have the same address. All I could find in the LangRef is Allocating zero bytes is legal, but the returned pointer may not be unique. But this doesn't say that otherwise, it is unique.

I wonder if it would be possible to adjust lifetime.end semantics to say that this may (or may not) free the memory for that alloca, to permit reusing addresses.

efriedma-quic commented 4 years ago

=

raddr1 and raddr2 are pointers, not integers; the rule in question doesn't apply.

Somehow got this backwards; meant to say "integers, not pointers".

efriedma-quic commented 4 years ago

The alloca have disjoint lifetimes. That means LLVM may actually allocate them to the same stack slot.

Correct, I missed that. However, I wonder why/whether it is legal to shorten the lifetime even though the pointers are still used for the compare call.

If you're talking about C/C++ variables, the lifetime is defined by the standard. In my example, the two variables aren't live at the same time, so the addresses may or may not be numerically equal. (The problem is just the contradiction.) If two variables are live at the same time, the addresses must not be numerically equal.

If you're talking about the semantics of LLVM, according to the LangRef, two allocas can't have the same address. The rules for alloca specify that the memory is not freed until the function returns. And the rules for llvm.lifetime.start/end don't say anything about the address, only that accessing the memory is undefined outside the lifetime. Therefore, it's illegal for stack coloring to allocate two variables whose address escapes at the same address.

Obviously stack coloring does in fact allocate variables at the same address. And we can't fix it to match LangRef without causing a significant regression for C++ code.

I case of Eli's example, I think C++20 justifies it in http://eel.is/c++draft/basic.stc#4:

raddr1 and raddr2 are pointers, not integers; the rule in question doesn't apply.

Meinersbur commented 4 years ago

The alloca have disjoint lifetimes. That means LLVM may actually allocate them to the same stack slot.

Correct, I missed that. However, I wonder why/whether it is legal to shorten the lifetime even though the pointers are still used for the compare call.

I case of Eli's example, I think C++20 justifies it in http://eel.is/c++draft/basic.stc#4:

When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.

Note that all compilers optimize the function even without the ptr2int cast: https://godbolt.org/z/MDp3Ut

efriedma-quic commented 4 years ago

I think the issue you're talking about is something like the following?

void z(long*);
int a() { 
    long raddr1, raddr2;
    {
       long r; z(&r); raddr1 = (long)&r;
    }
    {
       long r; z(&r); raddr2 = (long)&r;
    }
    return raddr1 == raddr2;
}

clangs allocate the two versions of "r" at the same address, but folds the compare to "false".

This isn't really an integer icmp vs. pointer icmp thing; it's more of a "our representation of local variable lifetimes is wrong" thing.

Interestingly, both gcc and msvc do the same thing.

RalfJung commented 4 years ago

The alloca have disjoint lifetimes. That means LLVM may actually allocate them to the same stack slot.

Meinersbur commented 4 years ago

Since %p and %q are different allocas, they will have two different values, including different integer representations (both alloca are 4 bytes).

InstCombine is making the deduction

%p = alloca i32, align 4 %q = alloca i32, align 4 %c2 = icmp eq i32* %p, %q

to

%c2 = false

because of this. It does not change the compare function because InstCombine is not interprocedural.

The basic block A is dead code, it will never be executed. Dead code can be arbitrarily wrong.

RalfJung commented 1 year ago

FWIW this is now tracked as a soundness issue on the Rust side, since the miscompilations resulting from this can break language guarantees, even in safe code: https://github.com/rust-lang/rust/issues/107975.

Is there any agreed-upon plan for how to resolve these inconsistencies in LLVM?

DoubleHyphen commented 1 year ago

FWIW this is now tracked as a soundness issue on the Rust side, since the miscompilations resulting from this can break language guarantees, even in safe code: rust-lang/rust#107975.

Is there any agreed-upon plan for how to resolve these inconsistencies in LLVM?

Between nikic throwing in the towel, and the utter silence in this thread for the past two-and-a-half years, I'll hazard a guess that the answer is “no”.

efriedma-quic commented 1 year ago

For local variable allocations, we want to ensure:

The problems with the current "lifetime" intrinsics:

I don't think there's a good way to retrofit reasonable semantics onto llvm.lifetime.start and llvm.lifetime.end; there's a significant mismatch between the way "alloca" is defined, and how we really want to allocate memory on the stack.

I think the answer here is to introduce new intrinsics. There are a couple of ways you could go about it. But probably, it's conceptually simplest to just have intrinsics that have semantics similar to "malloc" and "free", with the restriction that the allocations have to be properly nested. I'm not going to try to formally define what exactly "properly nested" means, but it's comparable to llvm.call.preallocated.setup. Each allocation uniquely happens at one point in the function, and you have to free allocations in the reverse order they were allocated. Lowering from that form to a static stack allocation is then a straightforward walk through the function: you "push" an entry on the stack when you see an alloc, and "pop" it when you see a free. (Haven't thought through how this interacts with exception handling; probably landing pads need some sort of annotation.)

Once we have that, it should be straightforward to implement the correct rule: "icmp eq" of with two allocations with overlapping lifetime folds to false, and "icmp eq" with two allocations with non-overlapping lifetimes doesn't fold.

Most optimizations wouldn't need to be aware the intrinsics are different than just "malloc" and "free"; probably optimizations that currently check for "token" need to be aware.

This representation also has some other minor benefits: alias analysis is more accurate because passes like the inliner don't introduce multiple variables using the same allocation, and we aren't so dependent on "coloring" to get stack usage proportional to what the user intuitively expects.

(I don't expect I'll have time to try to implement this.)

RalfJung commented 1 year ago

Here's another problem with the "lifetime" intrinsics: https://github.com/llvm/llvm-project/issues/51838.

andres-erbsen commented 1 year ago

I believe this issue is responsible for the following end-to-end miscompilation for C11: https://godbolt.org/z/vfTE1Kv4c

#include <stdint.h>

#if defined(__GNUC__)
#define OPTIMIZATION_BARRIER(x) __asm__("" : "+r"(x))
#elif defined(_M_IX86)
#define OPTIMIZATION_BARRIER(x) __asm add x, 0
#else
#define OPTIMIZATION_BARRIER(x)
#endif

int main() {
  uintptr_t x, y;
  { int a[2]; x = (uintptr_t)a; }
  { int b[2]; y = (uintptr_t)b; }
  uintptr_t equal = x == y; // 0
  OPTIMIZATION_BARRIER(equal);
  uintptr_t diff = x-y; // 0
  return equal == diff; // returns 1 with -O
}
Longer version without inline assembly, which exhibits the issue even if all non-portable constructs are removed: https://godbolt.org/z/a4Yz6Tdn7 ```c #include struct pair { uintptr_t x; uintptr_t y; }; static inline struct pair equal_integers_from_different_allocations(void) { struct pair p = {0, 0}; { int a[2] = {0, 0}; p.x = (uintptr_t)a; } { int b[2] = {0, 0}; p.y = (uintptr_t)b; } return p; } int nonequality(void) { struct pair p = equal_integers_from_different_allocations(); return p.x != p.y; // 1 } int difference(void) { struct pair p = equal_integers_from_different_allocations(); return p.x - p.y; // 0 } #if defined(__clang__) __attribute__((optnone)) #elif defined(__GNUC__) __attribute__((noipa)) #elif defined(_MSC_VER) __declspec(noinline) #endif int launch_missiles() { return 1; } static inline int equal(uintptr_t x, uintptr_t y) { return x-y == y-x && x-y <= -(uintptr_t)1/2; // 1 iff x == y, but weird } int main(void) { struct pair p = equal_integers_from_different_allocations(); if (p.x != p.y && equal(p.x, p.y)) { return launch_missiles(); } // "For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 [...]" // "Two values (other than NaNs) with the same object representation compare equal, but values that compare equal may have different object representations." // https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf } ```

I am moderately interested in getting this issue fixed. I want to be able to honestly claim that my tooling for formal proofs about C programs is sound with respect to how real-world compilers execute these programs, and allocation patterns breaking integer arithmetic is beyond what I can handle. (Note that the pointer-provenance discuss does not change the desired behavior here, both PVI and PNVI-* have numeric results of integer operations be independent of provenance.)

andres-erbsen commented 1 year ago

The proposal in https://github.com/llvm/llvm-project/issues/45725#issuecomment-1537597333 sounds like a good idea, but I am not sure whether it is the right fix for specifically this issue. I believe icmp currently folds on malloc-returned pointers even with nonoverlapping lifetimes -- I presume with the justification that comparing two pointers after free would be illegal. I also believe icmp(ptrtoint, ptrtoint) gets folded as icmp of the underlying pointers, indiscriminately. This means there would likely be a case where the last rule creates a compare-after-free on malloced addresses (EDIT: I hear that the last transformation fine in llvm and malloc is not affected).

See also https://github.com/rust-lang/rust/issues/107975#issuecomment-1441506795. Issue https://github.com/llvm/llvm-project/issues/54002 might be related as well.

As an alternative proposal, it would be sound for icmp(ptrtoint, ptrtoint) to never fold. Is there an established way to evaluate whether this would cause important optimizations to be missed? EDIT: this would help C, but not Rust.

RalfJung commented 1 year ago

I presume with the justification that comparing two pointers after free would be illegal

That is illegal in C, but AFAIK it is not documented as illegal in LLVM. It is legal in Rust, so if LLVM performs such folds without documenting its assumption that would be a major problem.

The code you link has a comment saying "Different non-empty allocations that exist at the same time have different addresses" (emphasis mine). It would assume this means that the transformation only applies when the lifetimes of the allocations overlap?

RalfJung commented 1 year ago

I also believe icmp(ptrtoint, ptrtoint) gets folded as icmp of the underlying pointers, indiscriminately.

Yes, but this is almost fine -- icmp on pointers ignores provenance. At least, that is my understanding of the intended LLVM semantics.

(I am saying "almost" because removing ptrtoint can be wrong even if the result is not used -- ptrtoint has side-effects.)

andres-erbsen commented 1 year ago

Different non-empty allocations that exist at the same time have different addresses

https://github.com/llvm/llvm-project/issues/45725#issuecomment-1643844201

That's what the comment says, but I don't see where that condition is actually checked; and the comment https://github.com/llvm/llvm-project/blob/0db5d8e12384d780d2ded6b972233dc8b1632560/llvm/lib/Analysis/InstructionSimplify.cpp#L2672-L2673 makes me think that maybe it isn't for alloca's.

Thank you for the clarifications about llvm and rust semantics.

RalfJung commented 1 year ago

Yeah for alloca LLVM is internally inconsistent, as discussed above:

This can lead to arbitrary nonsense even in safe Rust code (since as you say, it's just a bunch of integer arithmetic), so I am not surprised that C code verified/generated by your tool is also affected.

Your post mentioned "malloc" and it would be news to me that those are also affected, but who knows. (Now it has been edited so I guess that was a mistake.)

andres-erbsen commented 1 year ago

Thank you for the summary, it makes sense now. I agree that the reasoning based on which I suspected malloc having the same issue was wrong (I made the edit even clearer now). I also confirmed that a direct analog to my example does not break with malloc.

Based on last comment I tried compiling with -Xclang -disable-lifetime-markers and it seems to work around the issue; putting an optimization barrier around the ptrtoint also works.

Do you think the status of this issue is clear enough to say something like "llvm developers accept that this behavior is a compiler bug", or might we be in for another rabbithole to the tune of strict aliasing and pointer provenance?

RalfJung commented 1 year ago

Nobody suggested it's not a bug, and this comment is fairly clear:

whether two addresses are the same or different has to be chosen consistently

I'll take the liberty to remove the question mark from the title.

It's just a hard bug to fix, from what I understand. Also see this comment.