Open nunoplopes opened 7 years ago
Currently, in LLVM, pointers are clearly "magic", as in, icmp eq i8* %x, %y does not imply %x and %y are equivalent; there's also an invisible base pointer attached to any value of pointer type. This is reflected in a lot of places. noalias only makes sense with "magic" pointers (since otherwise, it's impossible to prove any relationship between the value and the marking). inbounds only makes sense with "magic" pointers (otherwise, you basically can't use it to prove anything). And alias analysis heavily depends on the assumption that it can ignore offsets in GEPs.
The testcase in comment 49 is related, in that the "llvm.invariant.group.barrier" intrinsic depends on "magic" pointers: LangRef says the returned value has different aliasing semantics from the argument, even though it has the same value.
This isn't to say that the rules for how exactly base pointers work are particularly clear, or even self-consistent, but whatever the exact rules are, they clearly aren't consistent with GVN generating equivalences for pointers based solely on the result of an icmp.
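As a concrete illustration of that last point, here is a minimal sketch (mine, not from the report) of the invariant.group.barrier case: %q always has the same value as %p, yet LangRef gives the barrier's result different aliasing semantics, so value equality alone does not make the two interchangeable.

declare i8* @llvm.invariant.group.barrier(i8*)

define i1 @sketch(i8* %p) {
  %q = call i8* @llvm.invariant.group.barrier(i8* %p)
  %c = icmp eq i8* %p, %q   ; always true at run time...
  ret i1 %c                 ; ...but rewriting uses of %q into uses of %p is not sound
}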
Note that the CSEed load is tagged with the same invariant.group as the available load expression -- even though there is a dominating invariant.barrier.
If that !invariant.group annotation is removed from the IR, GVN produces the correct result.
It kinda doesn't matter. Eli is right in the sense that there is implicit info in pointers right now that can generate wrong things in the face of equivalences. Even if you fix the clear bug you point out, you could still generate a testcase where gvn does the wrong thing :)
Is there any small example test case showing how the magic pointer can break things? Or is it similar to the one in comment #39?
Put it this way: in this case, if the 'invariant' annotation is removed, the optimizer behaves correctly -- this means that either the IR is wrong, or the way invariant groups are handled is wrong. For reference, here is the complete IR test case (build it with -gvn -S):
%struct.A = type { i32 (...)** }
%struct.B = type { %struct.A }

@_ZTV1A = available_externally unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* bitcast (i8** @_ZTI1A to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A3fooEv to i8*)] }, align 8
@_ZTV1B = available_externally unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* bitcast (i8** @_ZTI1B to i8*), i8* bitcast (void (%struct.B*)* @_ZN1B3fooEv to i8*)] }, align 8
@_ZTI1A = external constant i8*
@_ZTI1B = external constant i8*

;@_ZTV1A = external unnamed_addr constant { [3 x i8*] }, align 8
;@_ZTV1B = external unnamed_addr constant { [3 x i8*] }, align 8

define void @_Z5weirdv() local_unnamed_addr #0 {
  %1 = tail call i8* @_Znwm(i64 8) #5
  %2 = bitcast i8* %1 to %struct.A*
  tail call void @_ZN1AC1Ev(%struct.A* nonnull %2) #3
  %3 = bitcast i8* %1 to i8***
  %4 = load i8**, i8*** %3, align 8, !tbaa !5, !invariant.group !8
  %5 = icmp eq i8** %4, getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV1A, i64 0, inrange i32 0, i64 2)
  tail call void @llvm.assume(i1 %5)
  %6 = bitcast i8** %4 to void (%struct.A*)**
  %7 = load void (%struct.A*)*, void (%struct.A*)** %6, align 8, !invariant.load !8
  tail call void %7(%struct.A* nonnull %2) #3
  %8 = tail call i8* @llvm.invariant.group.barrier(i8* nonnull %1)
  %9 = bitcast i8* %8 to %struct.B*
  tail call void @_ZN1BC1Ev(%struct.B* %9) #3
  %10 = bitcast i8* %8 to i8***
  %11 = load i8**, i8*** %10, align 8, !tbaa !5, !invariant.group !8
  %12 = icmp eq i8** %11, getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV1B, i64 0, inrange i32 0, i64 2)
  tail call void @llvm.assume(i1 %12)
  %13 = icmp eq i8* %1, %8
  br i1 %13, label %14, label %19
;
;
; Function Attrs: nobuiltin
declare noalias nonnull i8* @_Znwm(i64) local_unnamed_addr #1
declare void @_ZN1AC1Ev(%struct.A*) unnamed_addr #2
; Function Attrs: nounwind
declare void @llvm.assume(i1) #3
; Function Attrs: argmemonly nounwind readonly
declare i8* @llvm.invariant.group.barrier(i8*) #4
declare void @_ZN1BC1Ev(%struct.B*) unnamed_addr #2
declare void @_ZN1A3fooEv(%struct.A*) unnamed_addr #2
declare void @_ZN1B3fooEv(%struct.B*) unnamed_addr #2
attributes #0 = { nounwind uwtable }
attributes #1 = { nobuiltin }
attributes #2 = { nounwind }
attributes #3 = { nounwind }
attributes #4 = { argmemonly nounwind readonly }
attributes #5 = { builtin nounwind }
!llvm.module.flags = !{!0, !1, !3}
!llvm.ident = !{!4}
!0 = !{i32 1, !"StrictVTablePointers", i32 1} !1 = !{i32 3, !"StrictVTablePointersRequirement", !2} !2 = !{!"StrictVTablePointers", i32 1} !3 = !{i32 1, !"wchar_size", i32 4} !4 = !{!"clang version 6.0.0 (trunk 312928)"} !5 = !{#6, !6, i64 0} !6 = !{!"vtable pointer", !7, i64 0} !7 = !{!"Simple C++ TBAA"} !8 = !{}
Same with CVP, anything using LVI, etc. It's just "harder" to get them to optimize it.
You'll note that GCC is able to propagate equivalences and get roughly all of these right. This is in part because MemorySSA in GCC is there all the time, which makes it easy to express the aliasing constraints in a way that nothing breaks them. But this is just convenience; we could achieve the same goal without memoryssa all the time if we wanted to, by having pointers carry along their magic instead of it being implicit. Then only a thing that understands the magic can propagate the tokens.
I suspect such a change would be larger than making memoryssa exist everywhere though.
(this eliminates the class of issues with using pointer values, but leaves the class of issues with pointer-in-integer form values).
The larger problem with the original testcase is, even if you went as far as to introduce a pointercompare instruction, and use it on uintptr_t and real pointer types, and never get equivalences from pointercompare's, it wouldn't help unless you told people "this will only work with visible uintptr_t's, otherwise, you are boned".
(and of course, literally nothing you do with pointers in piotr's testcase will fix anything related to this particular bug, since it's about generating equivalences from what look like perfectly normal integers)
"pointers are only equivalent if they have the same name and token. "
same value and token, obviously
Currently, in LLVM, pointers are clearly "magic", as in, icmp eq i8* %x, %y does not imply %x and %y are equivalent;
It implies they are value equivalent. It does not imply they are aliasing equivalent.
there's also an invisible base pointer attached to any value of pointer type. This is reflected in a lot of places.
And it is pretty easy to generate testcases that break each of these in interesting and different ways.
I believe Zhendong has proven this at this point.
noalias only makes sense with "magic" pointers (since otherwise, it's impossible to prove any relationship between the value and the marking).
You could just make it explicit, actually. Instead of an invisible/implicit thing, you can pass along a token.
pointers are only equivalent if they have the same name and token.
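A purely hypothetical sketch of that idea -- the two magic.* functions below are invented for illustration and do not exist in LLVM today:

declare i64 @magic.provenance.of(i8*)        ; hypothetical
declare i8* @magic.with.provenance(i8*, i64) ; hypothetical

define i8* @sketch(i8* %p, i8* %q) {
  ; the aliasing "magic" travels as an explicit value; a pass may treat %q as
  ; interchangeable with %p only if %q also carries %p's token
  %t = call i64 @magic.provenance.of(i8* %p)
  %q.tagged = call i8* @magic.with.provenance(i8* %q, i64 %t)
  ret i8* %q.tagged
}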
inbounds only makes sense with "magic" pointers (otherwise, you basically can't use it to prove anything). And alias analysis heavily depends on the assumption that it can ignore offsets in GEPs.
The testcase in comment 49 is related, in that the "llvm.invariant.group.barrier" intrinsic depends on "magic" pointers: LangRef says the returned value has different aliasing semantics from the argument, even though it has the same value.
This isn't to say that the rules for how exactly base pointers work are particularly clear, or even self-consistent, but whatever the exact rules are, they clearly aren't consistent with GVN generating equivalences for pointers based solely on the result of an icmp.
As mentioned, it's not just GVN. It's just easy to generate a testcase GVN screws up.
We really need to start getting out of "magic" mode. You aren't going to be able to hack away all this stuff by, for example, disabling pointer equivalences in GVN.
Because in turn, simplifyinstruction does this on icmp's of pointers " // Simplify comparisons of related pointers using a powerful, recursive // GEP-walk when we have target data available."
while certainly, the calls in there try to be conservative to handle these cases, the point is more: building on top of a house of cards and hoping it doesn't do a certain thing is a bad play. At some point, you need to start building something that isn't a house of cards.
The best way i know how to do that is to make the implicit, explicit.
as I talked with Reid and Richard, it seems that the devirtualization has the same problem right now:
#include <new>  // needed for placement new

struct A { A(); virtual void foo(); };
struct B : A { B(); virtual void foo(); };

void weird() {
  A *a = new A;
  a->foo();
  A *b = new (a) B;
  if (a == b)
    b->foo();
}
Because gvn will replace b with a, it will cause the wrong call to be called.
This is not the same bug at all.
This is about where the lifetime of memory ends, and how that is expressed in the IR. The others are about the semantics of ptr2int and int2ptr.
Here, if your IR lets GVN do this, your IR is buggy. It should either be reloading the vptr before the b->foo() call, or it should be expressing that "A" is dead (lifetime.start/end are optional, but if you used them, they would be required here for correctness. So the semantics would have to be modified, or new intrinsics developed).
My question is: how important is replacement of one variable with another (non-constant) based on a comparison? Right now GVN just picks the variable having the longer lifetime.
It's not about importance. You can't fix this by telling GVN to stop replacing perfectly legal equivalences. You have to make IR where it will conclude they aren't equivalent at this point in the first place.
I think this needs a tremendous amount of emphasis. As our optimizations get more powerful, this will simply show up in more and more ways. Whatever solution is chosen has to express this in the IR, such that it is obvious what is legal and what is not.
GVN, NewGVN, etc, depend on all sorts of underlying infrastructure that may also enable them to discover the same thing. It already does. simplifyinstruction, for example, will use computeknownbits and prove things about icmps of your variables (IE trying to create an icmp and seeing if it simplifies) in ways that will do the same thing as gvn is doing here.
Trying to neuter optimizations at point-of-elimination is, IMHO, a recipe for disaster. You have to make all things able to conclude it's not equivalent in the first place. The only way you can do that is by expressing it in the IR.
as I talked with Reid and Richard, it seems that the devirtualization has the same problem right now:
#include <new>  // needed for placement new

struct A { A(); virtual void foo(); };
struct B : A { B(); virtual void foo(); };

void weird() {
  A *a = new A;
  a->foo();
  A *b = new (a) B;
  if (a == b)
    b->foo();
}

Because gvn will replace b with a, it will cause the wrong call to be called.
My question is: how important is replacement of one variable with another (non-constant) based on a comparison? Right now GVN just picks the variable having the longer lifetime.
If the solution to this is to ignore the problem, then there should not be objections to turning on devirtualization by default (as it probably has the same odds of miscompiling).
Still looks like an AA problem. The a->vptr field should be clobbered by the placement new call, so it should be reloaded before the second foo call.
(we currently define the result in terms of the result of ptrtoint on each operand).
That's precisely the problem though -- given what you said, the expression is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be safely transformed to "q := b".
Actually, I wonder if that's only one way to define this. The LangRef says that the comparisons are done as integers, but that does not necessarily mean that we need to define the action of the comparison directly in terms of ptrtoint. Maybe we'd like to define cmp in another way. There might be advantages to that - it would be closer, at least, to C/C++ comparisons, and we always can generate explicit ptrtoint instructions if we need those semantics.
I would be pretty strongly against making CSE less powerful :) Besides it being very difficult (there are a bunch of things that go looking for these sorts of equalities besides earlycse, gvn, and newgvn), you'd also have to give up a lot, because it's not limited to pointer equalities; it's all non-floating-point equalities. I suspect you'd be giving up a tremendous amount of performance to make a very small amount of very uncommon and complex code work ;) (and that, as mentioned, doesn't work in other compilers either).
I know from numbers that the equality propagation that instsimplify, instcombine, cse, and gvn provide is at least 10% of total perf on apps i looked at.
I'd rather us define semantics and a well lit path here for people to accomplish what they need, and declare victory.
Of the three ways we've discussed fixing this, my preference is for making CSE less powerful. This device has been used in many LLVM soundness counterexamples. It was an existential problem that we were never able to solve when designing de-virtualization during Piotr's (prazek@) internship a few years ago.
If LLVM has these kinds of "based-on" aliasing rules, then we can't use the fact that "p == q" to replace q with p or p with q. That equality has to do with numeric pointer values, which we should try to keep separate from our aliasing model.
I suspect that most of the value of CSE/equality propagation is from discovering that a variable is constant in some region of code (i.e. propagating null inside a null check), which probably doesn't suffer from these kinds of bugs where we replace one pointer with another valid object pointer without considering aliasing.
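For instance, here is a small sketch (mine, not from the thread) of the null-check case: substituting the constant inside the guarded block raises no provenance question, unlike replacing one object pointer with another.

declare void @use(i8*)

define void @sketch(i8* %p) {
entry:
  %c = icmp eq i8* %p, null
  br i1 %c, label %is_null, label %done
is_null:
  call void @use(i8* %p)   ; GVN/CVP may safely rewrite %p to null here
  br label %done
done:
  ret void
}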
I don't think this is true, if I understand you correctly. For instance, if you could access one alloca by offsetting from another (I presume that's what you meant by "all locals can be accessed via the same base %rsp"), mem2reg would not be legal. E.g. mem2reg would not be able to transform
define i32 @f() {
  %a = alloca i32
  %b = alloca i32
  store i32 0, i32* %a
  call void @unknown(i32* %b)
  %v = load i32, i32* %a
  ret i32 %v
}
to
define i32 @f() {
  %b = alloca i32
  call void @unknown(i32* %b)
  ret i32 0
}
(which it does today) since @unknown may have written something other than 0 to %a by offsetting from %b.
I think we should not suppress optimizations, but instead fix the real underlying problems.
The underlying object being accessed is not changed regardless of what base pointer is used -- whether AA can handle it currently is a different story.
This ^ semantic is one correct interpretation of what happened -- since X and Y happen to lie adjacent to one another, using X+1 is a legitimate way to access Y (so the select xform did nothing wrong). This is close to the "flat" memory model I referred to earlier: pointers are just numerical values, how you got hold of a pointer does not matter as long as it has the right numerical value. However, this semantic is problematic because it disallows basic optimizations we'd like to be able to do, like mem2reg, but of course, that does not make the semantic incorrect, just inconvenient.
However, there are other semantics too with different trade-offs. For instance, one possible semantic is that X+1 is not a legitimate way to access Y, even if numerically X+1 is the same as Y; in other words pointers are more than just their numerical values. In this semantic mem2reg is legal because no one can create a pointer to an unescaped alloca "out of thin air" by offsetting from some other alloca they have access to. However, in this semantic, the select folding is wrong -- it miscompiled and introduced UB into a well defined program.
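A minimal sketch of that select folding (assuming, purely for illustration, that the two one-byte allocas end up adjacent so that %x+1 equals %y numerically):

define void @sketch() {
  %x = alloca i8
  %y = alloca i8
  %x1 = getelementptr inbounds i8, i8* %x, i64 1   ; one past the end of %x
  %c = icmp eq i8* %x1, %y
  %p = select i1 %c, i8* %y, i8* %x1
  ; folding %p to %x1 is a no-op under the flat model, but under the
  ; provenance model the store below becomes an out-of-bounds access
  store i8 1, i8* %p
  ret void
}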
This is always the case at the user level -- the user cannot assume layout relationships from declaration order. The compiler, on the other hand, can do it because it is also responsible for laying out the objects.
The tricky situation is when the user writes code with runtime checks of the relative positions of objects: when the condition is true, can the compiler make a transformation such that one object is accessed via the base pointer of another object? I think most compilers do that -- but if AA is not maintained/updated, miscompiles can happen.
I'm sure there are other more creative semantics with more, should I say subtle :), tradeoffs. We just need to pick one and be consistent about it.
For instance, suppose we want to improve the access locality of local objects; we would need to introduce an optimization pass that lays out stack objects in an optimal way. After this is done, all the local objects will have a fixed offset from the same base. AA needs to be taught to handle such scenarios (e.g. annotate the accesses with unique local ids).
Are you talking about a pass that replaces all alloca's with offsets into a giant "master" alloca?
Yes, and not just that. It is more common to do this for global variable layouts. Global variables can be laid out for better cache utilization; it also helps reduce the number of GOT entries and GOT accesses if global accesses are commoned to the same base.
If so, in the flat-memory-model semantic this is a no-op, but in the non-flat-memory-model semantic this is an information-losing transform. But it is correct under both semantics.
Whether the information is lost or not depends on the implementation -- it is possible to avoid losing the AA info.
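A sketch of the kind of layout transform being discussed (my own illustration; no such pass is implied to exist):

define void @before() {
  %a = alloca i32
  %b = alloca i32
  store i32 0, i32* %a
  store i32 1, i32* %b
  ret void
}

define void @after() {
  ; both slots packed into one frame by a hypothetical layout pass; AA now
  ; needs extra information (e.g. per-slot IDs) to keep treating them as
  ; independent objects
  %frame = alloca [2 x i32]
  %a = getelementptr inbounds [2 x i32], [2 x i32]* %frame, i64 0, i64 0
  %b = getelementptr inbounds [2 x i32], [2 x i32]* %frame, i64 0, i64 1
  store i32 0, i32* %a
  store i32 1, i32* %b
  ret void
}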
I think Hal nails the problem: both data-flow and control-flow may contribute to the value of ptr2int. This is because integers can be freely swapped (e.g. by GVN) and so data-flow dependencies can be lost.
Therefore a pointer produced by int2ptr may alias with other objects rather than only the one found through data-flow dependencies. That's why int2ptr(ptr2int(x)) cannot always be replaced with x -- it certainly can in some cases.
BTW, regarding gcc, the only bug we were able to find was in AA with != comparisons only: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82177
I wanted to point out this issue has nothing to do with inttoptr. I can reproduce the problem with a program that does not use any of intptr/ptrint:
Fundamentally, what happens is that, based on address comparisons, optimization pass(es) convert the program into one which is semantically equivalent but changes the underlying object that a pointer points to. This greatly confuses AA (which assumes an out-of-bounds pointer does not point to neighboring stack objects, or that an unknown pointer cannot point to non-escaped locals, etc.) and leads later optimization passes to do bad things.
Capturing the alias relationship with predicate/assertion propagation is the way to solve it, as pointed out by Danny, but there are just too many situations to handle to make it worthwhile. We can, however, special-case the one-past-the-end situation, which is not uncommon IMO.
// program:
#include <stdio.h>  // for printf

void f(int *, int *);

int main() {
  int a = 0, x = 0, y[1];  // int a=0, y[1], x=0;
  int *pi = &x;
  int *yi = &y[1];
  bool n = pi != yi;

  if (n) { a = 100; pi = yi; }

  if (n) { a = 100; pi = &y[0]; }

  *pi = 15;

  printf("a=%d x=%d\n", a, x);

  f(&x, y);

  return 0;
}
Yes, but this program has UB because it compares pointers into two different objects. It was exactly the ptrtoint/inttoptr that made the program (potentially) well defined. Thus, even though you can replicate the behavior without the inttoptr/ptrtoint, it is the presence of those casts that is essential to the potential behavior being a miscompile.
The compiler does not create an out-of-bounds access. 1) when A != B, the original program has an out-of-bounds access
Agreed -- if A != B then any transform is valid since the source has UB.
2) and when A == B, the generated code is semantically equivalent to the original program. It is like all locals can be accessed via the same base %rsp.
I don't think this is true, if I understand you correctly. For instance, if you could access one alloca by offsetting from another (I presume that's what you meant by "all locals can be accessed via the same base %rsp"), mem2reg would not be legal. E.g. mem2reg would not be able to transform
define i32 @f() {
  %a = alloca i32
  %b = alloca i32
  store i32 0, i32* %a
  call void @unknown(i32* %b)
  %v = load i32, i32* %a
  ret i32 %v
}
to
define i32 @f() {
  %b = alloca i32
  call void @unknown(i32* %b)
  ret i32 0
}
(which it does today) since @unknown may have written something other than 0 to %a by offsetting from %b.
I think we should not suppress optimizations, but instead fix the real underlying problems.
The underlying object being accessed is not changed regardless of what base pointer is used -- whether AA can handle it currently is a different story.
For instance, suppose we want to improve the access locality of local objects; we would need to introduce an optimization pass that lays out stack objects in an optimal way. After this is done, all the local objects will have a fixed offset from the same base. AA needs to be taught to handle such scenarios (e.g. annotate the accesses with unique local ids).
That's pretty fundamental, unfortunately. If we want to do high level AA inferences (like pointers derived off of two distinct malloc()'s do not alias) then we have to be okay with ptrtoint(a) == ptrtoint(b) != (a == b).
I'm still not seeing how this causes a problem. If I have two pointers, a and b, and they don't alias, but ptrtoint(a) == ptrtoint(b), then they must have non-overlapping lifetimes. To have non-overlapping lifetimes, there must be some mutually-aliasing code in between.
I was thinking of this situation:
A = malloc();
free(A);
B = malloc();
*B = 0
C = select (ptrtoint(A) == ptrtoint(B)), B, A  ;; S0
r = *C
free(B);
If I apply the select rule you mention to S0 then we get:
A = malloc();
free(A);
B = malloc();
*B = 0
C = A
r = *C
free(B);
Now since A and B are distinct mallocs, BasicAA (https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/BasicAliasAnalysis.cpp#L1579) will tell us that the store to B and load from C (=A) can be reordered. But in the original program, if ptrtoint(A) == ptrtoint(B) then r should have seen the store "*B = 0", so the reordering is not legal.
One way out of this is to make our memory model "flat" -- i.e. model the heap as a large array and make it well defined to offset from one allocation to another. But that may regress alias analysis (I'm not sure by how much, though).
Note that our AA is only defined for pointers that are accessed. This is exactly why we can't use AA to reason about pointer comparisons.
In fact, it may be the case that our AA queries are only well defined in general if some semantics-preserving transformation could put the two accesses next to each other. I've often thought this to be true, and perhaps this is why.
Great, sounds like we are on the same page.
Unfortunately for LLVM, GCC has it easier in a lot of ways here. It is mainly based on a constraint-based points-to analysis, which means you can add any constraints you want (we can do the equivalent in our points-to, but we don't rely on a real points-to at all).
So for example, for this bug, they can just add constraints based on conditionals, or whatever else they want. This is true even though they are really uses and not defs.
LLVM's BasicAA only looks at the def chain when determining what has happened to a thing.
This is not enough if you are allowed to use conditionals to induce aliasing relationships.
Fixing the multi-file case is pretty much impossible without giving up; you'd have to assume that all escaped pointers may be equal after a function call, etc.
Given all this, i'll be honest, my temptation is to ignore this problem, or: give a very well defined way that people can have this work that we can mark in the frontend and emit, and give up optimizing those or something.
Yay for C/C++
I believe that we could define the current state of things to be correct by saying: any program with side effects dependent on the absolute or relative addresses of objects is ill defined.
Sure
However, while "most" code is probably fine with that, I suspect that you can't write an OS kernel, a sanitier runtime, or the like, under those restrictions.
Regardless of what we do, i would like to challenge this, because it implies the current state is very broken. IE i think we could define the current state to be fine (maybe not the way you have) and be okay, from what i've seen.
I agree. I meant to imply that we probably can't use something as general as I had suggested.
Apple ships a clang built kernel. Google has built and tested (heavily) production linux kernels with clang. Same with android and chromeos kernels (IE a very different situation). Obviously, the current sanitizer runtimes are also built with clang.
So far, not a single bug or crash has been tracked down or caused by this or related issues.
I feel like this is empirical proof that you can write an OS kernel, sanitizer runtime, or the like, that works okay.
Maybe it'll stop working in the future. But it has survived at least a year of llvm/clang development so far :)
GCC has not stopped propagating any conditional equivalences (IE of the form in 17, 18, etc), and is the default compiler for the kernel/sanitizer runtime for everyone else. It has been propagating them like this for at least 10 years. It has had bugs, including this one (and similar related ones), for 10+ years.
That to me suggests this world is not that bad.
To be clear, I didn't mean to imply that our current implementation, as far as this goes, is all that bad in practice. I mean that, if you define the UB broadly as I suggested above, you'd likely end up with problems with that kind of code.
It does leave open, however, how we'd actually define things to make this okay.
Even if you restrict the UB to objects on the stack, which I suppose you could do, it's still not clear to me how practical this will be.
I don't want to just ignore the problem, but either we define self-consistent semantics with which we're happy that makes this ill defined, or we fix it (either by default or, as suggested, under some pedantic flag).
Though to be clear, i'm fine with whatever.
Given all this, i'll be honest, my temptation is to ignore this problem, or: give a very well defined way that people can have this work that we can mark in the frontend and emit, and give up optimizing those or something.
Yay for C/C++
I believe that we could define the current state of things to be correct by saying: any program with side effects dependent on the absolute or relative addresses of objects is ill defined.
Sure
However, while "most" code is probably fine with that, I suspect that you can't write an OS kernel, a sanitier runtime, or the like, under those restrictions.
Regardless of what we do, i would like to challenge this, because it implies the current state is very broken. IE i think we could define the current state to be fine (maybe not the way you have) and be okay, from what i've seen.
Apple ships a clang built kernel. Google has built and tested (heavily) production linux kernels with clang. Same with android and chromeos kernels (IE a very different situation). Obviously, the current sanitizer runtimes are also built with clang.
So far, not a single bug or crash has been tracked down or caused by this or related issues.
I feel like this is empirical proof that you can write an OS kernel, sanitizer runtime, or the like, that works okay.
Maybe it'll stop working in the future. But it has survived at least a year of llvm/clang development so far :)
GCC has not stopped propagating any conditional equivalences (IE of the form in 17, 18, etc), and is the default compiler for the kernel/sanitizer runtime for everyone else. It has been propagating them like this for at least 10 years. It has had bugs, including this one (and similar related ones), for 10+ years.
That to me suggests this world is not that bad.
Even if you restrict the UB to objects on the stack, which I suppose you could do, it's still not clear to me how practical this will be.
I don't want to just ignore the problem, but either we define self-consistent semantics with which we're happy that makes this ill defined, or we fix it (either by default or, as suggested, under some pedantic flag).
Though to be clear, i'm fine with whatever.
I think Hal nails the problem: both data-flow and control-flow may contribute to the value of ptr2int. This is because integers can be freely swapped (e.g. by GVN) and so data-flow dependencies can be lost.
Sure.
Therefore a pointer produced by int2ptr may alias with other objects rather than only the one found through data-flow dependencies. That's why int2ptr(ptr2int(x)) cannot always be replaced with x -- it certainly can in some cases.
BTW, regarding gcc, the only bug we were able to find was in AA with != comparisons only: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82177
As richard says in that bug (and i pointed out elsewhere), it will completely screw up pretty much any aliasing induced by conditional equivalences (which are the other bugs i cited).
Given all this, i'll be honest, my temptation is to ignore this problem, or: give a very well defined way that people can have this work that we can mark in the frontend and emit, and give up optimizing those or something.
Yay for C/C++
I believe that we could define the current state of things to be correct by saying: any program with side effects dependent on the absolute or relative addresses of objects is ill defined.
However, while "most" code is probably fine with that, I suspect that you can't write an OS kernel, a sanitizer runtime, or the like, under those restrictions. Even if you restrict the UB to objects on the stack, which I suppose you could do, it's still not clear to me how practical this will be.
I don't want to just ignore the problem, but either we define self-consistent semantics with which we're happy that makes this ill defined, or we fix it (either by default or, as suggested, under some pedantic flag).
I think Hal nails the problem: both data-flow and control-flow may contribute to the value of ptr2int. This is because integers can be freely swapped (e.g. by GVN) and so data-flow dependencies can be lost.
Therefore a pointer produced by int2ptr may alias with other objects rather than only the one found through data-flow dependencies. That's why int2ptr(ptr2int(x)) cannot always be replaced with x -- it certainly can in some cases.
BTW, regarding gcc, the only bug we were able to find was in AA with != comparisons only: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82177
Regarding "anyone expects a correct result here" is it a reasonable reading of the "Add a new paragraph" part of DR 260 that the compiler is sort of allowed to do the wrong thing?
So, DR260 came about because of similar cases in gcc land. You can find cases that don't directly implicate it but are similar things.
There is a whole slew of bugs (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54945 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502 and others that i forget).
In fact, i believe gcc has determined that they couldn't even propagate conditional equivalences for pointers or integers, if they wanted this to work in all cases. See, e.g., https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752, the later comments.
As richard points out: "You can expose the same issue by piecewise decomposing the pointer to chars, and having them equivalency propagated in a bogus way, then reconstruct the pointer from the chars. So it's not enough to disable pointer and uintptr_t propagations either."
You can also split the equivalency propagation across as many functions as you want.
So i believe i come to the same conclusion Richard does: If you want this stuff to work in all cases, we'd probably have to give up on conditional equivalences.
Given all this, i'll be honest, my temptation is to ignore this problem, or: give a very well defined way that people can have this work that we can mark in the frontend and emit, and give up optimizing those or something.
Yay for C/C++
Regarding "anyone expects a correct result here" is it a reasonable reading of the "Add a new paragraph" part of DR 260 that the compiler is sort of allowed to do the wrong thing?
It's probably not a good idea but I wanted to float the possibility of having the compiler look for cases where a one-past-the-end pointer might cause trouble and either add a bit of padding or else lay out the stack slots in a different order.
Okay, so, i admit to feeling confused why we've defined inttoptr and ptrtoint this way (IE to quantumly entangle other pointers, etc).
I feel (probably wrongly) like the pointer aliasing definition should be something like "inttoptr only aliases the memory that is at the address of the value passed to it".
IE inttoptr(ptrtoint(x)) is a no-op from aliasing perspective. Whatever i can prove about the aliasing or non-alias of those objects is up to me.
To prove anything about the aliasing result, i would need to take the pointer x, try to perform math on it, and see what the result is.
Note also the result gcc 7 gives for the original example:
╰─ gcc-7 -O3 foo.c b.c
╰─ ./a.out
a=0 x=0
bam.
This has been like this as far back as i was lazy enough to try ;)
ICC 2018 gives a correct result of a=100 x=0.
TL;DR: Given both gcc and clang fail at this, and have for years, i'm not sure anyone expects a correct result here :)
That's pretty fundamental, unfortunately. If we want to do high level AA inferences (like pointers derived off of two distinct malloc()'s do not alias) then we have to be okay with ptrtoint(a) == ptrtoint(b) != (a == b).
I'm still not seeing how this causes a problem. If I have two pointers, a and b, and they don't alias, but ptrtoint(a) == ptrtoint(b), then they must have non-overlapping lifetimes. To have non-overlapping lifetimes, there must be some mutually-aliasing code in between.
agreed.
Note that our AA is only defined for pointers that are accessed. This is exactly why we can't use AA to reason about pointer comparisons.
In fact, it may be the case that our AA queries are only well defined in general if some semantics-preserving transformation could put the two accesses next to each other. I've often thought this to be true, and perhaps this is why.
(we currently define the result in terms of the result of ptrtoint on each operand).
That's precisely the problem though -- given what you said, the expression is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be safely transformed to "q := b".
Indeed. Good point. We could say that this transformation is unsafe unless we know that both a and b have the same underlying object.
But if ptrtoint(a) == ptrtoint(b) does not imply that a == b, that also seems unfortunate.
That's pretty fundamental, unfortunately. If we want to do high level AA inferences (like pointers derived off of two distinct malloc()'s do not alias) then we have to be okay with ptrtoint(a) == ptrtoint(b) != (a == b).
I think, however, that I'd still prefer to restrict the inttoptr(ptrtoint(x)) folding to this.
I'm not sure if you meant "restricting inttoptr(ptrtoint(x))==x is more preferable" or the converse (the "however" threw me off :) ), but in any case, I think we have to restrict both. We can't assume ptrtoint(a) == ptrtoint(b) implies a == b for the reason stated above.
And if we allow inttoptr(ptrtoint(x)) == x (but not "ptrtoint(a) == ptrtoint(b) implies a == b") then we have:
i64 A = ...
i64 B = ...
i64 C = A == B ? A : B;
and we replace C with B (since we're dealing with integers here so our restriction does not apply), and the whole program was actually:
// X and Y are 1 byte allocas where the address X+1 is the same as Y
i64 A = ptrtoint(Y)
i64 B = ptrtoint(X+1)
i64 C = B // after the transformation
i8* C_ptr = inttoptr(C)
*C_ptr = 1
then the inttoptr(ptrtoint(x)) == x rule gives us:
// X and Y are 1 byte allocas where the address X+1 is the same as Y
i64 A = ptrtoint(Y)
i64 B = ptrtoint(X+1)
i64 C = B // after the transformation
i8* C_ptr = inttoptr(C)
*(X+1) = 1
which is an out of bounds access.
The compiler does not create an out-of-bounds access. 1) when A != B, the original program has an out-of-bounds access; 2) when A == B, the generated code is semantically equivalent to the original program. It is like all locals can be accessed via the same base %rsp.
I think we should not suppress optimizations, but instead fix the real underlying problems.
Re #21. Suppressing the optimization for this case looks like papering over the real problem.
Regarding AA, I think in practice, the common case is that off-by-one address is used as sentinel value, so we only need to teach AA to handle this explicit out of bound case instead of making AA generally less conservative.
less conservative --> more conservative.
Besides, we can teach AA to look at real references. If there are any, we can assume OOB does not exist -- otherwise the behavior is UB.
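The common sentinel pattern being referred to, as a sketch (mine, not from the thread): the one-past-the-end address is only compared against, never dereferenced, which is what such a special case in AA could key off.

define void @zero(i32* %buf, i64 %n) {
entry:                                                ; assumes %n > 0
  %end = getelementptr inbounds i32, i32* %buf, i64 %n
  br label %loop
loop:
  %p = phi i32* [ %buf, %entry ], [ %next, %loop ]
  store i32 0, i32* %p
  %next = getelementptr inbounds i32, i32* %p, i64 1
  %done = icmp eq i32* %next, %end                    ; %end is never loaded or stored through
  br i1 %done, label %exit, label %loop
exit:
  ret void
}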
(we currently define the result in terms of the result of ptrtoint on each operand).
That's precisely the problem though -- given what you said, the expression is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be safely transformed to "q := b".
Indeed. Good point. We could say that this transformation is unsafe unless we know that both a and b have the same underlying object.
But if ptrtoint(a) == ptrtoint(b) does not imply that a == b, that also seems unfortunate.
I think, however, that I'd still prefer to restrict the inttoptr(ptrtoint(x)) folding to this.
I don't see the connection with ptrtoint/intptr.
Whether the conditional assignment is optimized into q := b seems irrelevant. Suppose we have a local pointer analysis that can see through this and will know pi's value is y+1 (after the first select).
I think the real problem is that the address y+1 is out of bounds of y: though there is no out-of-bounds access, the compiler cannot assume that this address does not alias another object, in this case 'x' (the second issue in CSE mentioned in comment #16).
Yes, we could define things this way. It would make our AA much less powerful (because we'd need to assume that any pointer we could not prove in bounds could potentially alias with all other pointers). We should find a different solution.
(we currently define the result in terms of the result of ptrtoint on each operand).
That's precisely the problem though -- given what you said, the expression is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be safely transformed to "q := b".
Indeed. Good point. We could say that this transformation is unsafe unless we know that both a and b have the same underlying object.
I don't see the connection with ptrtoint/intptr.
Whether the conditional assignment is optimized into q := b seems irrelevant. Suppose we have a local pointer analysis that can see through this and will know pi's value is y+1 (after the first select).
I think the real problem is that the address y+1 is out of bounds of y: though there is no out-of-bounds access, the compiler cannot assume that this address does not alias another object, in this case 'x' (the second issue in CSE mentioned in comment #16).
(we currently define the result in terms of the result of ptrtoint on each operand).
That's precisely the problem though -- given what you said, the expression is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be safely transformed to "q := b".
I can now reproduce the problem. An IR reproducible: ...
Looks like Early CSE wrongly 1) commonizes the address of x and y+1 and makes %add.ptr.x be y+1; 2) with 1), pi no longer looks like it aliases x, so the initial value of x (0) gets propagated to printf.
Does it have anything to do with intptr/ptrint?
I suspect the answer is still yes because an answer of no is worse. So %add.ptr.x is defined as:
%add.ptr = getelementptr inbounds [1 x i32], [1 x i32]* %y, i64 0, i64 1
%cmp = icmp ne i32* %x, %add.ptr
%add.ptr.x = select i1 %cmp, i32* %add.ptr, i32* %x
And CSE is just applying the identity:
q := (a != b ? b : a) => q := b
and that certainly seems reasonable to me. FWIW, as currently defined, the icmp is currently well defined even for operands that point to different objects (we currently define the result in terms of the result of ptrtoint on each operand).
I can now reproduce the problem. An IR reproducible:
llc good.ll > good.s
clang good.s f.o
./a.out
100, 0

opt -early-cse-memssa -S < good.ll | llc > bad.s
clang bad.s f.o
./a.out
0,0

opt -early-cse-memssa -S < good.ll > bad.ll
The key diffs between good.ll and bad.ll are:
%pi.1.in = select i1 %cmp, i32* %arraydecay, i32* %add.ptr
And:
%call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 %a.1, i32 0)
Looks like Early CSE wrongly 1) commonizes the address of x and y+1 and makes %add.ptr.x be y+1; 2) with 1), pi no longer looks like it aliases x, so the initial value of x (0) gets propagated to printf.
Does it have anything to do with intptr/ptrint?
// good.ll:
@.str = private unnamed_addr constant [11 x i8] c"a=%d x=%d\0A\00", align 1
define i32 @main() local_unnamed_addr #0 {
entry:
  %x = alloca i32, align 4
  %y = alloca [1 x i32], align 4
  %0 = bitcast i32* %x to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %0) #4
  store i32 0, i32* %x, align 4, !tbaa !2
  %1 = bitcast [1 x i32]* %y to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %1) #4
  %arraydecay = getelementptr inbounds [1 x i32], [1 x i32]* %y, i64 0, i64 0
  %add.ptr = getelementptr inbounds [1 x i32], [1 x i32]* %y, i64 0, i64 1
  %cmp = icmp ne i32* %x, %add.ptr
  %add.ptr.x = select i1 %cmp, i32* %add.ptr, i32* %x
  %. = select i1 %cmp, i32 100, i32 0
  %pi.1.in = select i1 %cmp, i32* %arraydecay, i32* %add.ptr.x
  %a.1 = select i1 %cmp, i32 100, i32 %.
  store i32 15, i32* %pi.1.in, align 4, !tbaa !2
  %2 = load i32, i32* %x, align 4, !tbaa !2
  %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 %a.1, i32 %2)
  call void @_Z1fPiS_(i32* nonnull %x, i32* %arraydecay)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %1) #4
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %0) #4
  ret i32 0
}
; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
; Function Attrs: nounwind
declare i32 @printf(i8* nocapture readonly, ...) local_unnamed_addr #2
declare void @_Z1fPiS_(i32*, i32*) local_unnamed_addr #3
; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
attributes #0 = { norecurse uwtable }
attributes #1 = { argmemonly nounwind }
attributes #2 = { nounwind }
attributes #3 = { nounwind }
attributes #4 = { nounwind }
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
!0 = !{i32 1, !"wchar_size", i32 4} !1 = !{!"clang version 6.0.0 (trunk 312663)"} !2 = !{#3, !3, i64 0} !3 = !{!"int", !4, i64 0} !4 = !{!"omnipotent char", !5, i64 0} !5 = !{!"Simple C++ TBAA"}
The problem really stems from the following rule in http://llvm.org/docs/LangRef.html#pointer-aliasing-rules:
- A pointer value formed by an inttoptr is based on all pointer values that contribute (directly or indirectly) to the computation of the pointer’s value.
I think Nuno wrote up a good description somewhere, but essentially the problem is that we end up erasing dependencies, so alias analysis returns the wrong result.
I think that this is essentially correct. The problem here is that inttoptr(ptrtoint(x)) only appears to be based on x. However, in this case, the inttoptr is also control dependent on a comparison between some other pointer values. The resulting pointer, therefore, is based on all of these pointers. AA won't see this, just based on inttoptr(ptrtoint(x)) == x, because it will only see x.
The problem really stems from the following rule in http://llvm.org/docs/LangRef.html#pointer-aliasing-rules:
- A pointer value formed by an inttoptr is based on all pointer values that contribute (directly or indirectly) to the computation of the pointer’s value.
I think Nuno wrote up a good description somewhere, but essentially the problem is that we end up erasing dependencies, so alias analysis returns the wrong result.
Is this the problem:
inttoptr(ptrtoint(x)) + int_value
where the int_value is actually a pointer but x is not.
If LLVM folds the first part, this becomes a getelementptr with x as the base, and then the resulting value won't alias with what int_value points to anymore?
The problem really stems the following rule from http://llvm.org/docs/LangRef.html#pointer-aliasing-rules:
I think Nuno wrote up a good description somewhere, but essentially the problem is that we end up erasing dependencies, so alias analysis returns the wrong result.
Can someone attach a ll file and the opt commandline to reproduce the problem consistently?
I was able to repro with John's reordering of the variables, but that doesn't seem that important.
Why can't we assume inttoptr(ptrtoint(x)) is a no-op? What's wrong with that in principle? "a=0 x=0" seems like a real bug, but I'm not convinced this is the cause.
Ok, so I guess some of you cannot reproduce this because your clang has a default option to enable some sanitization. Try with clang -cc1 (adjust paths as needed):
$ ./bin/clang -cc1 -triple x86_64-pc-linux-gnu -emit-obj -O2 -o c.o c.c -internal-externc-isystem /usr/include/ -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-isystem /home/nuno/llvm-build/lib/clang/6.0.0/include
$ ./bin/clang -o foo b.c c.o
$ ./foo
Does the soundness issue go away if we stop optimizing 'n' to 1?
I don't think we do that? At least, I can't reproduce that with trunk clang... and it's pretty clearly not a legal transform. See https://reviews.llvm.org/rL249490 .
Extended Description
Example of an end-to-end miscompilation by clang of the following code involving ptrtoint:
$ cat c.c
#include <stdio.h>
#include <stdint.h>
void f(int *, int *);
int main() {
  int a = 0, y[1], x = 0;
  uintptr_t pi = (uintptr_t) &x;
  uintptr_t yi = (uintptr_t) (y + 1);
  uintptr_t n = pi != yi;
if (n) { a = 100; pi = yi; }
if (n) { a = 100; pi = (uintptr_t) y; }
*(int *)pi = 15;
printf("a=%d x=%d\n", a, x);
f(&x,y);
return 0; }
$ cat b.c
void f(int *x, int *y) {}
$ clang -O2 c.c b.c -o foo
$ ./foo a=0 x=0
This result is wrong. The two possible outcomes are: a=0 x=15, and a=100 x=0.
The bug is in instcombine, which treats inttoptr(ptrtoint(x)) == x; this is incorrect. This transformation can only be done if x is dereferenceable for the accesses through inttoptr. Compare the following:
clang -O0 -S -emit-llvm -o - c.c | opt -S -sroa
clang -O0 -S -emit-llvm -o - c.c | opt -S -sroa -instcombine
Integer compares are replaced with pointer compares (wrong), and loads/stores are changed from going through inttoptr to going through the pointers directly (also wrong).
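A rough, self-contained sketch (mine, not the actual pipeline output) of the fold being blamed:

define void @before(i32* %x) {
  %i = ptrtoint i32* %x to i64
  %p = inttoptr i64 %i to i32*
  store i32 15, i32* %p     ; store through an integer-derived pointer
  ret void
}

define void @after(i32* %x) {
  store i32 15, i32* %x     ; the round-trip is folded away, so AA now reasons
  ret void                  ; only about %x's own provenance
}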
Test case by Gil Hur.