Open llvmbot opened 4 years ago
Mostly we discussed the issue verbally, but I think Piotr covered the problem it in his CppCon talk about the devirtualization feature: https://www.youtube.com/watch?v=lQAxldEGOys
Thanks, that provided the necessary context for bug 34548, comment 49. After looking a bit deeper into it I filed bug 44472.
I think the idea of pointer provenance is very similar to the idea of trying to assign dynamic C++ types to blobs of memory.
Yeah, in some sense. E.g., when you optimize an isolated function, both are unknown but propagate through assignments and such (until control is passed to another function).
This type of issue has long been known,
Do you have any links to corresponding discussions, bug reports etc.?
Mostly we discussed the issue verbally, but I think Piotr covered the problem it in his CppCon talk about the devirtualization feature: https://www.youtube.com/watch?v=lQAxldEGOys
I think the idea of pointer provenance is very similar to the idea of trying to assign dynamic C++ types to blobs of memory.
This type of issue has long been known,
Do you have any links to corresponding discussions, bug reports etc.?
and is not likely to get traction in the LLVM issue tracker.
I don't expect too much from these bug reports but they cover two specific issues and provide simple and diverse testcases for them (especially bug 44313). Hopefully they could help inform future discussions of bigger issues.
This type of issue has long been known, and is not likely to get traction in the LLVM issue tracker. The first time I remember meaningfully discussing this type of issue was around Piotr Padlewski's internship in 2014 or 2015.
In order to resolve this and the larger issue class, a real discussion on llvm-dev is needed. Historically, LLVM has done these value replacement optimizations, but it has not done pointer provenance optimizations, so fixing this will be a change in direction.
Example with a past-the-end pointer (Early CSE w/ MemorySSA):
attribute((noipa,optnone)) // imagine it in a separate TU static void opaque(void p) { return p; }
static int been_there = 0;
static int f(int p, int *q) { if (p == q) { been_there = 1; return p; } else { been_there = 0; return q; } }
int main() { int x[5]; int y[1];
int *p = x;
int *q = y + 1;
opaque(q);
int *p1 = opaque(p); // prevents early optimization of x==y+1
int *r = f(p1, q);
if (been_there) {
*p = 1;
*r = 2;
printf("result: %d\n", *p);
}
The testcase in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93052#c2 seems to be optimized in the same way by clang, so not pasting it here.
Extended Description
Similar to bug 44313.
The optimizer sometimes changes
p == q ? p : q
toq
. This is wrong when the actual provenance ofp
differs from that ofq
. There are two forms -- with the actual conditional operator and with theif
statement.The ideal example would be constructed with the help of restricted pointers but it's run into a theoretical problem -- see the first testcase in bug 44373. My other examples require two conditionals to eliminate the possibility of UB. Comparison of integers should give stable results, hopefully that would be enough to demonstrate the problem.
gcc bug -- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93052.
Example with the conditional operator and with dead malloc (the wrong optimization seems to be applied in Early CSE):
include
include
include
attribute((noipa,optnone)) // imagine it in a separate TU static void opaque(void p) { return p; }
int main() { int q = malloc(sizeof(int)); opaque(q); uintptr_t iq = (uintptr_t)(void )q; free(q);
}
$ clang -std=c11 -Weverything -Wno-unknown-attributes test.c && ./a.out result: 2 $ clang -std=c11 -Weverything -Wno-unknown-attributes -O3 test.c && ./a.out result: 1
clang x86-64 version: clang version 10.0.0 (https://github.com/llvm/llvm-project.git fccac1ec16951e9a9811abf19e2c18be147854fc)
The idea of problems arising from
p == q ? p : q
is from Chung-Kil Hur via https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752#c15.