llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28k stars 11.56k forks source link

Wrong optimizations for pointers: `p == q ? p : q` -> `q` #43719

Open llvmbot opened 4 years ago

llvmbot commented 4 years ago
Bugzilla Link 44374
Version trunk
OS Linux
Reporter LLVM Bugzilla Contributor
CC @kamleshbhalui,@rnk

Extended Description

Similar to bug 44313.

The optimizer sometimes changes p == q ? p : q to q. This is wrong when the actual provenance of p differs from that of q. There are two forms -- with the actual conditional operator and with the if statement.

The ideal example would be constructed with the help of restricted pointers but it's run into a theoretical problem -- see the first testcase in bug 44373. My other examples require two conditionals to eliminate the possibility of UB. Comparison of integers should give stable results, hopefully that would be enough to demonstrate the problem.

gcc bug -- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93052.

Example with the conditional operator and with dead malloc (the wrong optimization seems to be applied in Early CSE):


include

include

include

attribute((noipa,optnone)) // imagine it in a separate TU static void opaque(void p) { return p; }

int main() { int q = malloc(sizeof(int)); opaque(q); uintptr_t iq = (uintptr_t)(void )q; free(q);

int *p = malloc(sizeof(int));
opaque(p);
uintptr_t ip = (uintptr_t)(void *)p;

uintptr_t ir = ip == iq ? ip : iq;
if (ip == iq) {
    *p = 1;
    *(int *)(void *)ir = 2;
    printf("result: %d\n", *p);
}

}

$ clang -std=c11 -Weverything -Wno-unknown-attributes test.c && ./a.out result: 2 $ clang -std=c11 -Weverything -Wno-unknown-attributes -O3 test.c && ./a.out result: 1

clang x86-64 version: clang version 10.0.0 (https://github.com/llvm/llvm-project.git fccac1ec16951e9a9811abf19e2c18be147854fc)

The idea of problems arising from p == q ? p : q is from Chung-Kil Hur via https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752#c15.

llvmbot commented 4 years ago

Mostly we discussed the issue verbally, but I think Piotr covered the problem it in his CppCon talk about the devirtualization feature: https://www.youtube.com/watch?v=lQAxldEGOys

Thanks, that provided the necessary context for bug 34548, comment 49. After looking a bit deeper into it I filed bug 44472.

I think the idea of pointer provenance is very similar to the idea of trying to assign dynamic C++ types to blobs of memory.

Yeah, in some sense. E.g., when you optimize an isolated function, both are unknown but propagate through assignments and such (until control is passed to another function).

rnk commented 4 years ago

This type of issue has long been known,

Do you have any links to corresponding discussions, bug reports etc.?

Mostly we discussed the issue verbally, but I think Piotr covered the problem it in his CppCon talk about the devirtualization feature: https://www.youtube.com/watch?v=lQAxldEGOys

I think the idea of pointer provenance is very similar to the idea of trying to assign dynamic C++ types to blobs of memory.

llvmbot commented 4 years ago

This type of issue has long been known,

Do you have any links to corresponding discussions, bug reports etc.?

and is not likely to get traction in the LLVM issue tracker.

I don't expect too much from these bug reports but they cover two specific issues and provide simple and diverse testcases for them (especially bug 44313). Hopefully they could help inform future discussions of bigger issues.

rnk commented 4 years ago

This type of issue has long been known, and is not likely to get traction in the LLVM issue tracker. The first time I remember meaningfully discussing this type of issue was around Piotr Padlewski's internship in 2014 or 2015.

In order to resolve this and the larger issue class, a real discussion on llvm-dev is needed. Historically, LLVM has done these value replacement optimizations, but it has not done pointer provenance optimizations, so fixing this will be a change in direction.

llvmbot commented 4 years ago

Example with a past-the-end pointer (Early CSE w/ MemorySSA):


include

attribute((noipa,optnone)) // imagine it in a separate TU static void opaque(void p) { return p; }

static int been_there = 0;

static int f(int p, int *q) { if (p == q) { been_there = 1; return p; } else { been_there = 0; return q; } }

int main() { int x[5]; int y[1];

int *p = x;
int *q = y + 1;
opaque(q);

int *p1 = opaque(p); // prevents early optimization of x==y+1
int *r = f(p1, q);

if (been_there) {
    *p = 1;
    *r = 2;
    printf("result: %d\n", *p);
}

}

$ clang -std=c11 -Weverything -Wno-unknown-attributes test.c && ./a.out result: 2 $ clang -std=c11 -Weverything -Wno-unknown-attributes -O3 test.c && ./a.out result: 1

clang x86-64 version: clang version 10.0.0 (https://github.com/llvm/llvm-project.git fccac1ec16951e9a9811abf19e2c18be147854fc)

The testcase in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93052#c2 seems to be optimized in the same way by clang, so not pasting it here.