llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.53k stars 11.79k forks source link

Interaction between C's restrict pointer and pointer casted from integer is ambiguous in LLVM's LangRef #39193

Open aqjune opened 5 years ago

aqjune commented 5 years ago
Bugzilla Link 39846
Version trunk
OS Linux
Depends On llvm/llvm-project#33896
Blocks llvm/llvm-project#39717
Attachments The example code
CC @hfinkel,@dobbelaj-snps,@aqjune,@LebedevRI,@zhengyang92,@nikic,@nunoplopes,@RalfJung,@regehr,@sanjoy

Extended Description

Consider this example:

#include <stdint.h>

int foo(int *restrict x, int *restrict y) {
  *y = 23;
  uintptr_t xi = (uintptr_t)x;
  uintptr_t yi = (uintptr_t)y;
  if (xi != yi) { // xi == yi, so this is never taken
    yi = xi;
  }
  *(int*)yi = 42; // changes *y or UB?
  return *y;
}

int bar() {
  int x;
  return foo(&x, &x); // returns 23
}

After -O3, foo(&x, &x) is optimized into ret i32 23 ( https://godbolt.org/z/ty73Xk ).

Here are several ways to explain this result:

(1) foo(&x, &x) is not allowed in LLVM IR - but I believe this makes the semantics more undefined than C (because C allows multiple restrict pointers to point have the same address if they are only read, but never written)

(2) *(int*)yi = 42 is not allowed - but LLVM IR LangRef's pointer aliasing rules ( https://llvm.org/docs/LangRef.html#pointer-aliasing-rules ) say that (int *)yi is based on y, so I think this behavior is well-defined. Perhaps we should update the definition of based-on relation, if we follow this option.

(3) Restrict pointer cannot be casted into integer - if we choose this option, LangRef should be updated to deal with this.

If we're to choose (2) and (3), we should update LLVM LangRef.

If this is considered to be miscompilation, I think this is related to llvm/llvm-project#33896 .

aqjune commented 2 years ago

mentioned in issue llvm/llvm-project#39717

LebedevRI commented 5 years ago

Sorry for late update. Here are patches

Note that these patches don't have appropriate mailing lists subscribed, so the general public will not really find them.

aqjune commented 5 years ago

Sorry for late update. Here are patches for psub as well as

https://reviews.llvm.org/D56598 shows the definition of the suggested psub operation. The number of ptrtoint/inttoptr instructions is updated as well. https://reviews.llvm.org/D56601 shows how CaptureTracker can be updated. https://reviews.llvm.org/D56605 shows the initial version of isSafeToPropagatePtrEquality. As discussed in this thread, this patch should be expanded to support more general cases, e.g., by introducing new intrinsics or so.

hfinkel commented 5 years ago

IMHO, this is a long-standing bug and we should fix it properly. I recall that we decided that (B) was the problematic case, as it fails to capture the fact that other pointer value might now also have contributed to the result as control dependencies.

Your understanding is right, but pointer equality without using inttoptr & ptrtoint still can cause a problem. llvm/llvm-bugzilla-archive#35229 is related to this (you can see that there's no int -> ptr cast in the code). This bug also can be fixed by disabling pointer equality propagation.

Yes, but I don't really want to disable the transformation if we don't have to. While we're looking at intrinsics, can't we use one for this problem as well? What if we add some intrinsic like @​llvm.ptr.also_derived_from(base, a, b, c, ...)? It would likely need to be opaque, and would interfere with optimization in other ways, but it might be worth the tradeoffs.

Or maybe we can first disable inttoptr(ptrtoint p) -> p (B), neutralize the effect of the disabling, and then working on (C). While in that period we can somehow come up with more cases for enabling pointer equality propagation, e.g. replacing pointer uses for certain operations, trying union operation which is suggested in the bug 34548.

Disabling both (B) and (C) can cause big performance degradation, but I observed that adding valid transformations (e.g. conditionally allowing pointer equality) helped SPEC CPU 2017 & LLVM Nightly Tests regain performance degradation. We have performance numbers & explanation for this - https://sf.snu.ac.kr/publications/llvmtwin.pdf . Text is long, but important parts are Sec. 5, 6. (this text mentions GVN and InstSimplify only, but CVP is disabled in the experiment as well, just missing in the text)

Thanks a bunch for looking into the effects of disabling this. Regarding the "In case of inttoptr, ~95% of them are generated from InstCombine optimization load **p -> load i64* . (https://godbolt.org/z/uPNook ) " -

(I just updated the test a bit, so inttoptr appears: https://godbolt.org/z/_Jqmsx )

We shouldn't have optimizations generating inttoptr. We can case to i8* and then do arithmetic, correct? If so, we should definitely update InstCombine to work in this way.

It will help reduce # of inttoptrs. :) But one case I'm considering is something like this:

struct { char** x; int64_t y; } t1 = .., t2; t2.x = t1.x; t2.y = t1.y;

SLPVectorizer vectorizes two loads/stores as <2 x i64>, and this boosts performance for certain benchmarks in LLVM Nightly Tests. It would be great if it can be vectorized as <2 x i8>, but we need to convert `load i64intoload i8*again, which may introduceptrtoint` again this time. How do you think about enhancing SLPVectorizer to understand both i64/ty** and treating them equally?

Yes, it makes sense for it to understand this. Especially if the alternative is inttoptr/ptrtoint.

Regarding ptrtoint and the new instruction, any reason not to start with an intrinsic?

I agree that starting as an intrinsic is a good idea because adding properties (e.g. readnone, etc) to it is straightforward. I'll upload a patch to Phabricator which shows definition of psub intrinsic.

Great.

aqjune commented 5 years ago

I have a prototype of function isSafeToPropagatePtrEquality(p1, p2) which checks whether replacing p1 with p2 is safe, given p1 == p2.

https://github.com/snu-sf/llvm-twin/commit/ 3946989831f734c8bc2e83a59948ec8b6f084247 .

I believe we can make optimizations like GVN/InstSimplify/CVP call this to check the safety.

Not all of the cases in this are immediately obvious, but feel free to post for review.

Sure, I'll upload Phabricator links (including psub) here after posting them.

aqjune commented 5 years ago

IMHO, this is a long-standing bug and we should fix it properly. I recall that we decided that (B) was the problematic case, as it fails to capture the fact that other pointer value might now also have contributed to the result as control dependencies.

Your understanding is right, but pointer equality without using inttoptr & ptrtoint still can cause a problem. llvm/llvm-bugzilla-archive#35229 is related to this (you can see that there's no int -> ptr cast in the code). This bug also can be fixed by disabling pointer equality propagation.

Or maybe we can first disable inttoptr(ptrtoint p) -> p (B), neutralize the effect of the disabling, and then working on (C). While in that period we can somehow come up with more cases for enabling pointer equality propagation, e.g. replacing pointer uses for certain operations, trying union operation which is suggested in the bug 34548.

Disabling both (B) and (C) can cause big performance degradation, but I observed that adding valid transformations (e.g. conditionally allowing pointer equality) helped SPEC CPU 2017 & LLVM Nightly Tests regain performance degradation. We have performance numbers & explanation for this - https://sf.snu.ac.kr/publications/llvmtwin.pdf . Text is long, but important parts are Sec. 5, 6. (this text mentions GVN and InstSimplify only, but CVP is disabled in the experiment as well, just missing in the text)

Thanks a bunch for looking into the effects of disabling this. Regarding the "In case of inttoptr, ~95% of them are generated from InstCombine optimization load **p -> load i64* . (https://godbolt.org/z/uPNook ) " -

(I just updated the test a bit, so inttoptr appears: https://godbolt.org/z/_Jqmsx )

We shouldn't have optimizations generating inttoptr. We can case to i8* and then do arithmetic, correct? If so, we should definitely update InstCombine to work in this way.

It will help reduce # of inttoptrs. :) But one case I'm considering is something like this:

struct { char** x; int64_t y; } t1 = .., t2;
t2.x = t1.x;
t2.y = t1.y;

SLPVectorizer vectorizes two loads/stores as <2 x i64>, and this boosts performance for certain benchmarks in LLVM Nightly Tests. It would be great if it can be vectorized as <2 x i8*>, but we need to convert load i64* into load i8** again, which may introduce ptrtoint again this time. How do you think about enhancing SLPVectorizer to understand both i64*/ty** and treating them equally?

Regarding ptrtoint and the new instruction, any reason not to start with an intrinsic?

I agree that starting as an intrinsic is a good idea because adding properties (e.g. readnone, etc) to it is straightforward. I'll upload a patch to Phabricator which shows definition of psub intrinsic.

hfinkel commented 5 years ago

I have a prototype of function isSafeToPropagatePtrEquality(p1, p2) which checks whether replacing p1 with p2 is safe, given p1 == p2.

https://github.com/snu-sf/llvm-twin/commit/ 3946989831f734c8bc2e83a59948ec8b6f084247 .

I believe we can make optimizations like GVN/InstSimplify/CVP call this to check the safety.

Not all of the cases in this are immediately obvious, but feel free to post for review.

hfinkel commented 5 years ago

Hello Hal,

Yes, this bug and 34548 share the same underlying problem.

Here are the steps how the program is miscompiled:

  1. InstCombine

    if ((uintptr_t)x != (uintptr_t)y) { pass } i = phi [(uintptr_t)x, (uintptr_t)y] p = inttoptr i => // (A) if (x != y) { pass } i = phi [(uintptr_t)x, (uintptr_t)y] p = inttoptr i => // (B) if (x != y) { pass } p = phi[x, y]

  2. SimplifyCFG

    p = (x == y) ? y : x

  3. InstructionSimplify // (C)

    p = x

(A) converts integer comparison into pointer comparison. (B) folds inttoptr(ptrtoint p) -> p. (C) converts (x == y) ? y : x into x (where x and y are pointers).

I believe all of (A), (B), (C) can cause different kind of miscompilation bug, but for this bug & 34548 only, I checked that removing one of (B) or (C) successfully fixes this bug (with Clang 8.0, commit a12acc). So removing one of them will be a good start, and I suggest (C) because it seems easier to adopt. (but (B) should be blocked in the future as well, because we want to allow (x == y) ? y : x for integers)

One important concern is whether disabling these optimizations makes LLVM generate slower code, and my observation was like this:

(1) Disabling pointer equality propagation (C): We can conditionally allow pointer equality propagation again without introducing miscompilation bug. There are a few possible cases, but most important case is if (p == inttoptr(i)) { use(p) -> use(inttoptr(i)) }. As NULL is equivalent to inttoptr(0), replacing p with NULL (when p == NULL is given) is also valid. About 40% of pointer equality propagations are replacing with NULL, when compiling SPEC CPU 2017 & LLVM Nightly Tests.

(2) Optimizations that remove ptrtoint / inttoptr casts (A), (B): What I've observed is that, majority of these casts are actually generated from compiler, rather than written by programmers.

In case of inttoptr, ~95% of them are generated from InstCombine optimization load **p -> load i64* . (https://godbolt.org/z/uPNook ) This InstCombine optimization helps vectorization (because it makes types of loads homogeneous of ex, struct { int64_t x; char* y; } ), but I believe this transformation can be done only when the loads are going to be vectorized.

In case of ptrtoint, ~75% of them are generated from pointer subtractions (sub (ptrtoint p1) (ptrtoint p2)). One possible solution to this is to have a dedicated instruction (or intrinsics) for pointer subtraction, say 'psub'. This has additional benefit for optimization: we can define 'psub p1 p2' as poison if p1 and p2 are from different objects (alloca/malloc), then PointerMayBeCaptured(p1) can say 'no' if it is only used by psub, another argument of which pointing to an object. We found that this boosts performance for certain SPEC 2017 programs (~2%), but this solution requires introducing a new instruction, so may be a bit aggressive.

Finally, we can conditionally allow inttoptr(ptrtoint ptr) -> ptr again. If ptr is dereferenceable, we can say that inttoptr(ptrtoint ptr) is also dereferenceable, and the casts can be removed.

IMHO, this is a long-standing bug and we should fix it properly. I recall that we decided that (B) was the problematic case, as it fails to capture the fact that other pointer value might now also have contributed to the result as control dependencies.

Thanks a bunch for looking into the effects of disabling this. Regarding the "In case of inttoptr, ~95% of them are generated from InstCombine optimization load **p -> load i64* . (https://godbolt.org/z/uPNook ) " - We shouldn't have optimizations generating inttoptr. We can case to i8* and then do arithmetic, correct? If so, we should definitely update InstCombine to work in this way.

Regarding ptrtoint and the new instruction, any reason not to start with an intrinsic?

aqjune commented 5 years ago

I have a prototype of function isSafeToPropagatePtrEquality(p1, p2) which checks whether replacing p1 with p2 is safe, given p1 == p2.

https://github.com/snu-sf/llvm-twin/commit/3946989831f734c8bc2e83a59948ec8b6f084247 .

I believe we can make optimizations like GVN/InstSimplify/CVP call this to check the safety.

aqjune commented 5 years ago

Hello Hal,

Yes, this bug and 34548 share the same underlying problem.

Here are the steps how the program is miscompiled:

  1. InstCombine
    if ((uintptr_t)x != (uintptr_t)y) {
    pass
    }
    i = phi [(uintptr_t)x, (uintptr_t)y]
    p = inttoptr i

    => // (A)

    if (x != y) {
    pass
    }
    i = phi [(uintptr_t)x, (uintptr_t)y]
    p = inttoptr i

    => // (B)

    if (x != y) {
    pass
    }
    p = phi[x, y]
  2. SimplifyCFG
    p = (x == y) ? y : x
  3. InstructionSimplify // (C)
    p = x

(A) converts integer comparison into pointer comparison. (B) folds inttoptr(ptrtoint p) -> p. (C) converts (x == y) ? y : x into x (where x and y are pointers).

I believe all of (A), (B), (C) can cause different kind of miscompilation bug, but for this bug & 34548 only, I checked that removing one of (B) or (C) successfully fixes this bug (with Clang 8.0, commit a12acc). So removing one of them will be a good start, and I suggest (C) because it seems easier to adopt. (but (B) should be blocked in the future as well, because we want to allow (x == y) ? y : x for integers)

One important concern is whether disabling these optimizations makes LLVM generate slower code, and my observation was like this:

(1) Disabling pointer equality propagation (C): We can conditionally allow pointer equality propagation again without introducing miscompilation bug. There are a few possible cases, but most important case is if (p == inttoptr(i)) { use(p) -> use(inttoptr(i)) }. As NULL is equivalent to inttoptr(0), replacing p with NULL (when p == NULL is given) is also valid. About 40% of pointer equality propagations are replacing with NULL, when compiling SPEC CPU 2017 & LLVM Nightly Tests.

(2) Optimizations that remove ptrtoint / inttoptr casts (A), (B): What I've observed is that, majority of these casts are actually generated from compiler, rather than written by programmers.

In case of inttoptr, ~95% of them are generated from InstCombine optimization load **p -> load i64* . (https://godbolt.org/z/uPNook ) This InstCombine optimization helps vectorization (because it makes types of loads homogeneous of ex, struct { int64_t x; char* y; } ), but I believe this transformation can be done only when the loads are going to be vectorized.

In case of ptrtoint, ~75% of them are generated from pointer subtractions (sub (ptrtoint p1) (ptrtoint p2)). One possible solution to this is to have a dedicated instruction (or intrinsics) for pointer subtraction, say 'psub'. This has additional benefit for optimization: we can define psub p1 p2 as poison if p1 and p2 are from different objects (alloca/malloc), then PointerMayBeCaptured(p1) can say 'no' if it is only used by psub, another argument of which pointing to an object. We found that this boosts performance for certain SPEC 2017 programs (~2%), but this solution requires introducing a new instruction, so may be a bit aggressive.

Finally, we can conditionally allow inttoptr(ptrtoint ptr) -> ptr again. If ptr is dereferenceable, we can say that inttoptr(ptrtoint ptr) is also dereferenceable, and the casts can be removed.

hfinkel commented 5 years ago

If we simply wish to document what we currently implement now, then I think that (2) is closest. We might, however, just want to fix the bug. Is this the same underlying problem as in llvm/llvm-project#33896 (i.e., in this case, the optimizer is using the equality predicate to make the 42 assignment go through a value based on 'x' and not based on 'y')?

llvmbot commented 2 months ago

@llvm/issue-subscribers-c

Author: Juneyoung Lee (aqjune)

| | | | --- | --- | | Bugzilla Link | [39846](https://llvm.org/bz39846) | | Version | trunk | | OS | Linux | | Depends On | llvm/llvm-project#33896 | | Blocks | llvm/llvm-project#39717 | | Attachments | [The example code](https://user-images.githubusercontent.com/6484889/143758402-d218d95f-fb50-4583-958e-ebaac6e8a262.gz) | | CC | @hfinkel,@dobbelaj-snps,@aqjune,@LebedevRI,@zhengyang92,@nikic,@nunoplopes,@RalfJung,@regehr,@sanjoy | ## Extended Description Consider this example: ```cpp #include <stdint.h> int foo(int *restrict x, int *restrict y) { *y = 23; uintptr_t xi = (uintptr_t)x; uintptr_t yi = (uintptr_t)y; if (xi != yi) { // xi == yi, so this is never taken yi = xi; } *(int*)yi = 42; // changes *y or UB? return *y; } int bar() { int x; return foo(&x, &x); // returns 23 } ``` After `-O3`, `foo(&x, &x)` is optimized into `ret i32 23` ( https://godbolt.org/z/ty73Xk ). Here are several ways to explain this result: (1) `foo(&x, &x)` is not allowed in LLVM IR - but I believe this makes the semantics more undefined than C (because C allows multiple restrict pointers to point have the same address if they are only read, but never written) (2) `*(int*)yi = 42` is not allowed - but LLVM IR LangRef's pointer aliasing rules ( https://llvm.org/docs/LangRef.html#pointer-aliasing-rules ) say that `(int *)yi` is based on y, so I think this behavior is well-defined. Perhaps we should update the definition of based-on relation, if we follow this option. (3) Restrict pointer cannot be casted into integer - if we choose this option, LangRef should be updated to deal with this. If we're to choose (2) and (3), we should update LLVM LangRef. If this is considered to be miscompilation, I think this is related to llvm/llvm-project#33896 .
RalfJung commented 2 months ago

It would be great if it can be vectorized as <2 x i8>, but we need to convert load i64 into load i8** again, which may introduce ptrtoint again this time.

Really what should be used here is a type that can hold arbitrary data, whether it be pointers or integers. The "bytes type" would fill that need, or iN could be made to fill that need. In both cases there could then be a bitcast from that universal type to both iN and ptr, which has no special effects unlike ptrtoint/inttoptr.

Yes, but I don't really want to disable the transformation if we don't have to.

It does have to be disabled though. It's generally fundamentally wrong to replace a by b if they are both pointers and a == b, since the things they are "derived from" might be different. (There are special cases where the replacement is fine, but it general it is not.) That's what https://github.com/llvm/llvm-project/issues/34577 is about.

A systematic way to think about this is to consider the provenance of a pointer to be a genuine part of its value, i.e. variables of type ptr carry both an address of type i64 and a provenance. When stored in memory, the bytes that hold the ptr also carry extra metadata to hold the provenance. (This metadata should be considered as "actually existing" for the purpose of executing programs on the LLVM IR Abstract Machine. It can't just be dropped mid-way through the program's execution. Memory accesses are UB unless the pointer carries a provenance that grants access to the affected memory range. And there can't be a "wildcard" provenance that allows access to anything -- the entire reason provenance helps optimizations is that having the right provenance is necessarily required for every memory access. There can be situations where the compiler cannot statically know the provenance, but the runtime semantics of the LLVM IR Abstract Machine tracks and enforces provenance throughout. Of course, provenance is erased when lowing to assembly, but that goes together with the fact that provenance-based optimizations can no longer be performed at that point.)

Now it becomes clear where the problems stem from: