A miscompilation bug with unsigned char

aqjune commented 6 years ago


Bugzilla Link	37469
Version	trunk
OS	Linux
Attachments	A source file that raises the bug.
CC	@chandlerc,@majnemer,@dwblaikie,@efriedma-quic,@hfinkel,@dobbelaj-snps,@MattPD,@mizvekov,@nunoplopes,@RalfJung,@regehr,@zygoloid,@sanjoy,@rotateright,@yuanfang-chen

Extended Description

$ cat test-main.c
#include <string.h>
#include <stdio.h>
#include <stdint.h>

// If f(A, B + 4) is given, and integer representation of A and B + 4
// are the same, c1 == c2 in the loop becomes true,
// so arr3 = arr1. Hence r = A, and *A should be 10.
// However, if compiled with -O3, *A is still 1.
void store_10_to_p(int *p, int *q) {
  unsigned char arr1[8];
  unsigned char arr2[8];
  unsigned char arr3[8];
  // Type punning with char* is allowed.
  memcpy((unsigned char*)arr1, (unsigned char *)&p, sizeof(p));
  memcpy((unsigned char*)arr2, (unsigned char *)&q, sizeof(q));
  // Now arr1 is p, arr2 is q.

  for (int i = 0; i < sizeof(q); i++) {
    int c1 = (int)arr1[i], c2 = (int)arr2[i];
    // Note that c1 == c2 is a comparison between _integers_ (not pointers).
    if (c1 == c2)
      // Always true if p and q have same integer representation.
      arr3[i] = arr1[i];
    else
      arr3[i] = arr2[i];
  }
  // Now arr3 is equivalent to arr1, which is p.
  int *r;
  memcpy(&r, (unsigned char *)arr3, sizeof(r));
  // Now r is p.
  *p = 1;
  *r = 10;
}

int main() {
  int A[4], B[4];
  printf("%p %p\n", A, &B[4]);
  if ((uintptr_t)A == (uintptr_t)&B[4]) {
    store_10_to_p(A, &B[4]);
    printf("%d\n", A[0]);
  }
  return 0;
}

$ clang -O3 -o test-main test-main.c
$ ./test-main
0x7fffffffe580 0x7fffffffe580
1
$ clang -O0 -o test-main test-main.c
$ ./test-main
0x7fffffffe580 0x7fffffffe580
10

This is what is happening inside LLVM: (1) Instcombine changes the loop body to "arr3[i] = arr2[i];" (2) Loop idiom recognizer replaces the loop with a "memcpy(arr3, arr2, 8)" (3) Instcombine does the store forwarding from the memcpy to the load

I think this is related with lowering 'unsigned char' in C into 'i8' in LLVM.

There are two choices: (1) Lowering 'unsigned char' into 'i8' is correct. (2) Lowering 'unsigned char' into 'i8' is incorrect.

If (1) is right, then one of the optimizations happening in this example should be disabled, to stop the miscompilation. If (2) is right, then it is clang which miscompiles this example. 'unsigned char' should be lowered into something else.

aqjune commented 6 years ago

In this example removing memcpy inserts ptrtoint / inttoptr casts, hence it is okay.

What if I convert to a different type, such as a vector of two integers, or a struct containing two integers, etc? Is the behavior of the memcpy optimization dependent on the IR type of the memory allocation? (That would seem very wrong to me.)

Does a memcpy from a pointer to a pointer also get converted into a ptrtoint/inttoptr pair, or not?

Converting from a pointer to a vector of two integers (or a structure) - AFAIK LLVM IR does not have a cast operation for that. There is an operation 'bitcast', but it only supports pointer -> pointer (https://llvm.org/docs/LangRef.html#bitcast-to-instruction ). I also think that the behavior of memcpy should not be dependent on the IR type (TBAA may be involved here, but they are managed by separate tags such as !tbaa and !tbaa.struct , so memcpy without such tags should be just a bitwise copy) If we can say that inserting inttoptr / ptrtoint doesn't change the behavior of the program (or have less behavior, to make refinement hold), then replacing memcpy with inttoptr / ptrtoint it would be a sound transformation.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

In this example removing memcpy inserts ptrtoint / inttoptr casts, hence it is okay.

What if I convert to a different type, such as a vector of two integers, or a struct containing two integers, etc? Is the behavior of the memcpy optimization dependent on the IR type of the memory allocation? (That would seem very wrong to me.)

Does a memcpy from a pointer to a pointer also get converted into a ptrtoint/inttoptr pair, or not?

hfinkel commented 6 years ago

(1) Instcombine changes the loop body to "arr3[i] = arr2[i];"

Why? Are we folding the (c1 == c2) comparison to false?

it's doing the following tranformation: a == b ? a : b -> b

Either a == b, and then return a or b is the same, or they are different and it returns b. Hence we can always return b.

Ah, thanks!

On balance, I tend to agree, we should fold this to inttotpr(ptrtoint(x)), and stop there (because, we don't know if there are other dependencies that can affect aliasing). I think that's consistent with my opinion on llvm/llvm-project#33896 .

nunoplopes commented 6 years ago

(1) Instcombine changes the loop body to "arr3[i] = arr2[i];"

Why? Are we folding the (c1 == c2) comparison to false?

it's doing the following tranformation: a == b ? a : b -> b

Either a == b, and then return a or b is the same, or they are different and it returns b. Hence we can always return b.

hfinkel commented 6 years ago

void f(int p, int q) { unsigned long x; unsigned long y; unsigned long z; memcpy((unsigned char)&x, (unsigned char)&p, sizeof(p)); memcpy((unsigned char)&y, (unsigned char)&q, sizeof(p)); if (x == y) z = x; else z = y; memcpy((unsigned char)&p, (unsigned char)&z, sizeof(p)); q = 10; p = 1; }

In this example removing memcpy inserts ptrtoint / inttoptr casts, hence it is okay.

There exists a case where blindly folding (int)(int*)p -> p raises miscompilation in IR (#34548 ), so I believe the folding should be done when the folding is proved to be safe.

I agree. This seems similar.

I'm still missing something from the description (or I'm missing something), for this:

(1) Instcombine changes the loop body to "arr3[i] = arr2[i];"

Why? Are we folding the (c1 == c2) comparison to false?

aqjune commented 6 years ago

void f(int p, int q) { unsigned long x; unsigned long y; unsigned long z; memcpy((unsigned char)&x, (unsigned char)&p, sizeof(p)); memcpy((unsigned char)&y, (unsigned char)&q, sizeof(p)); if (x == y) z = x; else z = y; memcpy((unsigned char)&p, (unsigned char)&z, sizeof(p)); q = 10; p = 1; }

In this example removing memcpy inserts ptrtoint / inttoptr casts, hence it is okay.

There exists a case where blindly folding (int)(int*)p -> p raises miscompilation in IR (#34548 ), so I believe the folding should be done when the folding is proved to be safe.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

I don't think special-casing 'unsigned char' helps. Consider:

void f(int p, int q) { unsigned long x; unsigned long y; unsigned long z; memcpy((unsigned char)&x, (unsigned char)&p, sizeof(p)); memcpy((unsigned char)&y, (unsigned char)&q, sizeof(p)); if (x == y) z = x; else z = y; memcpy((unsigned char)&p, (unsigned char)&z, sizeof(p)); q = 10; p = 1; }

And I can do the same thing using any other type as the intermediate, so long as it retains all bit patterns and == compares all bits. (Eg, 'double' doesn't work, but a vector of 4 'short's would, as would '_Complex int'.)

aqjune commented 6 years ago

I think we can consider a few more alternatives:

Inserting inttoptr/ptrtoint whenever store forwarding happens will inhibit other optimizations, e.g. hoisting load/stores out of a loop. To regain performance, there must be a good way to systematically remove generated inttoptr(ptrtoint(p)) casts.

We can support store forwarding by defining load i8 ptr as returning poison if *ptr contains non-integral value(e.g. a portion of pointer value). Then the store is correct. However, with this solution, lowering 'unsigned char' into 'i8' is now incorrect, because it may raise more UB IMHO. We need something like a 'data' type, or 'd8', that can have any kind of value bitwisely.

Another choice is making 'i8' special - it can have any kind of value, unlike other integer types like i32, i64, ... In this case, casting i8 to other integer types (e.g. zext i8 to i32) should do ptrtoint implicitly.

efriedma-quic commented 6 years ago

From a C++ perspective, I'm not sure it's clear what semantics we want for integer to pointer conversions.

The most aggressive approach would be to adopt "strict pointer safety", as specified in the C++ standard; but that's probably not compatible with a lot of real-world code.

The most conservative approach is a general hole for int-to-pointer conversions: if the program can prove the result of an int-to-pointer conversion is equal to pointers to two different objects, it can be used as if it were a pointer to either object. This rule is closest to what we actually implement; LLVM generally treats "inttoptr" as aliasing anything with an address that escapes, except for some obscure cases like this where it gets folded away. I'm not sure how you would formally specify "if the program can prove" in the C++ standard, though; it's kind of an informal statement of the rule.

Maybe there's some other in-between rule which is practical, but I'm not sure what it would look like.

It would be nice if the standard specified this more explicitly, so this doesn't keep coming up as an open question.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

I think the dependence break here is fine (rather, it better be! :)).

Yeah, the dependence break would be an invalid transformation in the source language, but need not be so in IR. If at the IR level you want to say that integers are just integers, and inttoptr creates a "can alias everything" pointer, then I agree that a memcpy that copies a pointer cannot be folded to a simple copy.

sanjoy commented 6 years ago

It seems to me that the issue here is that optimizing
int c1 = (int)arr1[i], c2 = (int)arr2[i];
if (c1 == c2)
  arr3[i] = arr1[i];
else
  arr3[i] = arr2[i];
to
arr3[i] = arr2[i];

I think the dependence break here is fine (rather, it better be! :)).

I think the problem is that we instcombined

int arr2[8]; int arr3[8]; memcpy(arr2, &q); memcpy(arr3, &arr2); int* ptr; memcpy(&ptr, arr3);

into

int arr2[8]; int arr3[8]; int* ptr = q;

Conceptually we folded inttotpr(ptrtoint(x)) to x, which is disallowed; the only difference (IMO) is that in this case the inttoptr and ptrtoint was done by serializing and deserializing the bits of the pointer via memory.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

(Before we begin: the C++ standard is not clear on what happens if you directly modify the object representation of an object, whether with memcpy or with direct unsigned char accesses. See n3751 and p0593r2 for some background. So I'm going to answer as if the program had instead performed a reinterpret_cast to intptr_t, then split the intptr_t up into unsigned chars, then recombined it, then performed a reinterpret_cast back to an int*. I expect whatever rules we come up with in the C++ standard to be consistent with that.)

The mapping from unsigned char to i8 should be correct.

That's not clear. There's an alternative view that i8 is simply an int not a generic hold (of pointers and integers, for example). The C++ standard defines unsigned char as a generic holder (Richard please correct me if I'm wrong). So it's unclear whether i8 can serve that purpose.

The C++ standard is weak in specifying this. It says:

"A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type will have its original value; mappings between pointers and integers are otherwise implementation-defined."

Now, a pointer value carries provenance (it points to a specific object, not merely an address). In general, it's not clear what notional values an integer type can carry (the standard never even says that the values of integer types are integers!), but one might assume that an integer type holds integer values, and the value of an object of such a type is determined by only the value representation ([basic.types]p4 suggests this, but that's not true for pointer types, so we must use some caution in interpreting it).

However, the "converted to an integer [...] and back" wording does appear to imply a provenance rule. It's not enough to just invent a suitable integer and convert it; the integer must be being converted back, which means in some sense it must have been originally created by a pointer-to-integer conversion.

It seems to me that the issue here is that optimizing

int c1 = (int)arr1[i], c2 = (int)arr2[i];
if (c1 == c2)
  arr3[i] = arr1[i];
else
  arr3[i] = arr2[i];

to

arr3[i] = arr2[i];

loses the dependence from arr3 on arr1, which was what caused the values in arr3 to satisfy the "converted to an integer [...] and back" rule. So that transformation is incorrect at the source level. (If you want to make use of the C++ provenance rules at the IR level, you would presumably need to make the value of arr3[i] somehow track that the value is dependent on arr1[i].)

nunoplopes commented 6 years ago

This is what is happening inside LLVM: (1) Instcombine changes the loop body to "arr3[i] = arr2[i];" (2) Loop idiom recognizer replaces the loop with a "memcpy(arr3, arr2, 8)" (3) Instcombine does the store forwarding from the memcpy to the load

From your description, it seems like we should see what r is p. What's actually going wrong?

The problem is that once we get the loop body simplified to "arr3[i] = arr2[i]", we lose the connection that arr3 can be either p or q. Unless we define load/store as doing implicit inttoptr/ptrtoint. Here, since we store a pointer as an int (byte-by-byte), we could say that the store is implicitly doing a ptrtoint, and that the "memcpy(&r, arr3, 8)" is doing an implicit inttoptr. If that was the semantics, the store forwarding would be wrong: r cannot be blindly set to q, but rather to inttoptr(ptrtoint(q)) (which in this case can alias with either p or q).

I think this is related with lowering 'unsigned char' in C into 'i8' in LLVM.

There are two choices: (1) Lowering 'unsigned char' into 'i8' is correct. (2) Lowering 'unsigned char' into 'i8' is incorrect.

The mapping from unsigned char to i8 should be correct.

That's not clear. There's an alternative view that i8 is simply an int not a generic hold (of pointers and integers, for example). The C++ standard defines unsigned char as a generic holder (Richard please correct me if I'm wrong). So it's unclear whether i8 can serve that purpose.

The main disadvantage of (ab)using i8 as a generic holder is that it makes certain optimizations wrong or more complicated, such at the implicit ptrtoint/inttoptr issue mentioned above.

hfinkel commented 6 years ago

Created attachment 20304 [details] A source file that raises the bug.

$ cat test-main.c
#include <string.h>
#include <stdio.h>
#include <stdint.h>

// If f(A, B + 4) is given, and integer representation of A and B + 4
// are the same, c1 == c2 in the loop becomes true,
// so arr3 = arr1. Hence r = A, and *A should be 10.
// However, if compiled with -O3, *A is still 1.
void store_10_to_p(int *p, int *q) {
  unsigned char arr1[8];
  unsigned char arr2[8];
  unsigned char arr3[8];
  // Type punning with char* is allowed.
  memcpy((unsigned char*)arr1, (unsigned char *)&p, sizeof(p));
  memcpy((unsigned char*)arr2, (unsigned char *)&q, sizeof(q));
  // Now arr1 is p, arr2 is q.

  for (int i = 0; i < sizeof(q); i++) {
    int c1 = (int)arr1[i], c2 = (int)arr2[i];
    // Note that c1 == c2 is a comparison between _integers_ (not pointers).
    if (c1 == c2)
      // Always true if p and q have same integer representation.
      arr3[i] = arr1[i];
    else
      arr3[i] = arr2[i];
  }
  // Now arr3 is equivalent to arr1, which is p.
  int *r;
  memcpy(&r, (unsigned char *)arr3, sizeof(r));
  // Now r is p.
  *p = 1;
  *r = 10;
}

int main() {
  int A[4], B[4];
  printf("%p %p\n", A, &B[4]);
  if ((uintptr_t)A == (uintptr_t)&B[4]) {
    store_10_to_p(A, &B[4]);
    printf("%d\n", A[0]);
  }
  return 0;
}

$ clang -O3 -o test-main test-main.c
$ ./test-main
0x7fffffffe580 0x7fffffffe580
1
$ clang -O0 -o test-main test-main.c
$ ./test-main
0x7fffffffe580 0x7fffffffe580
10

This is what is happening inside LLVM: (1) Instcombine changes the loop body to "arr3[i] = arr2[i];" (2) Loop idiom recognizer replaces the loop with a "memcpy(arr3, arr2, 8)" (3) Instcombine does the store forwarding from the memcpy to the load

From your description, it seems like we should see what r is p. What's actually going wrong?

I think this is related with lowering 'unsigned char' in C into 'i8' in LLVM.

There are two choices: (1) Lowering 'unsigned char' into 'i8' is correct. (2) Lowering 'unsigned char' into 'i8' is incorrect.

The mapping from unsigned char to i8 should be correct.

If (1) is right, then one of the optimizations happening in this example should be disabled, to stop the miscompilation. If (2) is right, then it is clang which miscompiles this example. 'unsigned char' should be lowered into something else.

llvm / llvm-project

A miscompilation bug with unsigned char #36817

Extended Description