Clang generates incorrect TBAA metadata, causing a LICM miscompile

efriedma-quic commented 2 years ago

Consider the following program:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void f(int* x, float *y, int n) {
  int z = 0;
  for (int i = 0; i < n; ++i) {
    x[i] = i;
    *y = 1.0;
  }
}
void (*ff)(int*,float*,int) = f;
int main() {
  void* x = malloc(4);
  ff((int*)x, (float*)x, 1);
  printf("%f\n", ((float*)x)[0]);
}

With gcc, or clang -O0, this prints "1.000000". With clang -O2, this prints "0.000000".

clang is pretty clearly wrong here according to the C effective type rules: in *y = 1.0;, the value 1.0 is "stored into an object having no declared type" (C99 6.5p6), and the value is loaded through an lvalue with the same type. Granted, the rules have been under debate, but I don't think anyone has disputed this particular consequence of the rules...

The C++ rules are a bit more complicated to follow, but I think they're supposed to imply the same thing as the C rules? Anyway, even if you suppose that the previous testcase is somehow undefined in C++, the following should be well-defined, and has the same issue:

#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <new>
void f(void* x, void*y, int n) {
  int z = 0;
  for (int i = 0; i < n; ++i) {
    new ((char*)x + i * sizeof(int)) int(i);
    new (y) float(1.0);
  }
}
void (*ff)(void*,void*,int) = f;
int main() {
  void* x = malloc(4);
  ff(x, x, 1);
  printf("%f\n", ((float*)x)[0]);
}

fhahn commented 2 years ago

Repro on godbolt: https://godbolt.org/z/Tj5s4Pob6

IR repor for LICM (-passes=licm): https://godbolt.org/z/z6drseYdW

Given then TBAA metadata Clang generates, LICM seems to do the correct thing, so likely Clang would need to not generate TBAA metadata in that case. At least for the C case, wouldn't that depend on non-local analysis?

I'll update the title.

; ModuleID = '<source>'
source_filename = "<source>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: argmemonly nofree norecurse nosync nounwind writeonly uwtable
define dso_local void @f(ptr nocapture noundef writeonly %x, ptr nocapture noundef writeonly %y, i32 noundef %n) local_unnamed_addr #0 {
entry:
  %cmp4 = icmp slt i32 0, %n
  br i1 %cmp4, label %for.body.lr.ph, label %for.cond.cleanup

for.body.lr.ph:                                   ; preds = %entry
  br label %for.body

for.cond.for.cond.cleanup_crit_edge:              ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.for.cond.cleanup_crit_edge, %entry
  ret void

for.body:                                         ; preds = %for.body.lr.ph, %for.body
  %i.05 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
  %idxprom = zext i32 %i.05 to i64
  %arrayidx = getelementptr inbounds i32, ptr %x, i64 %idxprom
  store i32 %i.05, ptr %arrayidx, align 4, !tbaa !5
  store float 1.000000e+00, ptr %y, align 4, !tbaa !9
  %inc = add nuw nsw i32 %i.05, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.for.cond.cleanup_crit_edge, !llvm.loop !11
}

attributes #0 = { argmemonly nofree norecurse nosync nounwind writeonly uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.module.flags = !{!0, !1, !2, !3}
!llvm.ident = !{!4}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 2}
!4 = !{!"clang version 15.0.0 (https://github.com/llvm/llvm-project.git c98b601b7fad9500742a9221bd8de4a86d68175e)"}
!5 = !{!6, !6, i64 0}
!6 = !{!"int", !7, i64 0}
!7 = !{!"omnipotent char", !8, i64 0}
!8 = !{!"Simple C/C++ TBAA"}
!9 = !{!10, !10, i64 0}
!10 = !{!"float", !7, i64 0}
!11 = distinct !{!11, !12}
!12 = !{!"llvm.loop.mustprogress"}

llvmbot commented 2 years ago

@llvm/issue-subscribers-clang-codegen

efriedma-quic commented 2 years ago

If we're going to say clang needs to generate different TBAA metadata, I guess there are a few possibilities.

The simplest, I think would be to add a TBAA flag to mark a store as "object-creating": that is, it's allowed to overwrite a value with a different TBAA tag. This is compared to the normal TBAA, where all loads and stores to an address for the whole lifetime of the object are required to be compatible. (I don't think there's any way to represent this with the current TBAA metadata.)

But I think, as a first approximation, clang would end up marking all stores that way? Maybe there are specific cases where we can get away with not marking a store "object-creating".

fhahn commented 2 years ago

The simplest, I think would be to add a TBAA flag to mark a store as "object-creating": that is, it's allowed to overwrite a value with a different TBAA tag. This is compared to the normal TBAA, where all loads and stores to an address for the whole lifetime of the object are required to be compatible. (I don't think there's any way to represent this with the current TBAA metadata.)

I am not sure if just handling this on the store-level will be enough. I think we could have a similar issue when creating objects with complex constructors (Which may not get inlined?).

efriedma-quic commented 2 years ago

If we mark up all stores as object-creating, constructors would work: the stores inside the constructor would be object-creating. We'd only run into trouble if we tried to specifically mark the stores involved with "operator new", and not other stores, or something like that.

rjmccall commented 2 years ago

The consequence here is, what, that TBAA can never be used to reorder stores with each other? Any situation where we know we have declared types at both pointers is presumably a situation where we've resolved each pointer to a concrete variable declaration and no longer need TBAA.

Oh, no, maybe it's slightly stronger than that. If we know that memory has a declared type, then all accesses to that memory must have that type. So if we've resolved one pointer to a concrete variable declaration, then an access to a pointer that does not have a TBAA-compatible type cannot alias that declaration, even if the address of that declaration has otherwise escaped. We'd need to actually record this concept of declared types on declarations in order to take advantage of that, though.

If this is the C rule, I think LLVM should definitely have a way to handle languages with stronger rules that don't allow arbitrary stores to change the type of memory. @atrick, am I correct that Swift in theory would be able to take advantage of that? Presumably we'd need some IR-level representation of type rebinding.

efriedma-quic commented 2 years ago

The consequence here is, what, that TBAA can never be used to reorder stores with each other?

Maybe? I mean, I don't see any way around that conclusion given the rules in the C standard. And as a practical matter, we need some way to reset the effective type of an allocation, if we want to allow reusing a malloc() allocation. If a simple store doesn't do it, it's not clear what does.

Maybe it's worth raising it with the C standard committee.

If this is the C rule, I think LLVM should definitely have a way to handle languages with stronger rules that don't allow arbitrary stores to change the type of memory.

I don't think it makes sense to mess with the existing rules for TBAA metadata in LLVM IR; the existing aliasing metadata has a well-defined meaning, and users outside of clang.

We'd need to actually record this concept of declared types on declarations in order to take advantage of that, though.

Not sure how useful it would be, in practice, to try to implement TBAA reasoning based on declared types. Aliasing checks mostly involve checking aliasing between two memory accesses.

rjmccall commented 2 years ago

Maybe? I mean, I don't see any way around that conclusion given the rules in the C standard. And as a practical matter, we need some way to reset the effective type of an allocation, if we want to allow reusing a malloc() allocation. If a simple store doesn't do it, it's not clear what does.

Right, I'm not trying to argue that this isn't the rule, I'm just trying to get a sense of the implications.

Maybe it's worth raising it with the C standard committee.

They might be surprised by this, yes. It is in practice very restrictive in functions that take pointer arguments. Maybe there's a way to find a rule here that also works for active union members, where aliasing can be assumed to not occur unless there's some sort of derivation within the current scope.

I don't think it makes sense to mess with the existing rules for TBAA metadata in LLVM IR; the existing aliasing metadata has a well-defined meaning, and users outside of clang.

I think you're suggesting that we leave the existing TBAA metadata with a strong rule and make new, weaker TBAA metadata that clang can use, yes? That's fine with me.

Not sure how useful it would be, in practice, to try to implement TBAA reasoning based on declared types. Aliasing checks mostly involve checking aliasing between two memory accesses.

It might be very useful. I'm sure there are interesting cases where pointers to locals escape and we thereafter have to treat other pointers as potentially aliasing the local.

efriedma-quic commented 2 years ago

I don't think it makes sense to mess with the existing rules for TBAA metadata in LLVM IR; the existing aliasing metadata has a well-defined meaning, and users outside of clang.

I think you're suggesting that we leave the existing TBAA metadata with a strong rule and make new, weaker TBAA metadata that clang can use, yes? That's fine with me.

Yes, something like that.

atrick commented 2 years ago

If this is the C rule, I think LLVM should definitely have a way to handle languages with stronger rules that don't allow arbitrary stores to change the type of memory. @atrick, am I correct that Swift in theory would be able to take advantage of that? Presumably we'd need some IR-level representation of type rebinding.

Swift could definitely take advantage of "strong/strict" TBAA with store reordering, as should other languages. The IR-level representation of rebinding a type is trivial: a phantom memory write to 'sizeof(T)' bytes with no TBAA info. I don't see an existing LLVM intrinsic to do that, but it would be easy to invent one:

@llvm.tbaa.rebind(i8* $ptr, i32 %byte_count)

(Still no void* in llvm I guess). We could pass the the actual new pointer type for validation, but it's not essential for codegen.

Disambiguation with and optimization across this intrinsic is still allowed by tracking the provenance of $ptr, ignoring its type.

I think C++ should adopt a similar strategy for placement new. Just emit a tbaa.rebind before the constructor. I'm assuming this is undefined behavior for x == y, even in C++.

  f(void* x, void* y) {
    int *intPtr = new (x) int(0);
    float *floatPtr = new (y) float(0.0);
    *intPtr = 1

fhahn commented 2 years ago

Thanks everybody for chiming in! I think in terms of the TBAA implementation in LLVM, a big takeaway is that it needs to be updated to be location-aware.

I think one of the conservative approaches mentioned is that we only use TBAA info when checking aliasing between a store and a later load. This is something we can check conservatively relatively cheaply.

IIUC a consequence of using marker intrinsics would be that we have to have to check for the presence of those markers between 2 memory instructions, which potentially is quite expensive, especially compared to the current implementation.... It would allow us to give more precise results though and it might also allow us to prove the absence of any markers with LTO. I guess MemorySSA may be an option to check for the presence of markers.

atrick commented 2 years ago

A rebind intrinsic should not change any aspect of memory disambiguation. It just looks like a type-free write to memory.

int *intPtr = new (x) int(0);
float *floatPtr = new (y) float(0.0);

===>

@llvm.rebind(i8* %x, /*bytes*/ 4)
store i32 0, i32* %x, !tbaa "int"
@llvm.rebind(i8* %y, /*bytes*/ 4)
store float 0.0, float* %y, !tbaa "float"

TBAA tells LLVM that the two stores happen to different objects. That's fine. Different objects can never inhabit the same location without full memory dependency in both directions. For the purpose of alias analysis, the rebind instrinsic stands in for all the prior/subsequent memory operations on the same location.

Whether two memory operations affect the same object and whether two pointers have the same integer value is not quite the same thing. Comparing the integer value of two pointers simply should not use TBAA, precisely because memory locations can be reused for different objects. Hopefully we don't rely on that. It's much simpler if we can say that TBAA does not determine pointer inequality.

zygoloid commented 2 years ago

The C++ rules are a bit more complicated to follow, but I think they're supposed to imply the same thing as the C rules?

The C++ rules are substantially stricter than the C rules. In C++, a pointer points to an object, not only to storage, and if that object's lifetime has ended when the store happens, the behavior is undefined. In C, unless you can track storage to a particular stack or global variable (whose effective types cannot be changed by a store), I think you have to assume that every store can change the TBAA information if you want to follow the rules as written.

For this example, in C++:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void f(int* x, float *y, int n) {
  int z = 0;
  for (int i = 0; i < n; ++i) {
    x[i] = i;
    *y = 1.0;
  }
}
void (*ff)(int*,float*,int) = f;
int main() {
  void* x = malloc(4);
  ff((int*)x, (float*)x, 1);
  printf("%f\n", ((float*)x)[0]);
}

... the "implicit object creation" rules kick in: at the point of the malloc, some consistent set of objects is created in the storage by angelic choice. This set of objects cannot contain both an int and float, though, because that would violate the rule that objects cannot overlap in (lifetime, storage) space unless one is nested within the other. So within f, it cannot be the case that x points to an int and y points to a float, and no objects are created, so one of the two stores within the loop must have undefined behavior if the pointers x and y represent the same address. If you prefer to think of it this way, in C++ the effective type is never changed by a simple store, but is set (to some unknown but assumed "correct" value) by memory allocation. One exception: stores whose left-hand side is syntactically a union member access do create objects (and hence set the effective type).

See:

pointers point to objects, not storage: https://eel.is/c++draft/basic.types#basic.compound-3.1
storing through a pointer outside the object's lifetime is undefined: https://eel.is/c++draft/basic.life#7.1
implicit object creation, as performed by malloc: https://eel.is/c++draft/intro.object#10
object creation in union assignment: https://eel.is/c++draft/class.union#general-6

For this example:

#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <new>
void f(void* x, void*y, int n) {
  int z = 0;
  for (int i = 0; i < n; ++i) {
    new ((char*)x + i * sizeof(int)) int(i);
    new (y) float(1.0);
  }
}
void (*ff)(void*,void*,int) = f;
int main() {
  void* x = malloc(4);
  ff(x, x, 1);
  printf("%f\n", ((float*)x)[0]);
}

... the behavior is defined, because placement new creates a new object of the proper type. It would seem ideal if we could fix the behavior of the latter example without pessimizing the former one in C++.

I'm assuming this is undefined behavior for x == y, even in C++.
  f(void* x, void* y) {
    int *intPtr = new (x) int(0);
    float *floatPtr = new (y) float(0.0);
    *intPtr = 1

Yes. The creation of the float object would end the lifetime of the int object pointed to by intPtr (https://eel.is/c++draft/basic.life#1.5), so the store would have undefined behavior.

Note that we do need to be a little careful in C++ about which objects can be nested within which other objects. For example:

struct A { char data[4]; };
int f(void *p) {
  A *a = new (p) A{0};
  // The `int` is nested within `A::data`; does not end the lifetime of `a`.
  int *n = new (p) int(5);
  // Accesses the first byte of the `5` we stored above.
  return a->data[0];
}

See https://eel.is/c++draft/intro.object#4 and https://eel.is/c++draft/intro.object#3 for the rules on what can be "nested within" what.

efriedma-quic commented 2 years ago

The C++ rules are substantially stricter than the C rules.

It's unfortunate if we end up using different rules in C vs. C++; it's hard enough to explain strict aliasing without dealing with quirks like that.

It would seem ideal if we could fix the behavior of the latter example without pessimizing the former one in C++.

I think the only way to do this requires implementing something like "llvm.rebind" proposed by @atrick. We can't depend on changing the markings of the stores involved in new(); for classes, in general, we can't see the stores.

llvm.rebind is sort of tricky, though. I mean, the intrinsic itself is fairly simple, but we'd have to teach a bunch of transforms to generate it or optimize around it to actually implement the rules correctly without killing performance.

jyknight commented 2 years ago

The C++ rules are substantially stricter than the C rules.

It's unfortunate if we end up using different rules in C vs. C++; it's hard enough to explain strict aliasing without dealing with quirks like that.

We could potentially use less-strict (C-compatible) rules for implicit-lifetime types (which should be all types you can write in C), and be more strict for others. Not sure if it's really worthwhile, though, vs just using the less-strict rules for everything.

(Though I'm not really sure TBAA is even worthwhile at all...maybe we should just turn it off by default...)

zygoloid commented 2 years ago

We could use lowest-common-denominator rules everywhere, which would mean we allow any store to change the effective type of any object. That wouldn't quite get us to conformance due to the common initial sequence rule, which it seems we never implemented in Clang's TBAA emission, but it's probably pretty close. For C, that's probably close enough to the full set of optimization opportunities; I doubt we would get much additional benefit by assuming the effective type of a declared variable doesn't change (which incidentally is a rule that C++ doesn't have).

For C++ -- assuming we use something like @llvm.rebind to track object creation events -- the rules permit a store of one type to be reordered past a load of an unrelated type, in either direction, which I think gives substantially more room to optimize than our current TBAA implementation takes advantage of. So I suspect the amount of benefit we can get from TBAA in C++ is quite a bit higher than the benefit we currently get from our TBAA implementation.

(But yeah, as @jyknight says, it may still not be worthwhile.)

forics commented 9 months ago

I ran into a similar problem that TBAA says a previous store and a load is no-alias while they should be may-alias. So is there any progress about this bug?

uecker commented 4 months ago

I would be nice if this could be fixed eventually.

llvm / llvm-project

Clang generates incorrect TBAA metadata, causing a LICM miscompile #54878