Open derekbruening opened 7 years ago
> I'm not sure what you mean: the result is identical for the type of expression you provide.
Hmm, really? Produces a different result for me on LLVM trunk.
Note that "&asm_label + 40" involves a different rule: out-of-bounds pointer arithmetic is undefined behavior.
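To make the out-of-bounds rule concrete, here is a minimal sketch (the function name `sum_ints` is hypothetical, not from this report). C allows forming a pointer to any element of an array or to one past its last element; forming a pointer further out, as "&asm_label + 40" does for a single object, is undefined behavior.

```c
#include <stddef.h>

/* Sum n ints by walking a pointer from the first element up to,
   but not including, the one-past-the-end position.
   Forming `values + n` is allowed; dereferencing it, or forming
   `values + n + 1`, would be undefined behavior. */
int sum_ints(const int *values, size_t n)
{
    const int *end = values + n;   /* one past the end: valid to form and compare */
    int total = 0;

    for (const int *p = values; p != end; ++p)
        total += *p;
    return total;
}
```

Calling `sum_ints` on a three-element array with `n` equal to 3 stays within the rule; the pointer never moves beyond one past the end.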
> This is applying the same rule: globals aren't allowed to overlap. (Note that you get a different result for "(char*)(&asm_label + 1) != (char*)&asm_label2".)
I'm not sure what you mean: the result is identical for the type of expression you provide. Clang fails to emit the comparison for any offset value, not just those that could only be true if the globals did overlap. E.g.:
clang version 4.0.0 (trunk 290297) (llvm/trunk 290296)
% ../build_rel/bin/clang -o clangbug2 clangbug2.c && ./clangbug2
bug
bug

clang version 3.5
% /usr/bin/clang -o clangbug2 clangbug2.c && ./clangbug2
bug
bug

% /usr/bin/gcc -o clangbug2 clangbug2.c && ./clangbug2
no bug
no bug
> We had to create a loophole for these kinds of start/end symbol pairs, and you can use that as a workaround. If you declare asm_label and asm_label2 as zero-sized globals, LLVM won't make this assumption.
Thank you for pointing this out -- this does indeed work, though only for recent clangs (doesn't work for 3.5).
> The LLVM memory model assumes that you can't add offsets to get from one global variable to another: http://llvm.org/docs/LangRef.html#pointeraliasing
That section only applies to memory accesses; pointer comparisons have completely different rules.
> But I may have reduced the original problem too far, as the original involves comparing one extern function or global plus an offset to another:
This is applying the same rule: globals aren't allowed to overlap. (Note that you get a different result for "(char*)(&asm_label + 1) != (char*)&asm_label2".)
The LLVM memory model assumes that you can't add offsets to get from one global variable to another: http://llvm.org/docs/LangRef.html#pointeraliasing
The intent is to allow this kind of optimization:

    extern int x;
    extern int y[3];
    void f(int i) {
      x = 1; // dead, even if i is -1
      y[i] = 42;
      x = 2;
    }
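As a rough illustration of what that permits, the sketch below puts the function as written next to the form the optimizer may treat as equivalent. The names `f_as_written` and `f_as_optimized` are hypothetical, and `x` and `y` are defined here (rather than declared extern) only so the example links on its own.

```c
/* Definitions added for this sketch; the original only declares them. */
int x;
int y[3];

/* The function as written in the source. */
void f_as_written(int i)
{
    x = 1;      /* overwritten below before any valid read of x */
    y[i] = 42;  /* assumed never to touch x, since x and y are distinct globals */
    x = 2;
}

/* What the compiler may emit instead: the first store to x is dropped. */
void f_as_optimized(int i)
{
    y[i] = 42;
    x = 2;
}
```

For any in-bounds `i` (0 through 2) the two functions leave `x` and `y` in the same state, which is why the first store can be removed.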
I can't give language standard references for that behavior, but I know the rules were written with existing standards in mind.
We had to create a loophole for these kinds of start/end symbol pairs, and you can use that as a workaround. If you declare asm_label and asm_label2 as zero-sized globals, LLVM won't make this assumption. Your example would become:

    extern char asm_label[];
    extern char asm_label2[];
    int main() {
      __asm("asm_label: asm_label2:");
      if (&asm_label != &asm_label2)
        puts("bug");
      else
        puts("no bug");
      return 0;
    }
With optimizations, that program prints "no bug" for me locally.
Fair enough, I see it in the C language spec. But I may have reduced the original problem too far, as the original involves comparing one extern function or global plus an offset to another:
Here, again, clang omits any comparison and assumes that no matter what offset is added to &asm_label it will never equal &asm_label2.
If instead the comparison is "&asm_label > &asm_label2", clang does generate a comparison and prints "correct".
When the labels are declared as function pointers rather than void*, clang warns "ordered comparison of function pointers", which is reasonable, but it still goes on to generate a comparison.
No language standard we support permits distinct globals to have the same address. And LLVM also assumes that distinct globals do not alias, with no knowledge of the source language.
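A minimal sketch of the rule being relied on, using two hypothetical globals that are not part of this report: for distinct objects defined in C, the inequality below always holds, so a compiler may fold it to a constant instead of emitting a comparison.

```c
/* Two distinct objects with static storage duration (names are illustrative). */
int first_global;
int second_global;

/* Returns 1: distinct objects never share an address, so the
   compiler is free to replace this comparison with the constant 1. */
int globals_have_distinct_addresses(void)
{
    return &first_global != &second_global;
}
```

The assembly labels in this report break the premise, not the rule: both names are placed at one address outside the compiler's view.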
Note that this code is being compiled as C, not C++, and no C++ standard is requested.
The C++ spec requires we make this assumption; &a == &b is a constant expression in C++11 onwards if a and b are globals.
We already exempt declarations with some attributes from this behaviour (ones with weak linkage in particular); it might be reasonable to provide an attribute to indicate that one global might alias some portion of another.
Extended Description
Clang seems to assume that extern addresses cannot be identical. If we try to compare two assembly labels, clang does not bother to emit a comparison and assumes they are different:
    #include <stdio.h>

    extern void *asm_label;
    extern void *asm_label2;
    int main() {
      __asm("asm_label: asm_label2:");
      if (&asm_label != &asm_label2)
        puts("bug");
      else
        puts("no bug");
      return 0;
    }
/usr/bin/clang clangbug.c -o clangbug && ./clangbug
bug
/usr/bin/gcc clangbug.c -o clangbug && ./clangbug
no bug
Examining the generated code shows no comparison at all, just a call to puts("bug"), even at -O0.
/usr/bin/clang --version
clang version 3.5.0 (tags/RELEASE_350/final)
Target: x86_64-redhat-linux-gnu
Thread model: posix
More recent versions of clang have the same behavior, including a recent clang built from sources at r290297.
Xref https://github.com/DynamoRIO/dynamorio/issues/2124 where a recent clang (r290297 in fact) extends this problem to static const copies of extern addresses.
Is there something in the C language spec that allows clang to make this assumption? This does not seem limited to assembly labels: presumably function aliases could hit the same problem (certainly it happens if these are not "extern void *" but rather declared as extern functions which are supplied in a separately compiled file).