llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

clang assumes extern addresses cannot have a known relationship #31023

Open derekbruening opened 7 years ago

derekbruening commented 7 years ago
Bugzilla Link 31675
Version unspecified
OS Linux
CC @efriedma-quic,@zygoloid,@rnk

Extended Description

Clang seems to assume that extern addresses cannot be identical. If we try to compare two assembly labels, clang does not bother to emit a comparison and assumes they are different:


    #include <stdio.h>

    extern void *asm_label;
    extern void *asm_label2;

    int main()
    {
        __asm("asm_label: asm_label2:");
        if (&asm_label != &asm_label2)
            puts("bug");
        else
            puts("no bug");
        return 0;
    }

/usr/bin/clang clangbug.c -o clangbug && ./clangbug

bug

/usr/bin/gcc clangbug.c -o clangbug && ./clangbug

no bug

Examining the generated code, there is no comparison at all and simply a call to puts("bug"), even at -O0.

/usr/bin/clang --version

clang version 3.5.0 (tags/RELEASE_350/final)
Target: x86_64-redhat-linux-gnu
Thread model: posix

More recent versions of clang have the same behavior, including a recent clang built from sources at r290297.

Xref https://github.com/DynamoRIO/dynamorio/issues/2124 where a recent clang (r290297 in fact) extends this problem to static const copies of extern addresses.

Is there something in the C language spec that allows clang to make this assumption? This does not seem limited to assembly labels: presumably function aliases could hit the same problem (certainly it happens if these are not "extern void *" but rather declared as extern functions which are supplied in a separately compiled file).

efriedma-quic commented 7 years ago

> I'm not sure what you mean: the result is identical for the type of expression you provide.

Hmm, really? Produces a different result for me on LLVM trunk.

Note that "&asm_label + 40" involves a different rule: out-of-bounds pointer arithmetic is undefined behavior.

derekbruening commented 7 years ago

> This is applying the same rule: globals aren't allowed to overlap. (Note that you get a different result for "(char *)(&asm_label + 1) != (char *)&asm_label2".)

I'm not sure what you mean: the result is identical for the type of expression you provide. Clang fails to emit the comparison for any offset value, not just those that could only be true if the globals did overlap. E.g.:


    #include <stdio.h>

    extern void *asm_label;
    extern void *asm_label2;

    int main()
    {
        __asm("asm_label: .fill 320,1,0x90; asm_label2:");
        if ((char *)(&asm_label + 40) != (char *)&asm_label2)
            puts("bug");
        else
            puts("no bug");
        if ((char *)&asm_label + 320 != (char *)&asm_label2)
            puts("bug");
        else
            puts("no bug");
        return 0;
    }

clang version 4.0.0 (trunk 290297) (llvm/trunk 290296)
% ../build_rel/bin/clang -o clangbug2 clangbug2.c && ./clangbug2
bug
bug

clang version 3.5
% /usr/bin/clang -o clangbug2 clangbug2.c && ./clangbug2
bug
bug

% /usr/bin/gcc -o clangbug2 clangbug2.c && ./clangbug2
no bug
no bug

> We had to create a loophole for these kind of start/end symbol pairs, and you can use that as a workaround. If you declare asm_label and asm_label2 as zero-sized globals, LLVM won't make this assumption.

Thank you for pointing this out -- this does indeed work, though only for recent clangs (doesn't work for 3.5).

efriedma-quic commented 7 years ago

> The LLVM memory model assumes that you can't add offsets to get from one global variable to another: http://llvm.org/docs/LangRef.html#pointeraliasing

That section only applies to memory accesses; pointer comparisons have completely different rules.

> But I may have reduced the original problem too far, as the original involves comparing one extern function or global plus an offset to another:

This is applying the same rule: globals aren't allowed to overlap. (Note that you get a different result for "(char *)(&asm_label + 1) != (char *)&asm_label2".)

rnk commented 7 years ago

The LLVM memory model assumes that you can't add offsets to get from one global variable to another: http://llvm.org/docs/LangRef.html#pointeraliasing

The intent is to allow this kind of optimization:

    extern int x;
    extern int y[3];
    void f(int i) {
        x = 1; // dead, even if i is -1
        y[i] = 42;
        x = 2;
    }

I can't give language standard references for that behavior, but I know the rules were written with existing standards in mind.
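For readers who want to try this, here is a minimal self-contained sketch of the snippet above. The definitions of `x` and `y` are added only so that it links on its own (the original declares them `extern`), and it is exercised only with an in-bounds index, where the behavior is well defined whether or not the first store is removed.

```c
/* Definitions added for this sketch; the thread's snippet only
   declares x and y as extern. */
int x;
int y[3];

void f(int i) {
    x = 1;      /* a compiler may treat this store as dead */
    y[i] = 42;  /* assumed never to write x, whatever i is */
    x = 2;      /* the only store to x that must be observable */
}
```

Calling `f(1)` leaves `x == 2` and `y[1] == 42`. Comparing the assembly at `-O0` and `-O2` is one way to see whether the `x = 1` store was dropped.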

We had to create a loophole for these kind of start/end symbol pairs, and you can use that as a workaround. If you declare asm_label and asm_label2 as zero-sized globals, LLVM won't make this assumption. Your example would become:

    #include <stdio.h>

    extern char asm_label[];
    extern char asm_label2[];

    int main()
    {
        __asm("asm_label: asm_label2:");
        if (&asm_label != &asm_label2)
            puts("bug");
        else
            puts("no bug");
        return 0;
    }

With optimizations, that program prints "no bug" for me locally.
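The same zero-sized declaration can be sketched for a start/end pair that brackets a region. The symbol names `region_start` and `region_end`, the 16-byte size, and the use of top-level assembly in `.rodata` are all assumptions made for this illustration, and it presumes an ELF target with a GNU-style assembler.

```c
#include <stdint.h>

/* Hypothetical symbols for illustration: two labels bracketing 16 bytes
   of read-only data. Assumes an ELF target and GNU assembler syntax. */
__asm__(".pushsection .rodata\n"
        ".globl region_start\n"
        "region_start:\n"
        ".fill 16, 1, 0\n"
        ".globl region_end\n"
        "region_end:\n"
        ".popsection\n");

/* Incomplete array types: the compiler sees zero-sized globals, so it
   is not expected to assume anything about their relative addresses. */
extern const char region_start[];
extern const char region_end[];

/* Distance between the labels, computed on integers. */
uintptr_t region_size(void) {
    return (uintptr_t)region_end - (uintptr_t)region_start;
}

/* Pointer-plus-offset comparison of the kind discussed in this thread. */
int region_ends_at(uintptr_t offset) {
    return region_start + offset == region_end;
}
```

Under those assumptions, `region_size()` should return 16 and `region_ends_at(16)` should be true, since the comparison has to be done at run time rather than folded away.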

derekbruening commented 7 years ago

Fair enough, I see it in the C language spec. But I may have reduced the original problem too far, as the original involves comparing one extern function or global plus an offset to another:


    #include <stdio.h>

    extern void *asm_label;
    extern void *asm_label2;

    int main()
    {
        __asm("asm_label: .byte 0x90; asm_label2:");
        if ((char *)&asm_label + 1 != (char *)&asm_label2)
            puts("incorrect");
        else
            puts("correct");
        return 0;
    }

Here, again, clang omits any comparison and assumes that no matter what offset is added to &asm_label it will never equal &asm_label2.

If instead the comparison is "&asm_label > &asm_label2", clang does generate a comparison and prints "correct".

When these are declared as function pointers rather than void*, clang warns "ordered comparison of function pointers", which is reasonable, but it still generates a comparison.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 7 years ago

No language standard we support permits distinct globals to have the same address. And LLVM also assumes that distinct globals do not alias, with no knowledge of the source language.

derekbruening commented 7 years ago

Note that this code is being compiled as C, not C++, and no C++ standard is requested.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 7 years ago

The C++ spec requires we make this assumption; &a == &b is a constant expression in C++11 onwards if a and b are globals.

We already exempt declarations with some attributes from this behaviour (ones with weak linkage in particular); it might be reasonable to provide an attribute to indicate that one global might alias some portion of another.