llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.92k stars 11.52k forks source link

Missing static function with thinlto on AArch64 when an extern function with same name exists. #34314

Closed hiraditya closed 6 years ago

hiraditya commented 6 years ago
Bugzilla Link 34966
Resolution FIXED
Resolved on Dec 04, 2017 17:29
Version trunk
OS All
CC @compnerd,@dwblaikie,@joker-eph,@vsapsai

Extended Description

The static/internal function with thinlto gets skipped and results in a miscompile. A reduced test case is as follows:

$cat static.c

static int attribute((noinline)) f(int i) { volatile int j = i+10; return j; }

int ext() { return f(100); }

$ cat non-static.c int f(void) { return 32; }

$ cat reference.c int f(); int ext();

int main(void) { return f()+ ext(); }

$ cat run.sh

CC=~/llvm/build/bin/clang LARGS=~/llvm/build/lib/libLTO.dylib

LFLAGS='-fobjc-arc -Xlinker -lto_library -Xlinker' CFLAGS='--save-temps -v -Xlinker -save-temps'

ARCH='-arch arm64 -isysroot /Applications/Xcode_9.0.0_fb.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS11.0.sdk'

$CC -Oz -flto=thin static.c non-static.c reference.c $ARCH $CFLAGS $LFLAGS $LARGS

Only one function 'f' is there and the static function 'f' is gone. $ objdump -d a.out | grep .* Disassembly of section TEXT,text: __text: _ext: _f: _main:

However, when you target x86_64 (just remove the ARGS from command line)

$ objdump -d a.out | grep ^_.* __text: _ext: _f.llvm.4247299495: _f: _main:

hiraditya commented 6 years ago

Fixed in: https://reviews.llvm.org/D40056 and https://reviews.llvm.org/rL317853

Thanks to Volodymyr Sapsai

joker-eph commented 6 years ago

Considering this, I expect that a correct importing with an incorrect global promotion would end up with a link failure. In terms of layers: it is obviously incorrect to not expose this call in the graph (why isn't it there in the reference graph though?), but another layer is the importing happily generating incorrect code: we should not import ext and have it call the external f, the import should fail explicitly.

hiraditya commented 6 years ago

I don't understand where is the miscompile here? Isn't there just a missing optimization?

The static function gets dropped from the final binary and all the calls to the conflicting static function gets (incorrectly) redirected to the external function.

vsapsai commented 6 years ago

I don't understand where is the miscompile here? Isn't there just a missing optimization?

With clang 5.0.0 objdump output for AArch64 (excluding __text) is

_ext: 100007f7c: 01 00 00 14 b #​4

_f: 100007f80: e0 03 1b 32 orr w0, wzr, #​0x20 100007f84: c0 03 5f d6 ret

_main: 100007f88: f4 4f be a9 stp x20, x19, [sp, #-32]! 100007f8c: fd 7b 01 a9 stp x29, x30, [sp, #​16] 100007f90: fd 43 00 91 add x29, sp, #​16 100007f94: fb ff ff 97 bl #-20 100007f98: f3 03 00 aa mov x19, x0 100007f9c: f8 ff ff 97 bl #-32 100007fa0: 00 00 13 0b add w0, w0, w19 100007fa4: fd 7b 41 a9 ldp x29, x30, [sp, #​16] 100007fa8: f4 4f c2 a8 ldp x20, x19, [sp], #​32 100007fac: c0 03 5f d6 ret

I didn't execute it on actual hardware but my understanding is that there is only non-static function f that returns 32 (== 0x20). main calls f and ext, and ext calls non-static f instead of static one.

joker-eph commented 6 years ago

I don't understand where is the miscompile here? Isn't there just a missing optimization?

vsapsai commented 6 years ago

Thanks for the hints, Teresa. Posted fix for code review https://reviews.llvm.org/D39356

llvmbot commented 6 years ago

CalledValue is non-null and the fix like

   auto *CalledValue = CS.getCalledValue();
   auto *CalledFunction = CS.getCalledFunction();
   // Check if this is an alias to a function. If so, get the
   // called aliasee for the checks below.
   if (auto *GA = dyn_cast<GlobalAlias>(CalledValue)) {
     assert(!CalledFunction && "Expected null called function in

callsite for alias"); CalledFunction = dyn_cast(GA->getBaseObject()); }

  • if (CalledValue && !CalledFunction) {
  • CalledValue = CalledValue->stripPointerCasts();
  • CalledFunction = dyn_cast(CalledValue);
  • } // Check if this is a direct call to a known function or a known // intrinsic, or an indirect call with profile data. if (CalledFunction) {

works (it's not final, mostly checking if you can fix the issue by adding call edges).

It sounds like it should - when doing function importing, any time a function is imported we will walk the call and reference edges and mark all as exported. Which didn't happen in this case because of the lack of call edge.

Teresa, what is this bitcast case most similar to? Is it like direct call, indirect call, or something else?

In this case it is still a direct call (call to a known function, not an indirect call via a pointer in a register). Presumably you could also have an indirect call with a bitcast, but that should be handled naturally with your fix. In the indirect call case, CalledFunction will still be null, so we would go to the else that handles possible indirect calls and we would have a non-constant CalledValue.

Actually, looking at that handling in the else clause that is looking for indirect calls: // Skip direct calls. if (!CS.getCalledValue() || isa(CS.getCalledValue())) continue; We should presumably be using CalledValue here not CS.getCalledValue(). Especially after your change to strip pointer casts.

vsapsai commented 6 years ago

CalledValue is non-null and the fix like

   auto *CalledValue = CS.getCalledValue();
   auto *CalledFunction = CS.getCalledFunction();
   // Check if this is an alias to a function. If so, get the
   // called aliasee for the checks below.
   if (auto *GA = dyn_cast<GlobalAlias>(CalledValue)) {
     assert(!CalledFunction && "Expected null called function in callsite for alias");
     CalledFunction = dyn_cast<Function>(GA->getBaseObject());
   }

works (it's not final, mostly checking if you can fix the issue by adding call edges).

Teresa, what is this bitcast case most similar to? Is it like direct call, indirect call, or something else?

llvmbot commented 6 years ago

I see, we fall back to indirect call handling in this case, but it isn't an indirect call so it doesn't do anything there. And we don't include it in the ref edges because that explicitly skips callees, expecting that they will be added to the call edges.

Is CalledValue non-null in this case? In that case we don't need to teach it to look through bitcasts, just need to adjust the handling of these cases.

Thanks for tracking this down

llvmbot commented 6 years ago

Thanks for the analysis!

I would say this is a bug in computeFunctionSummary. It should look through these bitcasts. MergeFunctions, for example, will also create calls of bitcast expressions and it's certainly valid IR.

This should be fixed regardless of whether the bitcasts are necessary/desirable in your particular case. That's kind of an orthogonal problem.

As an aside, last time I checked, the inliner wasn't able to inline callsites of some (possibly all?) bitcasts of this sort. If that's a concern for you, you might want to look into that too.

vsapsai commented 6 years ago

I have investigated the issue and found the following. Static function f is absent in the resulting binary because FunctionImportGlobalProcessing::shouldPromoteLocalToGlobal() decided not to promote it. And it happened because static function f has linkage InternalLinkage which is natural for static function. But it is possible to change linkage for GlobalValueSummary as in thinLTOInternalizeAndPromoteInIndex(), thinLTOInternalizeAndPromoteGUID() we set ExternalLinkage for exported global values.

At this moment we can observe the difference between AArch64 and x86_64. For x86_64 static function f is exported, linkage set to ExternalLinkage, it is promoted to global, renamed, included in the final binary. But for AArch64 static function f isn't exported. And that happens because module reference.o2 doesn't import anything.

That, in turn, happens because instructions for function main are

; Function Attrs: minsize nounwind optsize ssp uwtable define i32 @​main() #​0 { entry: %retval = alloca i32, align 4 store i32 0, i32 %retval, align 4 %call = call i32 bitcast (i32 (...) @​f to i32 ())() #​2 %call1 = call i32 bitcast (i32 (...) @​ext to i32 ()*)() #​2 %add = add nsw i32 %call, %call1 ret i32 %add }

As we collect CallGraphEdges in computeFunctionSummary(), we try

244 | auto CalledValue = CS.getCalledValue(); 245 | auto CalledFunction = CS.getCalledFunction();

and for these 2 call instructions CalledFunction is NULL, that's why we don't add anything to CallGraphEdges.

For comparison, instructions for x86_64 are

; Function Attrs: minsize nounwind optsize ssp uwtable define i32 @​main() #​0 { entry: %retval = alloca i32, align 4 store i32 0, i32* %retval, align 4 %call = call i32 (...) @​f() #​2 %call1 = call i32 (...) @​ext() #​2 %add = add nsw i32 %call, %call1 ret i32 %add }

Currently I consider 2 approaches for fixing this issue:

hiraditya commented 6 years ago

Reported to xcode radar as well:

https://bugreport.apple.com/web/?problemID=35099885