Closed hiraditya closed 6 years ago
Fixed in: https://reviews.llvm.org/D40056 and https://reviews.llvm.org/rL317853
Thanks to Volodymyr Sapsai
Considering this, I expect that a correct importing with an incorrect global promotion would end up with a link failure.
In terms of layers: it is obviously incorrect to not expose this call in the graph (why isn't it there in the reference graph though?), but another layer is the importing happily generating incorrect code: we should not import ext
and have it call the external f
, the import should fail explicitly.
I don't understand where is the miscompile here? Isn't there just a missing optimization?
The static function gets dropped from the final binary and all the calls to the conflicting static function gets (incorrectly) redirected to the external function.
I don't understand where is the miscompile here? Isn't there just a missing optimization?
With clang 5.0.0 objdump output for AArch64 (excluding __text) is
_ext: 100007f7c: 01 00 00 14 b #4
_f: 100007f80: e0 03 1b 32 orr w0, wzr, #0x20 100007f84: c0 03 5f d6 ret
_main: 100007f88: f4 4f be a9 stp x20, x19, [sp, #-32]! 100007f8c: fd 7b 01 a9 stp x29, x30, [sp, #16] 100007f90: fd 43 00 91 add x29, sp, #16 100007f94: fb ff ff 97 bl #-20 100007f98: f3 03 00 aa mov x19, x0 100007f9c: f8 ff ff 97 bl #-32 100007fa0: 00 00 13 0b add w0, w0, w19 100007fa4: fd 7b 41 a9 ldp x29, x30, [sp, #16] 100007fa8: f4 4f c2 a8 ldp x20, x19, [sp], #32 100007fac: c0 03 5f d6 ret
I didn't execute it on actual hardware but my understanding is that there is only non-static function f
that returns 32 (== 0x20). main
calls f
and ext
, and ext
calls non-static f
instead of static one.
I don't understand where is the miscompile here? Isn't there just a missing optimization?
Thanks for the hints, Teresa. Posted fix for code review https://reviews.llvm.org/D39356
CalledValue is non-null and the fix like
auto *CalledValue = CS.getCalledValue(); auto *CalledFunction = CS.getCalledFunction(); // Check if this is an alias to a function. If so, get the // called aliasee for the checks below. if (auto *GA = dyn_cast<GlobalAlias>(CalledValue)) { assert(!CalledFunction && "Expected null called function in
callsite for alias"); CalledFunction = dyn_cast
(GA->getBaseObject()); }
- if (CalledValue && !CalledFunction) {
- CalledValue = CalledValue->stripPointerCasts();
- CalledFunction = dyn_cast
(CalledValue); - } // Check if this is a direct call to a known function or a known // intrinsic, or an indirect call with profile data. if (CalledFunction) {
works (it's not final, mostly checking if you can fix the issue by adding call edges).
It sounds like it should - when doing function importing, any time a function is imported we will walk the call and reference edges and mark all as exported. Which didn't happen in this case because of the lack of call edge.
Teresa, what is this bitcast case most similar to? Is it like direct call, indirect call, or something else?
In this case it is still a direct call (call to a known function, not an indirect call via a pointer in a register). Presumably you could also have an indirect call with a bitcast, but that should be handled naturally with your fix. In the indirect call case, CalledFunction will still be null, so we would go to the else that handles possible indirect calls and we would have a non-constant CalledValue.
Actually, looking at that handling in the else clause that is looking for indirect calls:
// Skip direct calls.
if (!CS.getCalledValue() || isa
CalledValue is non-null and the fix like
auto *CalledValue = CS.getCalledValue();
auto *CalledFunction = CS.getCalledFunction();
// Check if this is an alias to a function. If so, get the
// called aliasee for the checks below.
if (auto *GA = dyn_cast<GlobalAlias>(CalledValue)) {
assert(!CalledFunction && "Expected null called function in callsite for alias");
CalledFunction = dyn_cast<Function>(GA->getBaseObject());
}
works (it's not final, mostly checking if you can fix the issue by adding call edges).
Teresa, what is this bitcast case most similar to? Is it like direct call, indirect call, or something else?
I see, we fall back to indirect call handling in this case, but it isn't an indirect call so it doesn't do anything there. And we don't include it in the ref edges because that explicitly skips callees, expecting that they will be added to the call edges.
Is CalledValue non-null in this case? In that case we don't need to teach it to look through bitcasts, just need to adjust the handling of these cases.
Thanks for tracking this down
Thanks for the analysis!
I would say this is a bug in computeFunctionSummary. It should look through these bitcasts. MergeFunctions, for example, will also create calls of bitcast expressions and it's certainly valid IR.
This should be fixed regardless of whether the bitcasts are necessary/desirable in your particular case. That's kind of an orthogonal problem.
As an aside, last time I checked, the inliner wasn't able to inline callsites of some (possibly all?) bitcasts of this sort. If that's a concern for you, you might want to look into that too.
I have investigated the issue and found the following. Static function f is absent in the resulting binary because FunctionImportGlobalProcessing::shouldPromoteLocalToGlobal() decided not to promote it. And it happened because static function f has linkage InternalLinkage which is natural for static function. But it is possible to change linkage for GlobalValueSummary as in thinLTOInternalizeAndPromoteInIndex(), thinLTOInternalizeAndPromoteGUID() we set ExternalLinkage for exported global values.
At this moment we can observe the difference between AArch64 and x86_64. For x86_64 static function f is exported, linkage set to ExternalLinkage, it is promoted to global, renamed, included in the final binary. But for AArch64 static function f isn't exported. And that happens because module reference.o2 doesn't import anything.
That, in turn, happens because instructions for function main are
; Function Attrs: minsize nounwind optsize ssp uwtable define i32 @main() #0 { entry: %retval = alloca i32, align 4 store i32 0, i32 %retval, align 4 %call = call i32 bitcast (i32 (...) @f to i32 ())() #2 %call1 = call i32 bitcast (i32 (...) @ext to i32 ()*)() #2 %add = add nsw i32 %call, %call1 ret i32 %add }
As we collect CallGraphEdges in computeFunctionSummary(), we try
244 | auto CalledValue = CS.getCalledValue(); 245 | auto CalledFunction = CS.getCalledFunction();
and for these 2 call instructions CalledFunction is NULL, that's why we don't add anything to CallGraphEdges.
For comparison, instructions for x86_64 are
; Function Attrs: minsize nounwind optsize ssp uwtable define i32 @main() #0 { entry: %retval = alloca i32, align 4 store i32 0, i32* %retval, align 4 %call = call i32 (...) @f() #2 %call1 = call i32 (...) @ext() #2 %add = add nsw i32 %call, %call1 ret i32 %add }
Currently I consider 2 approaches for fixing this issue:
Reported to xcode radar as well:
Extended Description
The static/internal function with thinlto gets skipped and results in a miscompile. A reduced test case is as follows:
$cat static.c
static int attribute((noinline)) f(int i) { volatile int j = i+10; return j; }
int ext() { return f(100); }
$ cat non-static.c int f(void) { return 32; }
$ cat reference.c int f(); int ext();
int main(void) { return f()+ ext(); }
$ cat run.sh
CC=~/llvm/build/bin/clang LARGS=~/llvm/build/lib/libLTO.dylib
LFLAGS='-fobjc-arc -Xlinker -lto_library -Xlinker' CFLAGS='--save-temps -v -Xlinker -save-temps'
ARCH='-arch arm64 -isysroot /Applications/Xcode_9.0.0_fb.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS11.0.sdk'
$CC -Oz -flto=thin static.c non-static.c reference.c $ARCH $CFLAGS $LFLAGS $LARGS
Only one function 'f' is there and the static function 'f' is gone. $ objdump -d a.out | grep .* Disassembly of section TEXT,text: __text: _ext: _f: _main:
However, when you target x86_64 (just remove the ARGS from command line)
$ objdump -d a.out | grep ^_.* __text: _ext: _f.llvm.4247299495: _f: _main: