llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.69k stars 11.86k forks source link

LTO fails with "Called function is not the same type as the call" #39659

Closed pirama-arumuga-nainar closed 5 years ago

pirama-arumuga-nainar commented 5 years ago
Bugzilla Link 40312
Resolution FIXED
Resolved on Sep 11, 2019 11:38
Version trunk
OS Linux
CC @dwblaikie,@m-gupta,@pcc,@stephenhines,@vlad902

Extended Description

This happens with LTO and CFI:

Called function is not the same type as the call! tail call void @​EC_KEY_free(%struct.ec_key_st* %5) #​22, !dbg !​33425 in function _ZN9keymaster5EcKey13EvpToInternalEPK11evp_pkey_st LLVM ERROR: Broken function found, compilation aborted!

The failures are caused by Clang change r335385 (originally landed as r335284) and LLVM change r335145.

I am attaching lld's repro.tar which can be run as: $ tar xvf repro.tar $ cd repro $ @​response.txt

The problem is likely in the Clang change as reverting the two changes above and running the link step doesn't fix the issue. I am working on narrowing down the failure to a smaller repro (any tips on doing so with LTO are much appreciated :)

pirama-arumuga-nainar commented 5 years ago

Fix committed as r371643. Thanks for the review Teresa!

pirama-arumuga-nainar commented 5 years ago

I think I've gotten to the triggers for this issue.

  1. Similar to llvm/llvm-project#37032 , a global from a subsequent module gets materialized in an earlier module. In this case, it is the declaration of EC_KEY_free. The type of the global is: void (%struct.ec_key_st.0*)

  2. Subsequently, when the second module is processed, IRLinker::run() calls computeTypeMapping and control reaches https://github.com/llvm/llvm-project/blob/bb17e46644bbebdc665ccdee941aab48a9ee0bd4/llvm/lib/Linker/IRMover.cpp#L762, where SGV is the EC_KEY_free from the second module and the DGV is the materialized EC_KEY_free already present in the destination module, DstM.

  3. While the globals are different, their types are the same and TypeMap.addTypeMapping maps the type to itself.

  4. Furthern down in computeTypeMapping, in the loop over SrcM->getIdentifiedStructTypes(), the type %"class.keymaster::UniquePtr.0" = type { %struct.ec_key_st.0 } from SrcM gets mapped to %"class.keymaster::UniquePtr" = type { %struct.ec_key_st } in the DstM.

In this process, %struct.ec_key_st.0 also gets mapped to %struct.ec_key_st.

  1. When the problematic call in _ZN9keymaster20OpenSslObjectDeleterI9ec_key_stXadL_Z11EC_KEYfreeEEEclEPS1 is processed, the called value, the global, didn't get the new type mapping and still has type: void (%struct.ec_key_st.0) while the call instruction's type gets reconstructed based on the type mapping to: void (%struct.ec_key_st)

Hence the assertion.

The fix, in my opinion, is that in https://github.com/llvm/llvm-project/blob/bb17e46644bbebdc665ccdee941aab48a9ee0bd4/llvm/lib/Linker/IRMover.cpp#L762, we should call TypeMap.addTypeMapping() only if the types of the global are different. If not, an early materialization of a global (based on module summaries, I think?) can leave a global with inconsistent types.

Uploaded https://reviews.llvm.org/D66814. It doesn't regress any other tests and also fixes the original Android build breakage.

pirama-arumuga-nainar commented 5 years ago

We don't get to the remapType() call in because

This should have been: We don't get to the remapType() call at http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/ValueMapper.cpp#442 because ...

pirama-arumuga-nainar commented 5 years ago

We don't get to the remapType() call in because the materializer is able to get a new value (but with the old unmapped types) at http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/ValueMapper.cpp#350.

I'll look at the Materializer now to see if there's something to be changed there.

llvmbot commented 5 years ago

I see, I misunderstood the prior comment. So presumably the getCalledValue() is being remapped by the call to remap each operand via mapValue here: http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/ValueMapper.cpp#872

I see that within mapValue there is a call to remapType on that value down here: http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/ValueMapper.cpp#442

One question would be are we even getting to this part of remapType. If that invocation of remapType doesn't affect the params on an underlying FunctionTy, perhaps some special code needs to be added (similar to the special case type remapping for GEPOperator at http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/ValueMapper.cpp#471)?

Sorry, not super familiar with this aspect of the ValueMapper, so questions rather than a clear answer for you...

pirama-arumuga-nainar commented 5 years ago

CallInst has two handles to the callee: getFunctionType() (http://llvm-cs.pcc.me.uk/include/llvm/IR/InstrTypes.h#1051) and getCalledValue()/getCalledOperand (http://llvm-cs.pcc.me.uk/include/llvm/IR/InstrTypes.h#1183)

ValueMapper remaps the types of the former but not the latter.

But the IR Verifier expects the types returned from getFunctionType() and getCalledValue()->getType() to be the same (http://llvm-cs.pcc.me.uk/lib/IR/Verifier.cpp#2757).

llvmbot commented 5 years ago

Sorry for the silence, I haven't had a chance to look at this in any detail. You mentioned in #​14/#15 that getFunctionType is not being remapped, but the code seems to try to do this (afaict) here: http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/ValueMapper.cpp#914

Is that not kicking in or doing the expected remapping?

pirama-arumuga-nainar commented 5 years ago

Teresa, Peter, can you advice on how to proceed with this bug? In particular, any thoughts on comment #​14?

pirama-arumuga-nainar commented 5 years ago

Oops, wrong function name in the last comment. s/getCalledFunction/getFunctionType/

pirama-arumuga-nainar commented 5 years ago

I believe the problem is in ValueMapper::remapInstruction(), when a CallSite's type is getting remapped. The type getCalledFunction() gets remapped, while getCalledValue() doesn't. But the IR verifier (Verifier::visitCallBase()) asserts that the types returned by getCalledFunction() and getCalledValue() are consistent.

pirama-arumuga-nainar commented 5 years ago

I haven't found the root cause yet. But from looking in the debugger, it seems that the struct types (struct.ec_key_st, class.keymaster::UniquePtr, class.keymaster::g) from the destination module get remapped to one from the source type:

​0 llvm::StructType::setName (this=0x6df9820, Name=...)

at /usr/local/google/work/llvm-upstream-minimal/llvm/lib/IR/Type.cpp:386

​1 0x000000000324ceca in (anonymous namespace)::TypeMapTy::addTypeMapping (

this=0x7fffffffbf30, DstTy=0x6df97d0, SrcTy=0x6df9860)
at /usr/local/google/work/llvm-upstream-minimal/llvm/lib/Linker/IRMover.cpp:106

​2 0x000000000324ba50 in (anonymous namespace)::IRLinker::computeTypeMapping (

this=0x7fffffffbef0)
at /usr/local/google/work/llvm-upstream-minimal/llvm/lib/Linker/IRMover.cpp:773

​3 0x00000000032466ca in (anonymous namespace)::IRLinker::run (this=0x7fffffffbef0)

But the function type "void (%struct.ec_key_st*)" from the destination module uses the type objects from prior to the mapping and hence end up being different.

Can someone familiar with IRMover help me with the following questions:

  1. Is the fix to: in TypeMapTy::addTypeMapping(), update the "types containing/referring to SrcTy" to refer to DstTy?
  2. Is there an utility in lib/Linker that can do this update?
pirama-arumuga-nainar commented 5 years ago

If I remove the definition of class.keymaster::UniquePtr and its references from i2.ll (see below), the assertion/error goes away.

Vlad, I also noticed that your repro commands don't actually trigger the bug. We also need '-g' when compiling the IR files.

$ cat i2.ll %struct.ec_key_st = type opaque define void @​_ZN9keymaster1cEv() { alloca %"class.keymaster::UniquePtr" ret void, !dbg !​26 } !llvm.dbg.cu = !{#0} !llvm.module.flags = !{#19} !​0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !​1, retainedTypes: !​3) !​1 = !DIFile(filename: "software_keyblobs.ii", directory: "lto_bug") !​3 = !{#4} !​4 = !DICompositeType(tag: DW_TAG_class_type, file: !​1, templateParams: !​14) !​9 = !DICompositeType(tag: 9) !​14 = !{#16} !​16 = !DITemplateTypeParameter(type: !​17) !​17 = !DICompositeType(tag: DW_TAG_structure_type, flags: DIFlagFwdDecl, identifier: "_ZTSN9keymaster20OpenSslObjectDeleterI9ec_key_stXadL_Z11EC_KEY_freeEEEE") !​19 = !{i32 2, !"Debug Info Version", i32 3} !​22 = distinct !DISubprogram(unit: !​0) !​24 = !{} !​26 = !DILocation(scope: !​22)

vlad902 commented 5 years ago

I compiled the C down to IR and creduced those down to the following inputs:

$ cat i1.ll %"class.keymaster::g" = type { %"class.keymaster::UniquePtr" } %"class.keymaster::UniquePtr" = type { %struct.ec_key_st } %struct.ec_key_st = type opaque define void @​_ZN9keymaster1gD0Ev(%"class.keymaster::g") { ret void, !dbg !​74 } declare void @​EC_KEY_free(%struct.ec_key_st) define void @​_ZN9keymaster20OpenSslObjectDeleterI9ec_key_stXadL_Z11EC_KEYfreeEEEclEPS1() { entry: %f.addr = alloca %struct.ec_key_stload %struct.ec_key_st, %struct.ec_key_st* %f.addr, !dbg !​89 call void @​EC_KEY_free(%struct.ec_key_st %0)unreachable } !llvm.dbg.cu = !{#0} !llvm.module.flags = !{#4} !​0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !​1) !​1 = !DIFile(filename: "nist_curve_key_exchange.ii", directory: "lto_bug") !​4 = !{i32 2, !"Debug Info Version", i32 3} !​8 = !DICompositeType(tag: DW_TAG_class_type, file: !​1, elements: !​10) !​10 = !{#13} !​13 = !DICompositeType(tag: DW_TAG_class_type, file: !​1, templateParams: !​23) !​18 = !DICompositeType(tag: 1) !​23 = !{#24} !​24 = !DITemplateTypeParameter(type: !​25) !​25 = !DICompositeType(tag: DW_TAG_structure_type, templateParams: !​31, identifier: "_ZTSN9keymaster20OpenSslObjectDeleterI9ec_key_stXadL_Z11EC_KEY_freeEEEE") !​29 = !{} !​31 = !{#33} !​33 = !DITemplateValueParameter(value: void (%struct.ec_key_st) @​EC_KEY_free) !​62 = !DISubprogram(isDefinition: false) !​71 = distinct !DISubprogram(scope: !​8, unit: !​0) !​74 = !DILocation(scope: !​71) !​83 = distinct !DISubprogram(scope: !​25, unit: !​0) !​89 = !DILocation(scope: !​83)

$ cat i2.ll %"class.keymaster::UniquePtr" = type { %struct.ec_key_st* } %struct.ec_key_st = type opaque define void @​_ZN9keymaster1cEv() { alloca %"class.keymaster::UniquePtr" ret void, !dbg !​26 } !llvm.dbg.cu = !{#0} !llvm.module.flags = !{#19} !​0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !​1, retainedTypes: !​3) !​1 = !DIFile(filename: "software_keyblobs.ii", directory: "lto_bug") !​3 = !{#4} !​4 = !DICompositeType(tag: DW_TAG_class_type, file: !​1, templateParams: !​14) !​9 = !DICompositeType(tag: 9) !​14 = !{#16} !​16 = !DITemplateTypeParameter(type: !​17) !​17 = !DICompositeType(tag: DW_TAG_structure_type, flags: DIFlagFwdDecl, identifier: "_ZTSN9keymaster20OpenSslObjectDeleterI9ec_key_stXadL_Z11EC_KEY_freeEEEE") !​19 = !{i32 2, !"Debug Info Version", i32 3} !​22 = distinct !DISubprogram(unit: !​0) !​24 = !{} !​26 = !DILocation(scope: !​22)

$ clang++ -c -flto -o nist_curve_key_exchange.o i1.ll $ clang++ -c -flto -o software_keyblobs.o i2.ll $ ld.lld -shared -o libfoo.so software_keyblobs.o nist_curve_key_exchange.o 2>&1

Looks related to the previous ODR uniquing bugs like llvm/llvm-project#37032 /#30799 . Seems like the type being opaque might be relevant somehow but I don't have time to dig into it further at the moment.

pirama-arumuga-nainar commented 5 years ago

source 2

pirama-arumuga-nainar commented 5 years ago

source 1

pirama-arumuga-nainar commented 5 years ago

I reduced the two source files independently (by fixing the other) using creduce and ended up with a small repro. After downloading the two attached preprocessed sources, run:

$ clang++ -g -w -c -flto -o nist_curve_key_exchange.o nist_curve_key_exchange.ii $ clang++ -g -w -c -flto -o software_keyblobs.o software_keyblobs.ii $ ld.lld -shared -o libfoo.so software_keyblobs.o nist_curve_key_exchange.o

pirama-arumuga-nainar commented 5 years ago

I did check all declarations of EC_KEY_free (there's no definition since it's an external function). All of them had 'struct ec_key_st' as an opaque type, from the same header file.

dwblaikie commented 5 years ago

Peter - is this the same sort of failure you'd see if you violate the ODR? (eg: call (with C linkage) "f(int)" in one file, but define it as f(double) or the like in another)

vlad902 commented 5 years ago

It's worth seeing if bugpoint can reduce them first as it's faster; however, it's often not enough to fully reduce test cases. In my experience creducing textual IR for multiple files with LTO bugs I've found it goes faster if you 1) eliminate the call to topformflat in pass_lines.pm, this flattens IR as if it were C and makes it less effective, and 2) put all of the files into a single file with a delimiter and creduce them all simultaneously.

pcc commented 5 years ago

pirama, maybe you can try creducing those two source files?

pcc commented 5 years ago

Reproduces with:

~/l4/gn/ra/bin/ld.lld "-shared" "ssd1/pirama/aosp1/out/soong/.intermediates/system/keymaster/libkeymaster_portable/android_arm64_armv8-a_cortex-a73_core_static_cfi/obj/system/keymaster/key_blob_utils/software_keyblobs.o" "ssd1/pirama/aosp1/out/soong/.intermediates/system/keymaster/libkeymaster_portable/android_arm64_armv8-a_cortex-a73_core_static_cfi/obj/system/keymaster/km_openssl/nist_curve_key_exchange.o"

pcc commented 5 years ago

One trick that sometimes helps is to use creduce to reduce the response file. I kicked off a run locally with this script and I'll update this bug with the response file that I get back.

!/bin/bash

pwd=pwd

cd ~/android-cfi-bug/repro ~/l4/gn/ra/bin/ld.lld @$pwd/response.txt -o a.out 2>&1 | grep -q 'Called function is not the same type as the call'

pirama-arumuga-nainar commented 5 years ago

The repro is larger than 1MB. I've uploaded the repro to Google Drive: https://drive.google.com/open?id=1Dmx_9sk8VB4d6k802FaIuUYzGE2598-H

pirama-arumuga-nainar commented 5 years ago

assigned to @pirama-arumuga-nainar