Closed pirama-arumuga-nainar closed 5 years ago
Fix committed as r371643. Thanks for the review Teresa!
I think I've gotten to the triggers for this issue.
Similar to llvm/llvm-project#37032 , a global from a subsequent module gets materialized in an earlier module. In this case, it is the declaration of EC_KEY_free. The type of the global is: void (%struct.ec_key_st.0*)
Subsequently, when the second module is processed, IRLinker::run() calls computeTypeMapping and control reaches https://github.com/llvm/llvm-project/blob/bb17e46644bbebdc665ccdee941aab48a9ee0bd4/llvm/lib/Linker/IRMover.cpp#L762, where SGV is the EC_KEY_free from the second module and the DGV is the materialized EC_KEY_free already present in the destination module, DstM.
While the globals are different, their types are the same and TypeMap.addTypeMapping maps the type to itself.
Furthern down in computeTypeMapping, in the loop over SrcM->getIdentifiedStructTypes(), the type %"class.keymaster::UniquePtr.0" = type { %struct.ec_key_st.0 } from SrcM gets mapped to %"class.keymaster::UniquePtr" = type { %struct.ec_key_st } in the DstM.
In this process, %struct.ec_key_st.0 also gets mapped to %struct.ec_key_st.
Hence the assertion.
The fix, in my opinion, is that in https://github.com/llvm/llvm-project/blob/bb17e46644bbebdc665ccdee941aab48a9ee0bd4/llvm/lib/Linker/IRMover.cpp#L762, we should call TypeMap.addTypeMapping() only if the types of the global are different. If not, an early materialization of a global (based on module summaries, I think?) can leave a global with inconsistent types.
Uploaded https://reviews.llvm.org/D66814. It doesn't regress any other tests and also fixes the original Android build breakage.
We don't get to the remapType() call in because
This should have been: We don't get to the remapType() call at http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/ValueMapper.cpp#442 because ...
We don't get to the remapType() call in because the materializer is able to get a new value (but with the old unmapped types) at http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/ValueMapper.cpp#350.
I'll look at the Materializer now to see if there's something to be changed there.
I see, I misunderstood the prior comment. So presumably the getCalledValue() is being remapped by the call to remap each operand via mapValue here: http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/ValueMapper.cpp#872
I see that within mapValue there is a call to remapType on that value down here: http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/ValueMapper.cpp#442
One question would be are we even getting to this part of remapType. If that invocation of remapType doesn't affect the params on an underlying FunctionTy, perhaps some special code needs to be added (similar to the special case type remapping for GEPOperator at http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/ValueMapper.cpp#471)?
Sorry, not super familiar with this aspect of the ValueMapper, so questions rather than a clear answer for you...
CallInst has two handles to the callee: getFunctionType() (http://llvm-cs.pcc.me.uk/include/llvm/IR/InstrTypes.h#1051) and getCalledValue()/getCalledOperand (http://llvm-cs.pcc.me.uk/include/llvm/IR/InstrTypes.h#1183)
ValueMapper remaps the types of the former but not the latter.
But the IR Verifier expects the types returned from getFunctionType() and getCalledValue()->getType() to be the same (http://llvm-cs.pcc.me.uk/lib/IR/Verifier.cpp#2757).
Sorry for the silence, I haven't had a chance to look at this in any detail. You mentioned in #14/#15 that getFunctionType is not being remapped, but the code seems to try to do this (afaict) here: http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/ValueMapper.cpp#914
Is that not kicking in or doing the expected remapping?
Teresa, Peter, can you advice on how to proceed with this bug? In particular, any thoughts on comment #14?
Oops, wrong function name in the last comment. s/getCalledFunction/getFunctionType/
I believe the problem is in ValueMapper::remapInstruction(), when a CallSite's type is getting remapped. The type getCalledFunction() gets remapped, while getCalledValue() doesn't. But the IR verifier (Verifier::visitCallBase()) asserts that the types returned by getCalledFunction() and getCalledValue() are consistent.
I haven't found the root cause yet. But from looking in the debugger, it seems that the struct types (struct.ec_key_st, class.keymaster::UniquePtr, class.keymaster::g) from the destination module get remapped to one from the source type:
at /usr/local/google/work/llvm-upstream-minimal/llvm/lib/IR/Type.cpp:386
this=0x7fffffffbf30, DstTy=0x6df97d0, SrcTy=0x6df9860)
at /usr/local/google/work/llvm-upstream-minimal/llvm/lib/Linker/IRMover.cpp:106
this=0x7fffffffbef0)
at /usr/local/google/work/llvm-upstream-minimal/llvm/lib/Linker/IRMover.cpp:773
But the function type "void (%struct.ec_key_st*)" from the destination module uses the type objects from prior to the mapping and hence end up being different.
Can someone familiar with IRMover help me with the following questions:
If I remove the definition of class.keymaster::UniquePtr and its references from i2.ll (see below), the assertion/error goes away.
Vlad, I also noticed that your repro commands don't actually trigger the bug. We also need '-g' when compiling the IR files.
$ cat i2.ll %struct.ec_key_st = type opaque define void @_ZN9keymaster1cEv() { alloca %"class.keymaster::UniquePtr" ret void, !dbg !26 } !llvm.dbg.cu = !{#0} !llvm.module.flags = !{#19} !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, retainedTypes: !3) !1 = !DIFile(filename: "software_keyblobs.ii", directory: "lto_bug") !3 = !{#4} !4 = !DICompositeType(tag: DW_TAG_class_type, file: !1, templateParams: !14) !9 = !DICompositeType(tag: 9) !14 = !{#16} !16 = !DITemplateTypeParameter(type: !17) !17 = !DICompositeType(tag: DW_TAG_structure_type, flags: DIFlagFwdDecl, identifier: "_ZTSN9keymaster20OpenSslObjectDeleterI9ec_key_stXadL_Z11EC_KEY_freeEEEE") !19 = !{i32 2, !"Debug Info Version", i32 3} !22 = distinct !DISubprogram(unit: !0) !24 = !{} !26 = !DILocation(scope: !22)
I compiled the C down to IR and creduced those down to the following inputs:
$ cat i1.ll %"class.keymaster::g" = type { %"class.keymaster::UniquePtr" } %"class.keymaster::UniquePtr" = type { %struct.ec_key_st } %struct.ec_key_st = type opaque define void @_ZN9keymaster1gD0Ev(%"class.keymaster::g") { ret void, !dbg !74 } declare void @EC_KEY_free(%struct.ec_key_st) define void @_ZN9keymaster20OpenSslObjectDeleterI9ec_key_stXadL_Z11EC_KEYfreeEEEclEPS1() { entry: %f.addr = alloca %struct.ec_key_stload %struct.ec_key_st, %struct.ec_key_st* %f.addr, !dbg !89 call void @EC_KEY_free(%struct.ec_key_st %0)unreachable } !llvm.dbg.cu = !{#0} !llvm.module.flags = !{#4} !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1) !1 = !DIFile(filename: "nist_curve_key_exchange.ii", directory: "lto_bug") !4 = !{i32 2, !"Debug Info Version", i32 3} !8 = !DICompositeType(tag: DW_TAG_class_type, file: !1, elements: !10) !10 = !{#13} !13 = !DICompositeType(tag: DW_TAG_class_type, file: !1, templateParams: !23) !18 = !DICompositeType(tag: 1) !23 = !{#24} !24 = !DITemplateTypeParameter(type: !25) !25 = !DICompositeType(tag: DW_TAG_structure_type, templateParams: !31, identifier: "_ZTSN9keymaster20OpenSslObjectDeleterI9ec_key_stXadL_Z11EC_KEY_freeEEEE") !29 = !{} !31 = !{#33} !33 = !DITemplateValueParameter(value: void (%struct.ec_key_st) @EC_KEY_free) !62 = !DISubprogram(isDefinition: false) !71 = distinct !DISubprogram(scope: !8, unit: !0) !74 = !DILocation(scope: !71) !83 = distinct !DISubprogram(scope: !25, unit: !0) !89 = !DILocation(scope: !83)
$ cat i2.ll %"class.keymaster::UniquePtr" = type { %struct.ec_key_st* } %struct.ec_key_st = type opaque define void @_ZN9keymaster1cEv() { alloca %"class.keymaster::UniquePtr" ret void, !dbg !26 } !llvm.dbg.cu = !{#0} !llvm.module.flags = !{#19} !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, retainedTypes: !3) !1 = !DIFile(filename: "software_keyblobs.ii", directory: "lto_bug") !3 = !{#4} !4 = !DICompositeType(tag: DW_TAG_class_type, file: !1, templateParams: !14) !9 = !DICompositeType(tag: 9) !14 = !{#16} !16 = !DITemplateTypeParameter(type: !17) !17 = !DICompositeType(tag: DW_TAG_structure_type, flags: DIFlagFwdDecl, identifier: "_ZTSN9keymaster20OpenSslObjectDeleterI9ec_key_stXadL_Z11EC_KEY_freeEEEE") !19 = !{i32 2, !"Debug Info Version", i32 3} !22 = distinct !DISubprogram(unit: !0) !24 = !{} !26 = !DILocation(scope: !22)
$ clang++ -c -flto -o nist_curve_key_exchange.o i1.ll $ clang++ -c -flto -o software_keyblobs.o i2.ll $ ld.lld -shared -o libfoo.so software_keyblobs.o nist_curve_key_exchange.o 2>&1
Looks related to the previous ODR uniquing bugs like llvm/llvm-project#37032 /#30799 . Seems like the type being opaque might be relevant somehow but I don't have time to dig into it further at the moment.
I reduced the two source files independently (by fixing the other) using creduce and ended up with a small repro. After downloading the two attached preprocessed sources, run:
$ clang++ -g -w -c -flto -o nist_curve_key_exchange.o nist_curve_key_exchange.ii $ clang++ -g -w -c -flto -o software_keyblobs.o software_keyblobs.ii $ ld.lld -shared -o libfoo.so software_keyblobs.o nist_curve_key_exchange.o
I did check all declarations of EC_KEY_free (there's no definition since it's an external function). All of them had 'struct ec_key_st' as an opaque type, from the same header file.
Peter - is this the same sort of failure you'd see if you violate the ODR? (eg: call (with C linkage) "f(int)" in one file, but define it as f(double) or the like in another)
It's worth seeing if bugpoint can reduce them first as it's faster; however, it's often not enough to fully reduce test cases. In my experience creducing textual IR for multiple files with LTO bugs I've found it goes faster if you 1) eliminate the call to topformflat in pass_lines.pm, this flattens IR as if it were C and makes it less effective, and 2) put all of the files into a single file with a delimiter and creduce them all simultaneously.
pirama, maybe you can try creducing those two source files?
Reproduces with:
~/l4/gn/ra/bin/ld.lld "-shared" "ssd1/pirama/aosp1/out/soong/.intermediates/system/keymaster/libkeymaster_portable/android_arm64_armv8-a_cortex-a73_core_static_cfi/obj/system/keymaster/key_blob_utils/software_keyblobs.o" "ssd1/pirama/aosp1/out/soong/.intermediates/system/keymaster/libkeymaster_portable/android_arm64_armv8-a_cortex-a73_core_static_cfi/obj/system/keymaster/km_openssl/nist_curve_key_exchange.o"
One trick that sometimes helps is to use creduce to reduce the response file. I kicked off a run locally with this script and I'll update this bug with the response file that I get back.
pwd=pwd
cd ~/android-cfi-bug/repro ~/l4/gn/ra/bin/ld.lld @$pwd/response.txt -o a.out 2>&1 | grep -q 'Called function is not the same type as the call'
The repro is larger than 1MB. I've uploaded the repro to Google Drive: https://drive.google.com/open?id=1Dmx_9sk8VB4d6k802FaIuUYzGE2598-H
assigned to @pirama-arumuga-nainar
Extended Description
This happens with LTO and CFI:
Called function is not the same type as the call! tail call void @EC_KEY_free(%struct.ec_key_st* %5) #22, !dbg !33425 in function _ZN9keymaster5EcKey13EvpToInternalEPK11evp_pkey_st LLVM ERROR: Broken function found, compilation aborted!
The failures are caused by Clang change r335385 (originally landed as r335284) and LLVM change r335145.
I am attaching lld's repro.tar which can be run as: $ tar xvf repro.tar $ cd repro $ @response.txt
The problem is likely in the Clang change as reverting the two changes above and running the link step doesn't fix the issue. I am working on narrowing down the failure to a smaller repro (any tips on doing so with LTO are much appreciated :)