Open peterwaller-arm opened 1 month ago
Author: Peter Waller (peterwaller-arm)
What's worse is that when we add new objects in `discoverFileObjects()` via `registerName()`, we connect them to a section using their address, and thus connect them to the wrong section in such cases. @maksfb @rafaelauler I think we need to use `SymbolRef.getSection()` in the first place, and we need to extend the API for creating `BinaryData` so that its real section is passed in. The same should probably be applied to the `BinaryFunction` created later; currently it is created with `getSectionForAddress`. What do you think?
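To illustrate the point about address-based section lookup, here is a minimal toy model. It does not use BOLT or LLVM classes: `Section`, `Symbol`, `sectionForAddress` and `sectionForSymbol` are hypothetical names invented for this sketch. It assumes a symbol such as `array_end` sits at the one-past-the-end address of its own section, which is also the first address of the following section.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are not BOLT or LLVM classes.
struct Section {
  std::string Name;
  uint64_t Start; // first address in the section
  uint64_t End;   // one past the last address in the section
};

struct Symbol {
  std::string Name;
  uint64_t Address;
  std::size_t SectionIndex; // section recorded in the symbol table entry
};

// Address-based lookup: the first section whose [Start, End) range holds Addr.
std::optional<std::size_t>
sectionForAddress(const std::vector<Section> &Sections, uint64_t Addr) {
  for (std::size_t I = 0; I < Sections.size(); ++I)
    if (Addr >= Sections[I].Start && Addr < Sections[I].End)
      return I;
  return std::nullopt;
}

// Symbol-based lookup: trust the section that the symbol table names.
std::size_t sectionForSymbol(const Symbol &Sym) { return Sym.SectionIndex; }
```

With a hypothetical data section covering `[0x1000, 0x1004)` followed by a text section starting at `0x1004`, an end-of-array symbol at `0x1004` is attributed to the text section by `sectionForAddress`, while `sectionForSymbol` still reports the data section.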
Hello @peterwaller-arm! I've implemented a check and abort for such situations in BOLT in #100801, rather than a fix. Please take a look at the commit message, which describes my thoughts about handling such situations at the BOLT level. TL;DR: a proper fix would be good, but it is currently too expensive. I also think we should blame the linker for leaving a GOT in a fully static binary; it seems that if you used LLD, the problem would be eliminated automatically :)
Problem
The `patchELFGOT` function currently assumes values in the GOT are pointers to functions, and if the function moves, the address is updated.
The problem is present on a recent (at time of writing) main commit, 5f05d5ec8f9bb15c0ac29fce843a2c73165ac414.
Static binary glibc startup crash message
This manifests in glibc static binaries on aarch64-linux, with the binary crashing on startup with the error message `Unexpected reloc type in static binary`. What’s happening is that the glibc startup code iterates over an array of relocations, but it runs off the end of the array because the array end pointer, which lives in the GOT, is no longer valid.
It is currently known to manifest on aarch64-linux with glibc, but the underlying issue may not be unique to that target or scenario and could occur elsewhere.
Test case (copy paste whole block, produces ‘./testcase’)
`objdump --disassemble-all --reloc testcase` output:
What breaks
Note above there are GOT entries `1ffe8` (`array_start`), `1fff0` (`array_end`) and `1fff8` (`_start`), and that `array_end` shares its address with `_start` (equal to `00001004`).
Running `llvm-bolt -o testcase.bolt testcase`, BOLT erroneously rewrites both of these GOT entries; the new GOT contains:
This is incorrect because `1fff0` has been rewritten to point to the new start, but the array data has not moved.
Code which is doing `for (void *i = array_start; i != array_end; i++)` will now run off the end of the array.
Why does it break
The culprit is the code in `patchELFGOT`, which currently assumes that all pointers in the GOT may be interpreted as function pointers:
https://github.com/llvm/llvm-project/blob/5f05d5ec8f9bb15c0ac29fce843a2c73165ac414/bolt/lib/Rewrite/RewriteInstance.cpp#L5271-L5277
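The effect can be modelled outside BOLT with a small sketch. This is not the code at the link above: `MovedFunctions` and `patchGotByAddress` are hypothetical names, and the addresses used with it (`0x1000` for `array_start`, `0x200000` for the relocated function) are made-up values. It only assumes the behaviour described here, namely that any GOT value equal to a moved function's old address is rewritten.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model: old function entry address -> address after the move.
using MovedFunctions = std::map<uint64_t, uint64_t>;

// Treats every GOT value as a possible function pointer and rewrites any
// value that matches a moved function's old address.
void patchGotByAddress(std::vector<uint64_t> &Got, const MovedFunctions &Moved) {
  for (uint64_t &Entry : Got) {
    auto It = Moved.find(Entry);
    if (It != Moved.end())
      Entry = It->second; // also hits data pointers that share the address
  }
}
```

Given a GOT of `{0x1000, 0x1004, 0x1004}` (standing in for `array_start`, `array_end`, `_start`) and a single moved function at `0x1004`, both of the last two entries are rewritten, so the `array_end` entry no longer marks the end of the unmoved array.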
Thinking about solutions
`patchELFGOT` could query the type and conditionally use the correct `getNewFunctionAddress` / `getBinaryDataAtAddress` in each case.
What distinguishes `array_end` and `_start` is that the symbols belong to different sections (even though they share an address), and the sections are of different types. If it were possible to determine the symbol referenced in the GOT, then `patchELFGOT` could straightforwardly avoid patching entries which reference data symbols (or patch them if it's moving them).
Can the symbol referenced by a GOT entry be reconstructed?
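If the kind of symbol behind each GOT entry could be recovered, the conditional patching suggested under "Thinking about solutions" might look roughly like the sketch below. Everything in it is hypothetical: `TargetKind`, `GotEntry`, `AddressMap` and `patchGotByKind` are invented for illustration, and the per-entry `Kind` field is exactly the information that is not currently available.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical: the per-entry kind is the information BOLT would need.
enum class TargetKind { Function, Data };

struct GotEntry {
  uint64_t Value;  // address stored in the GOT slot
  TargetKind Kind; // what the referenced symbol is
};

using AddressMap = std::map<uint64_t, uint64_t>; // old address -> new address

// Patches function entries from the function map and data entries from the
// data map, so a data pointer is only rewritten if that data actually moved.
void patchGotByKind(std::vector<GotEntry> &Got,
                    const AddressMap &MovedFunctions,
                    const AddressMap &MovedData) {
  for (GotEntry &Entry : Got) {
    const AddressMap &Moved =
        Entry.Kind == TargetKind::Function ? MovedFunctions : MovedData;
    auto It = Moved.find(Entry.Value);
    if (It != Moved.end())
      Entry.Value = It->second;
  }
}
```

With two entries that both hold `0x1004`, one tagged `Data` and one tagged `Function`, and only the function moved, the data entry keeps its value and the function entry is updated.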
So far as I’m currently aware, we don’t have a straightforward way to determine which symbol a GOT entry points to. If only the address contained within the GOT entry is considered, this does not distinguish `_start` and `array_end`, as in the reproducer provided above.
Determining which relocations point to a given GOT entry could require determining register values for some programs, since we don’t get a fully-qualified pointer to the GOT from a GOT relocation for a specific symbol alone. Consider that you could have a register value holding a base pointer to a GOT page, and that register value gets reused for multiple different GOT references within the page.
The base pointer could be hoisted out of a loop, and the relocations initializing it would not contain information connecting it to the symbol of interest, so determining the GOT entry being referenced by a relocation would require some kind of symbolic execution to determine the base register value in the worst case.
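A small arithmetic sketch of why the page base alone is ambiguous. It assumes the usual AArch64 pattern in which a GOT slot address is formed from a 4 KiB page base (as produced by `ADRP`) plus a 12-bit offset applied by the load; the helper names `pageOf`, `lo12` and `gotSlot` are hypothetical.

```cpp
#include <cstdint>

constexpr uint64_t PageMask = 0xfff; // low 12 bits: offset within a 4 KiB page

// Page base: the part a single base register could hold for many slots.
constexpr uint64_t pageOf(uint64_t Addr) { return Addr & ~PageMask; }

// Offset within the page: the part supplied separately by each load.
constexpr uint64_t lo12(uint64_t Addr) { return Addr & PageMask; }

// Full GOT slot address, known only once base and offset are combined.
constexpr uint64_t gotSlot(uint64_t PageBase, uint64_t Offset) {
  return PageBase + Offset;
}
```

For the GOT slots in this issue, `pageOf(0x1fff0)` and `pageOf(0x1fff8)` are both `0x1f000`, so one base register can serve the references to `array_end` and `_start`; only the offsets `0xff0` and `0xff8` at each use tell the slots apart.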
cc @aaupov @maksfb for BOLT.
cc @MaskRay because I think you might be interested and knowledgeable, particularly if there is some information BOLT can use to straightforwardly differentiate GOT entries that contain the same address but reference different symbols, or if there are alternative approaches to consider in light of additional linker knowledge.