Closed Lawqup closed 3 months ago
Hello! Thanks for raise an issue, should be fixed by #100890
Hi @yota9, thanks for taking a look. We have found that at least on AAarch64 our linker's behavior was not correct and it should not have generated a relocation for an absolute symbol. When tested on a toolchain with the fix here, the relocation was not generated so BOLT and pre-BOLT assembly were both correct. We still think this should be fixed in BOLT as well so it works correctly with older toolchains and had come up with a similar patch to yours. This may explain why you were not able to reproduce it locally if you were using a newer toolchain.
@s-dag Hi! I agree that in this case using GOT table is excess on the first place. With different options clang/gcc would output different assembly that would have different impact on BOLT. But BOLT patch fixes other "legal" problems too, we still need to apply it, so I think I would be able to push it ASAP when someone from meta team would review it :)
Hello @yota9, not sure i understand. why do we need bolt to process it to use GOT table, is it a difference between clang and gcc behavior? because according to the ld toolchain fix, on aarch64 the relocation for ABS symbols should not be emitted at all thus no need to be handled by bolt relocation pass. so here the expected behavior would be simply to ignore any seen relocation for ABS symbols right? then for example we would have the following on aarch64
+ // This might be a relocation for an ABS symbols like __global_pointer$ on
+ // RISC-V
+- ContainingBF->addRelocation(Rel.getOffset(), ReferencedSymbol,
+- Rel.getType(), 0,
+- cantFail(Symbol.getValue()));
++ // on aarch64 with ld version prior to 2.40 relocation is also mistakenly
++ // generated for an ABS symbol. in this case we ignore such relocation without
++ // rewriting
++ if (!IsAArch64) {
++ ContainingBF->addRelocation(Rel.getOffset(), ReferencedSymbol,
++ Rel.getType(), 0,
++ cantFail(Symbol.getValue()));
++ }
+ return;
@johndoe2012 Hello! Sorry, I might not understand your question properly, but we need to process it with the GOT table because originally the binary has a GOT relocation and a GOT access to get the symbol. We are not creating any GOT access by this code or near it, we only preserve it if it was originally built this way. If original binary was built with the ld fix above it won't have GOT access and thus won't have it in bolted binary too.
Hello @yota9, while testing the patch another issue surfaces regarding the handling of R_AARCH64_ADR_PREL_PG_HI21 relocs of ABS symbols in our environment. we dont have much detail yet but is working on a reproducer and more digging into the root cause.
regarding my previous comment, i was talking about two approaches on the fix:
considering the right behavior is not to generate relocs for abs symbol on aarch64, wouldn't ignoring it be the correct action here?
Thanks,
Hello @johndoe2012! Sorry, at this point I could not tell much without looking at your binary and the problem you observe :( Currently I don't see any problems with my patch but observe some fixed test cases for both aarch64 and x86. So please provide the binary and exact place in it where you observe incorrect behaviour.
Hi @yota9, we had to prioritize the dynamic linking issue first but we will try to come up with a simple enough reproducer here again for the relocation of absolute symbols. We're seeing with https://github.com/llvm/llvm-project/pull/100890 (and yes it does fix build id reproducer above), relocation of absolute symbols are still handled incorrectly on Aarch64.
@s-dag Hi. I need more information of what kind of problems do you see. Better with test, but at least diff of old and new binaries in a place that has a problem
hello @yota9, terribly sorry we haven't been able to produce a minimal reproducer, but here are some descriptions of the problem we see:
# in linker script we define __metadata_start to a fixed address, 0x5000000 in this example
SECTIONS
{
...
__metadata_start = METADATA_BASE_ADDR;
}
# disassembly in original binary
0000000004322150 <foo_function>:
4322150: d00066e0 adrp x0, 5000000 <METADATA_BASE_ADDR> <<<<< intended
4322150: R_AARCH64_ADR_PREL_PG_HI21 __metadata_start
4322154: 91000000 add x0, x0, #0x0
4322154: R_AARCH64_ADD_ABS_LO12_NC __metadata_start
4322158: b9400400 ldr w0, [x0, #4]
432215c: 7100001f cmp w0, #0x0
4322160: 1a9f07e0 cset w0, ne // ne = any
4322164: d65f03c0 ret
# bolt, internally parsed relocation. the information is correct
Relocation: offset = 0x4322150; type = R_AARCH64_ADR_PREL_PG_HI21; value = 0x5000000; symbol = __metadata_start (); symbol address = 0x5000000; addend = 0x0; address = 0x5000000; in = foo_function
Relocation: offset = 0x4322154; type = R_AARCH64_ADD_ABS_LO12_NC; value = 0x0; symbol = __metadata_start (); symbol address = 0x5000000; addend = 0x0; address = 0x5000000; in = foo_function
BOLT-INFO: pre-processing profile using branch profile reader
# disassembly in bolted binary, with this patch
000000000432e7c8 <foo_function>:
432e7c8: d00066e0 adrp x0, 500c000 <METADATA_BASE_ADDR+0xc000>. <<<<< wrong
432e7cc: 91000000 add x0, x0, #0x0
432e7d0: b9400400 ldr w0, [x0, #4]
432e7d4: 7100001f cmp w0, #0x0
432e7d8: 1a9f07e0 cset w0, ne // ne = any
432e7dc: d65f03c0 ret
# before [this patch](https://github.com/llvm/llvm-project/pull/100890), bolt applies the relocs and fixes adrp target to symbol addresss 0x5000000, likely by accident. after this patch, wrong address target is used.
# we know the address is already wrong when bolt enters `handleRelocation()`, but still trying to trace where exactly it is altered
Thanks @johndoe2012 ! Looks strange, I think we've got plenty of tests with adr/adrp , not sure why it is happening. If you willing to share the binary I would try to check what is going on there.
hello @yota9, thanks for the quick response. but sharing the full binary is unlikely. we are attempting a simple reproducer but currently have some difficulty achieving the intended adrp x0, 5000000 <METADATA_BASE_ADDR>
as in the original binary case. nonetheless we will continue to work on the reproducer while dig into where bolt alters the value.
@johndoe2012 Please check BOLT with --print-all --print-only= foo_function, and write here e.g. after disassembly, cfg and somewhere in the last pass what would it print out for this ADRP instruction
hi @yota9, please find following the bolt logs. there are several more passes but the 00000000: adrp x0, #13492224 # debug line foo_file.c:393
value is unchanged
(edited to add last pass "after clean-mc-state")
Binary Function "foo_function" after disassembly {
Number : 4909
State : disassembled
Address : 0x4322150
Size : 0x18
MaxSize : 0x20
Offset : 0x332150
Section : .text
Orc Section : .local.text.foo_function
LSDA : 0x0
IsSimple : 1
IsMultiEntry: 0
IsSplit : 0
BB Count : 0
}
.LBB04908:
00000000: adrp x0, #13492224 # debug line foo_file.c:393
00000004: add x0, x0, #0x0 # debug line foo_file.c:393
00000008: ldr w0, [x0, #0x4] # debug line foo_file.c:393
0000000c: cmp w0, #0x0 # debug line foo_file.c:393
00000010: cset w0, ne # debug line foo_file.c:394
00000014: ret x30 # Offset: 20 # debug line foo_file.c:394
DWARF CFI Instructions:
<empty>
End of Function "foo_function"
Binary Function "foo_function" while building cfg {
Number : 4909
State : CFG constructed
Address : 0x4322150
Size : 0x18
MaxSize : 0x20
Offset : 0x332150
Section : .text
Orc Section : .local.text.foo_function
LSDA : 0x0
IsSimple : 1
IsMultiEntry: 0
IsSplit : 0
BB Count : 1
Hash : 232f37ac2717279a
BB Layout : .LBB04908
0 : executed forward branches
0 : taken forward branches
0 : executed backward branches
0 : taken backward branches
0 : executed unconditional branches
0 : all function calls
0 : indirect calls
0 : PLT calls
0 : executed instructions
0 : executed load instructions
0 : executed store instructions
0 : taken jump table branches
0 : taken unknown indirect branches
0 : total branches
0 : taken branches
0 : non-taken conditional branches
0 : taken conditional branches
0 : all conditional branches
0 : linker-inserted veneer calls
}
.LBB04908 (6 instructions, align : 1)
Entry Point
CFI State : 0
00000000: adrp x0, #13492224 # debug line foo_file.c:393
00000004: add x0, x0, #0x0 # debug line foo_file.c:393
00000008: ldr w0, [x0, #0x4] # debug line foo_file.c:393
0000000c: cmp w0, #0x0 # debug line foo_file.c:393
00000010: cset w0, ne # debug line foo_file.c:394
00000014: ret x30 # Offset: 20 # debug line foo_file.c:394
CFI State: 0
DWARF CFI Instructions:
<empty>
End of Function "foo_function"
Binary Function "foo_function" after attaching profile {
Number : 4909
State : CFG constructed
Address : 0x4322150
Size : 0x18
MaxSize : 0x20
Offset : 0x332150
Section : .text
Orc Section : .local.text.foo_function
LSDA : 0x0
IsSimple : 1
IsMultiEntry: 0
IsSplit : 0
BB Count : 1
Hash : 232f37ac2717279a
BB Layout : .LBB04908
0 : executed forward branches
0 : taken forward branches
0 : executed backward branches
0 : taken backward branches
0 : executed unconditional branches
0 : all function calls
0 : indirect calls
0 : PLT calls
0 : executed instructions
0 : executed load instructions
0 : executed store instructions
0 : taken jump table branches
0 : taken unknown indirect branches
0 : total branches
0 : taken branches
0 : non-taken conditional branches
0 : taken conditional branches
0 : all conditional branches
0 : linker-inserted veneer calls
}
.LBB04908 (6 instructions, align : 1)
Entry Point
CFI State : 0
00000000: adrp x0, #13492224 # debug line foo_file.c:393
00000004: add x0, x0, #0x0 # debug line foo_file.c:393
00000008: ldr w0, [x0, #0x4] # debug line foo_file.c:393
0000000c: cmp w0, #0x0 # debug line foo_file.c:393
00000010: cset w0, ne # debug line foo_file.c:394
00000014: ret x30 # Offset: 20 # debug line foo_file.c:394
CFI State: 0
DWARF CFI Instructions:
<empty>
End of Function "foo_function"
Binary Function "foo_function" after building cfg {
Number : 4909
State : CFG constructed
Address : 0x4322150
Size : 0x18
MaxSize : 0x20
Offset : 0x332150
Section : .text
Orc Section : .local.text.foo_function
LSDA : 0x0
IsSimple : 1
IsMultiEntry: 0
IsSplit : 0
BB Count : 1
Hash : 232f37ac2717279a
BB Layout : .LBB04908
0 : executed forward branches
0 : taken forward branches
0 : executed backward branches
0 : taken backward branches
0 : executed unconditional branches
0 : all function calls
0 : indirect calls
0 : PLT calls
0 : executed instructions
0 : executed load instructions
0 : executed store instructions
0 : taken jump table branches
0 : taken unknown indirect branches
0 : total branches
0 : taken branches
0 : non-taken conditional branches
0 : taken conditional branches
0 : all conditional branches
0 : linker-inserted veneer calls
}
.LBB04908 (6 instructions, align : 1)
Entry Point
CFI State : 0
00000000: adrp x0, #13492224 # debug line foo_file.c:393
00000004: add x0, x0, #0x0 # debug line foo_file.c:393
00000008: ldr w0, [x0, #0x4] # debug line foo_file.c:393
0000000c: cmp w0, #0x0 # debug line foo_file.c:393
00000010: cset w0, ne # debug line foo_file.c:394
00000014: ret x30 # debug line foo_file.c:394
CFI State: 0
DWARF CFI Instructions:
<empty>
End of Function "foo_function"
Binary Function "foo_function" after clean-mc-state {
Number : 4909
State : CFG finalized
Address : 0x4322150
Size : 0x18
MaxSize : 0x20
Offset : 0x332150
Section : .text
Orc Section : .text.cold
LSDA : 0x0
IsSimple : 1
IsMultiEntry: 0
IsSplit : 0
BB Count : 1
Hash : 232f37ac2717279a
BB Layout : .LBB04908
0 : executed forward branches
0 : taken forward branches
0 : executed backward branches
0 : taken backward branches
0 : executed unconditional branches
0 : all function calls
0 : indirect calls
0 : PLT calls
0 : executed instructions
0 : executed load instructions
0 : executed store instructions
0 : taken jump table branches
0 : taken unknown indirect branches
0 : total branches
0 : taken branches
0 : non-taken conditional branches
0 : taken conditional branches
0 : all conditional branches
0 : linker-inserted veneer calls
}
.LBB04908 (6 instructions, align : 1)
Entry Point
CFI State : 0
00000000: adrp x0, #13492224 # debug line foo_file.c:393
00000004: add x0, x0, #0x0 # debug line foo_file.c:393
00000008: ldr w0, [x0, #0x4] # debug line foo_file.c:393
0000000c: cmp w0, #0x0 # debug line foo_file.c:393
00000010: cset w0, ne # debug line foo_file.c:394
00000014: ret x30 # debug line foo_file.c:394
DWARF CFI Instructions:
<empty>
End of Function "foo_function"
It seems that the adrp relocation was skipped for some strange reason, it should be adrp x0, metadata_start not 13492224 imm value. It would be interested to know the reason behind it. I suggest you to run with --debug-only=bolt and check the messages after the `Relocation: offset = 0x4322150; type = R_AARCH64_ADR_PREL_PG_HI21; value = 0x5000000; symbol = metadata_start (); symbol address = 0x5000000; addend = 0x0; address = 0x5000000; in = foo_function ` line. Maybe it was skipped, e.g. the section was not determine. Could you show the section entry for __metadata_start using readelf?
Thanks @yota9, this is really helpful, enabled debug log give us this:
BOLT-DEBUG: Relocation: offset = 0x4322150; type = R_AARCH64_ADR_PREL_PG_HI21; value = 0x5000000; symbol = __metadata_start (); symbol address = 0x5000000; addend = 0x0; address = 0x5000000; in = foo_function
Relocation: offset = 0x4322150; type = R_AARCH64_ADR_PREL_PG_HI21; value = 0x5000000; symbol = __metadata_start (); symbol address = 0x5000000; addend = 0x0; address = 0x5000000; in = foo_function
BOLT-DEBUG: cannot determine referenced section.
BOLT-DEBUG: Relocation: offset = 0x4322154; type = R_AARCH64_ADD_ABS_LO12_NC; value = 0x0; symbol = __metadata_start (); symbol address = 0x5000000; addend = 0x0; address = 0x5000000; in = foo_function
Relocation: offset = 0x4322154; type = R_AARCH64_ADD_ABS_LO12_NC; value = 0x0; symbol = __metadata_start (); symbol address = 0x5000000; addend = 0x0; address = 0x5000000; in = foo_function
BOLT-DEBUG: cannot determine referenced section.
so we know relocs is skipped at https://github.com/llvm/llvm-project/blob/main/bolt/lib/Rewrite/RewriteInstance.cpp#L2670
also __metadata_start
is not a section, but a symbol (example from original binary)
28230: 0000000005000000 0 NOTYPE GLOBAL DEFAULT ABS __metadata_start
are you suggesting it is expected for bolt to change its value after bolt parsed the original elf, then fixes it by the relocation (suppose it is not skipped in handleRelocation)?
@johndoe2012 I mean the section where this symbol __metadata_start located, is there a section for this 0x5000000 or it is in the middle of nowhere? Or it might be something really strange like non-allocatable section, basically show me the output of readelf for section and segments :) Normally such relocations are not skipped, for some strange reason it could not identify the section for this symbol and I want to understand why. Maybe in your script you located this symbol in out-of-section manner, I'm not sure.
hello @yota9, yes 0x5000000 is indeed out of section. we have the line in linker file, and have -Wl,--defsym=METADATA_BASE_ADDR=0x5000000
to supply its value
linker file line
SECTIONS
{
...
__metadata_start = METADATA_BASE_ADDR;
}
Elf file type is EXEC (Executable file)
Entry point 0x4101058
There are 4 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000010000 0x0000000004000000 0x0000000004000000
0x0000000000467e20 0x0000000000467e20 RWE 0x10000
LOAD 0x0000000000480000 0x0000000004600000 0x0000000004600000
0x0000000000000000 0x000000000024b4a0 RW 0x10000
NOTE 0x0000000000010000 0x0000000004000000 0x0000000004000000
0x0000000000000024 0x0000000000000024 R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
Section to Segment mapping:
Segment Sections...
00 .note.gnu.build-id .data .tm_clone_table .percpu .init .text .fini .rodata .config.mak.lz4 .eh_frame .init_array .fini_array .foo_info
01 .bss
02 .note.gnu.build-id
03
It is segments, you forgot about sections (readelf -S). Anyway it seems to me that it is the binary fault, not BOLTs. Previously indeed the relocation was used before this skip but it was done by accident. The binary expected to be well-formed so all sections, segments must be in-place in order BOLT would be able to work properly.
it is not in any part of the sections, so we cannot see it from readelf -S
we see your point. we are not aware of such restrictions on out of section use case for absolute symbols. in man page for elf here it mentioned specifically "...An object file does not have sections for these special indices: ... SHN_ABS This value specifies the absolute value for the corresponding reference... "
__metadata_start
is an ABS symbol. and the original binary before bolt does have the right value for this adrp instruction.
if bolt is confused by such legitimate use case, we think bolt should be fixed to handle it.
# disassembly in original binary
0000000004322150 <foo_function>:
4322150: d00066e0 adrp x0, 5000000 <METADATA_BASE_ADDR> <<<<< intended
4322150: R_AARCH64_ADR_PREL_PG_HI21 __metadata_start
4322154: 91000000 add x0, x0, #0x0
4322154: R_AARCH64_ADD_ABS_LO12_NC __metadata_start
4322158: b9400400 ldr w0, [x0, #4]
432215c: 7100001f cmp w0, #0x0
4322160: 1a9f07e0 cset w0, ne // ne = any
4322164: d65f03c0 ret
Well, AFAICT probably it should be fixed, yes :) Your contribution would be appreciated. Anyway, as initial problem is fixed I'm closing this issue. Fill free to open new one related to your problem.
Thank you very much for the help so far @yota9! It is indeed just by luck that out of section relocations were working previously before https://github.com/llvm/llvm-project/pull/100890. That commit fixes this other legitimate issue, relocation addends. We will deal with the out of section relocation issue separately.
I ran into an issue running BOLT on a program that accesses a global struct field provided by the linker. When accessing a field of that struct, the optimized binary wipes the addend and instead replaces reads of that field with reads of the struct+0.
Versions of things
I tested with both x86 and aarch64 and managed to reproduce.
Repro
This is a minimal repro on x86.
The following is the C,
print_global.c
I used the default linker script outputted by the
-Wl,-verbose
gcc flag, then inserted thebuild_id_note
at an absolute location right under that firstPROVIDE
inbuild_id.ld
, like so:This is compiled with
Then bolting:
The output of the non-bolted binary is:
Yet, the bolted binary produces the wrong result
Of course, the build id is still as expected for the bolted binary:
So this is a case where BOLT leads to incorrect runtime behavior. Inspecting the assembly, we see the root of the problem is that the immediate address for
build_id_note.hash
is missing the addend and thus we print starting from thepad
field:Non-bolted asm:
Bolted asm: