Open vext01 opened 10 months ago
Just to add: I believe the system is not quite right even with PIE off. By adding SHF_ALLOC
to the special sections we are opting to have them relocated meaning that the symbol addresses inside become virtual addresses, but we assume they are file offsets. It works because without PIE the relocation step happens to be a NOP.
Some more on the PIE saga.
After removing SHF_ALLOC
from the bitcode and basic block map sections, and having that data loaded from disk instead, the mapper fails when PIE is enabled. This is because some symbols become undefined when:
-fPIE
and -pie
flags if clang was configured with -DCLANG_DEFAULT_PIE_ON_LINUX=OFF
)-Wl,--mllvm=--yk-insert-stackmaps
)To demonstrate, take a hello world C program in a.c
:
$ clang -fno-PIE -nopie -flto -fuse-ld=lld -Wl,--mllvm=--yk-insert-stackmaps a.c && nm a.out | grep main
00000000002016a0 T main
$ clang -fPIE -pie -flto -fuse-ld=lld -Wl,--mllvm=--yk-insert-stackmaps a.c && nm a.out | grep main
U main
$ clang -fPIE -pie -flto -fuse-ld=lld a.c && nm a.out | grep main
0000000000001700 T main
Having undefined symbols causes dladdr()
(and probably dlsym()
) to fail to find symbols, meaning that the mapper erroneously reports some blocks as unmappable.
After some fiddling, I've narrowed the issue to the .llvm_stackmaps
section being SHF_ALLOC
(this time this isn't our addition: it's marked SHF_ALLOC
in upstream llvm).
An obvious next step to remove SHF_ALLOC
from the stackmap section:
diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp
index f9a67105820f..1155b62aea9c 100644
--- a/llvm/lib/MC/MCObjectFileInfo.cpp
+++ b/llvm/lib/MC/MCObjectFileInfo.cpp
@@ -522,7 +522,7 @@ void MCObjectFileInfo::initELFMCObjectFileInfo(const Triple &T, bool Large) {
Ctx->getELFSection(".debug_tu_index", DebugSecType, 0);
StackMapSection =
- Ctx->getELFSection(".llvm_stackmaps", ELF::SHT_PROGBITS, ELF::SHF_ALLOC);
+ Ctx->getELFSection(".llvm_stackmaps", ELF::SHT_PROGBITS, 0);
FaultMapSection =
Ctx->getELFSection(".llvm_faultmaps", ELF::SHT_PROGBITS, ELF::SHF_ALLOC);
If you do this then all symbols are accounted for, and dladdr()
can find them:
$ clang -fPIE -pie -flto -fuse-ld=lld -Wl,--mllvm=--yk-insert-stackmaps a.c && nm a.out | grep main
0000000000001700 T main
But this causes issues further down the line. By marking the stackmap section as unloadable, when LLVM generates code at runtime and loads the resulting shared objects, they are devoid of stackmap sections. Of course they are: we asked for them not to be loaded...
We need those for deopt...
Maybe we can only add SHF_ALLOC
to the JITted modules. The thing is, I can't see any way to determine if we are JITting in MCObjectFileInfo::initELFMCObjectFileInfo()
.
lhames in llvm discord:
My gut reaction is that this is a compiler bug -- I can't see why having SHF_ALLOC on the stackmap section should cause other symbols to be left out of the dynamic symbol table.
To move this forward we should try to reproduce this with upstream LLVM. Perhaps we can compile a .ll
file containing stackmap calls to avoid requiring our custom stackmap pass.
Raised a bug upstream for this: https://github.com/llvm/llvm-project/issues/75074
Problem Description
PIE has been disabled by default in the JIT for some time.
The reason for this is if you enable PIE you will get an error like:
In other words, the basic block map section doesn't link.
You would not get this with upstream LLVM if you enabled the basic block section. The reason we see it is that in ykllvm, we have added the
SHF_ALLOC
flag to avoid reading the ELF file at runtime: when you add theSHF_ALLOC
flag the loader will load the section as the process image is bootstrapped.But there's another effect of
SHF_ALLOC
that we haven't considered before: symbol values emitted with LLVM'sMCStreamer::emitSymbolValue()
will get relocated!The error above is the loader saying (in it's own way) that it cannot relocate a symbol in the basic block section because to do so would require the section to be writable. It seems relocating with PIE requires the section to be patched at load time.
This can be verified by enabling PIE by default and applying the following patch to ykllvm:
(Note that the same problem seems to exist for the embedded bitcode section)
This will mean that tests link once again, but something breaks in the PT decoder and the mapper which expect to be reading ELF file offsets out of the basic block sections, but which in fact is being handed virtual addresses due to
SHF_ALLOC
(Those parts of the system use conversion utilities inykaddr
to convert from/to offsets and virtual addresses, but if youSHF_ALLOC
the section, everything is a virtual address, so the conversions become unsound).It works now without PIE because without PIE a file offset for the main object and a virtual address for the main object are the same. But as soon as you enable PIE, the main object gets relocated.
How to fix?
The truth of the matter is we've added
SHF_ALLOC
to sections that were never designed to be loaded.There are two ways to move forward:
SHF_ALLOC
sections that were not designed to be loaded.SHF_ALLOC
the sections, and adjust the system to consume virtual addresses from them, not file offsets.The former is arguably more correct and easier to fix, but maybe a performance hit, because if you don't load the section, you have to open the ELF binary to find the section. But isn't that all the loader is doing anyway?
The latter diverges from upstream's intent of the sections and is more "uncharted territory".
Reading list