llvm / llvm-project

clang -flto=thin incompatible with -Wl,-plugin-opt=-lto-embed-bitcode=optimized #86946

Open chibinz opened 5 months ago

chibinz commented 5 months ago

Hi,

I was trying to obtain whole-program bitcode through LTO but ran into some issues. Here are the steps to reproduce.

Running

clang -c -flto=thin hello.c
clang -c -flto=thin foo.c
clang -flto=thin -fuse-ld=lld -Wl,-plugin-opt=-lto-embed-bitcode=optimized hello.o foo.o -o hello
objcopy --dump-section .llvmbc=hello.bc hello
clang hello.bc

gives the following error

error: Invalid encoding
1 error generated.

This issue does not occur when using -flto=full. A similar issue has also been reported for the Rust compiler: https://github.com/rust-lang/rust/issues/84395

hello.c

#include <stdio.h>

extern void foo();

int main() {
    printf("Hello, world!\n");
    foo();
    return 0;
}

foo.c

#include <stdio.h>

void foo() {
    printf("foo!\n");
}

clang version:

clang version 16.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

EugeneZelenko commented 5 months ago

@teresajohnson: Please take a look.

dtcxzyw commented 5 months ago

The section .llvmbc contains multiple bc files. You can split it by the magic bytes 0x42, 0x43, 0xc0, 0xde :) I guess this script may help: https://github.com/dtcxzyw/llvm-ci/blob/6285e4cfb907be613f1bb8dd2323aa1056819d58/binutils.py#L51-L79
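
For readers who cannot open the linked script, here is a minimal Python sketch of the splitting idea described above. It is not the linked binutils.py; the function name split_bitcode and the output file naming are hypothetical. It assumes every module in the dumped section starts with the magic bytes 0x42, 0x43, 0xc0, 0xde and that the sequence does not also occur by chance inside a module's payload, which a plain byte search cannot rule out.

```python
import sys
from typing import List

# Magic bytes mentioned in the comment above: 'B', 'C', 0xC0, 0xDE.
BITCODE_MAGIC = bytes([0x42, 0x43, 0xC0, 0xDE])


def split_bitcode(blob: bytes) -> List[bytes]:
    """Split a dumped .llvmbc section at every occurrence of the magic bytes.

    Assumption: the magic sequence only appears at the start of a module.
    A chance occurrence inside a module would produce a bogus split.
    """
    offsets = []
    pos = blob.find(BITCODE_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(BITCODE_MAGIC, pos + len(BITCODE_MAGIC))
    # Each chunk runs from one magic to the next (or to the end of the blob).
    ends = offsets[1:] + [len(blob)]
    return [blob[start:end] for start, end in zip(offsets, ends)]


if __name__ == "__main__":
    # Usage: python split_bc.py hello.bc  (script name is hypothetical)
    path = sys.argv[1]
    with open(path, "rb") as f:
        data = f.read()
    for i, module in enumerate(split_bitcode(data)):
        with open(f"{path}.{i}.bc", "wb") as out:
            out.write(module)
```

Running it on the hello.bc produced by the objcopy step would write hello.bc.0.bc, hello.bc.1.bc, and so on. Whether each chunk is then accepted by clang or llvm-dis has not been verified here.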

dtcxzyw commented 5 months ago

See also https://github.com/rust-lang/rust/issues/84395#issuecomment-827543904

teresajohnson commented 5 months ago

@mtrofin implemented this support and is the best person to take a look.

I see in the test case added for this support that it is only tested with distributed ThinLTO: https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/thinlto_embed_bitcode.ll. It's possible that this is not (yet) supported for in process ThinLTO, but I will let @mtrofin clarify.

Backing up a minute - what are you trying to do? The initial issue description says you are trying to obtain the whole program bitcode, but that is not something that is even possible with ThinLTO, which unlike regular LTO does not merge all IR. If you want to get the bitcode after each stage of the ThinLTO backend you can also try using the --save-temps lld option. E.g. see usage in https://github.com/llvm/llvm-project/blob/main/lld/test/ELF/lto/save-temps-eq.ll

chibinz commented 5 months ago

Hi @dtcxzyw @teresajohnson,

Thanks for the info!

"Whole program bitcode" might be a bit confusing. What I meant is a bitcode file that can be assembled into the original executable (the one it was extracted from).

What I need is executables with different sanitizer instrumentations (asan, ubsan, memsan, sancov and potentially other custom instrumentation passes). Doing this post-build allows me to compile ONLY once and instrument as many times as I want. This will save compilation time, and bypass many of the sanitizer incompatibilities in the build process.

-flto works fine for my purpose; it's just too slow for some of the larger binaries.

mtrofin commented 5 months ago

IIUC you're using -lto-embed-bitcode=optimized. I did not implement that. I implemented the -lto-embed-bitcode=post-link-pre-opt version, but had to change what was originally a boolean (-lto-embed-bitcode) to an enum.

(I'm trying to find the github accounts to mention for the authors of the original patch... and failing. See D87477 for the chat on my original change and D68213 for the original change which has the behavior you're trying to use.)