Open vext01 opened 3 years ago
By the way, I'm happy to fix this and raise a PR if we can figure out how to fix it :)
Looked into this a little more this morning.
If these bytes are indeed LLVM bitcode module boundaries, then we have 4 modules in the .llvmbc section:
$ r2 world.map
-- Can you stand on your head?
[0x00000000]> /x 4243c0de
Searching 4 bytes in [0x0-0x2e34]
hits: 4
0x00000000 hit0_0 4243c0de
0x00000cec hit0_1 4243c0de
0x00001678 hit0_2 4243c0de
0x00002538 hit0_3 4243c0de
I used dd to get these four parts of the dumped section into different files, e.g.:
$ dd if=world.map of=world.map.3 bs=1 skip=5752 count=3776
Then I ran llvm-dis on the four resulting files. All four disassembled successfully.
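As a rough sketch of the splitting step above, the skip/count values passed to dd can be derived from the magic-byte hits. The function names are hypothetical, and the approach assumes every occurrence of 0x4243c0de really is the start of a module, which is not guaranteed since the same bytes could appear inside a bitstream.

```rust
/// Magic bytes at the start of a raw LLVM bitcode stream: 'B', 'C', 0xC0, 0xDE.
const BITCODE_MAGIC: [u8; 4] = [0x42, 0x43, 0xC0, 0xDE];

/// Returns every offset in `data` at which the magic bytes occur.
/// Assumption: each hit is a module boundary (it may be a false positive).
fn find_magic_offsets(data: &[u8]) -> Vec<usize> {
    let mut offsets = Vec::new();
    for (i, window) in data.windows(BITCODE_MAGIC.len()).enumerate() {
        if window == &BITCODE_MAGIC[..] {
            offsets.push(i);
        }
    }
    offsets
}

/// Turns sorted hit offsets into (skip, count) pairs, i.e. the values
/// given to dd above. The last chunk runs to the end of the section.
fn chunk_ranges(offsets: &[usize], total_len: usize) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    for (i, &start) in offsets.iter().enumerate() {
        let end = offsets.get(i + 1).copied().unwrap_or(total_len);
        ranges.push((start, end - start));
    }
    ranges
}
```

Feeding in the four hits from the r2 session and the section size 0x2e34 gives (5752, 3776) for the third chunk, matching the dd invocation shown.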
So this confirms that rustc is encoding many modules of bitcode into the binary.
$ ag source_filename *.ll
world.map.1.ll
2:source_filename = "world.8tv0bkhe-cgu.0"
world.map.2.ll
2:source_filename = "world.8tv0bkhe-cgu.1"
world.map.3.ll
2:source_filename = "world.8tv0bkhe-cgu.2"
world.map.4.ll
2:source_filename = "566azmeytlkxgdp0"
My guess is that world.map.4.ll is the post-lto bitcode. The others are intermediate bitcode that shouldn't be there.
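To make the ag check above repeatable, here is a minimal sketch that pulls source_filename out of disassembled IR text. The helper names are hypothetical, and the "-cgu." test is only a heuristic based on the module names seen in this thread, not a documented rustc naming guarantee.

```rust
/// Extracts the value of the `source_filename = "..."` line from textual
/// LLVM IR, or None if no such line is present.
fn source_filename(ir: &str) -> Option<&str> {
    for line in ir.lines() {
        if let Some(rest) = line.trim_start().strip_prefix("source_filename") {
            // Expect: = "name"
            let rest = rest.trim_start().strip_prefix('=')?.trim();
            return rest.strip_prefix('"')?.strip_suffix('"');
        }
    }
    None
}

/// Heuristic (assumption): the codegen-unit modules in this thread all have
/// "-cgu." in their source_filename, e.g. "world.8tv0bkhe-cgu.0".
fn looks_like_codegen_unit(name: &str) -> bool {
    name.contains("-cgu.")
}
```

A module whose name fails the heuristic, like "566azmeytlkxgdp0", is merely "not an obvious codegen unit"; it says nothing about whether the module is post-LTO.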
Here are the link args rustc used:
$ RUSTFLAGS="-Z print-link-args -C linker_plugin_lto -C linker=clang \
-C link_arg=-fuse-ld=lld -C link-arg=-Wl,--plugin-opt=-lto-embed-bitcode=optimized" \
cargo build --release
Compiling world v0.1.0 (/tmp/world)
"clang" "-m64" "-Wl,--eh-frame-hdr" "-Wl,-znoexecstack" "-Wl,--as-needed" \
"-Wl,-plugin-opt=O3" "-Wl,-plugin-opt=mcpu=x86-64" \
"-L" "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib" \
"/tmp/world/target/release/deps/world-7ae10c96f87cede3.world.8tv0bkhe-cgu.0.rcgu.o" \
"/tmp/world/target/release/deps/world-7ae10c96f87cede3.world.8tv0bkhe-cgu.1.rcgu.o" \
"/tmp/world/target/release/deps/world-7ae10c96f87cede3.world.8tv0bkhe-cgu.2.rcgu.o" \
"-o" "/tmp/world/target/release/deps/world-7ae10c96f87cede3" \
"/tmp/world/target/release/deps/world-7ae10c96f87cede3.566azmeytlkxgdp0.rcgu.o" \
"-Wl,--gc-sections" "-pie" "-Wl,-zrelro" "-Wl,-znow" "-Wl,-O1" "-nodefaultlibs" \
"-L" "/tmp/world/target/release/deps" \
"-L" "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib" \
"-Wl,--start-group" "-Wl,-Bstatic" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-2a50117481c8f2aa.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libpanic_unwind-dbea9235d0389335.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libminiz_oxide-f8fc3a1fd01a99fc.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libadler-60fdb364b9bcdfb1.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libobject-b72528d6aa948810.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libaddr2line-17ceec21e62ba944.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libgimli-a618f40af8a64e78.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd_detect-4f1f1a8ea88df8ed.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_demangle-407e9bbfdf8e96b6.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libhashbrown-a791cad3fe2b88d2.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_alloc-f9c8522e7861970c.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libunwind-6ea30e7b99c281a2.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcfg_if-dbd65dd9774f2a51.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/liblibc-dcb2f8ac1eb14dfb.rlib" \
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc-419b1f5927c75ef4.rlib"\
"/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_core-49204354bdca6a99.rlib" "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-89b26516c417e255.rlib" \
"-Wl,--end-group" "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-891aaf048eccfa9f.rlib" \
"-Wl,-Bdynamic" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-fuse-ld=lld" \
"-Wl,--plugin-opt=-lto-embed-bitcode=optimized"
Notice this part:
"/tmp/world/target/release/deps/world-7ae10c96f87cede3.world.8tv0bkhe-cgu.0.rcgu.o" \
"/tmp/world/target/release/deps/world-7ae10c96f87cede3.world.8tv0bkhe-cgu.1.rcgu.o" \
"/tmp/world/target/release/deps/world-7ae10c96f87cede3.world.8tv0bkhe-cgu.2.rcgu.o" \
The three amigos...
It's still not clear to me why their bitcode is retained.
I got it set up locally using:
$ ln -s $(rustc +nightly --print sysroot)/lib/rustlib/x86_64-unknown-linux-gnu/bin/rust-lld ld.lld
$ echo 'fn main() {}' | RUSTC_LOG=info rustc +nightly - -C linker_plugin_lto -C linker=clang -C link-arg=-B. -C link-arg=-fuse-ld=lld -C link-arg=-Wl,--plugin-opt=-lto-embed-bitcode=optimized
I tried passing -Clink-arg=-flto=full, but that didn't have any effect.
Looking at https://reviews.llvm.org/D68213, which implemented this: if I read it correctly, EmitBitcodeSection is called before the importer pass of LTO, to be precise at line 338 of LTOBackend.cpp.
@bjorn3 In the above do you get a "multi-module" bitcode section too?
if I read it correctly it seems to be the case that EmitBitcodeSection is called before the importer pass of LTO.
I've been poring over this code recently too (but it's changed a little since the review you linked).
The linker plugin supports three embedding modes: DoNotEmbed, EmbedOptimized and EmbedPostMergePreOptimized.
For EmbedOptimized, the bitcode is embedded in codegen(), which is called after opt(), which is where all of the LTO stuff happens. That's exactly what I want: bitcode which is as faithful to the end binary as possible.
In the above do you get a "multi-module" bitcode section too?
Yes
For EmbedOptimized, the bitcode is embedded in codegen(), which is called after opt(), which is where all of the LTO stuff happens. That's exactly what I want: bitcode which is as faithful to the end binary as possible.
In codegen(), after EmitBitcodeSection, there is the following code:
FunctionImporter Importer(CombinedIndex, ModuleLoader);
if (Error Err = Importer.importFunctions(Mod, ImportList).takeError())
return Err;
Or at least there was as of https://reviews.llvm.org/D68213.
It's moved around a little, but it looks like the ordering is:
I also just checked what building C code with different compilation units does. There is still a single module in the output binary's bitcode:
$ clang -flto -O3 -c f.c
$ clang -flto -O3 -c world.c
$ clang -fuse-ld=lld -flto -Wl,--plugin-opt=-lto-embed-bitcode=optimized world.o f.o -O3 -g -o world
$ objcopy world --dump-section .llvmbc=bc.bc
$ llvm-dis bc.bc
$
So we don't see 3 modules here, only the one for the post-LTO bitcode. Good.
So why is Rust keeping the intermediate bitcodes? I wonder if it is adding section flags that cause them to be retained or something like that?
My guess is that world.map.4.ll is the post-lto bitcode. The others are intermediate bitcode that shouldn't be there.
Just to add, this may not be correct, since world.map.4.ll doesn't contain a main().
$ ag main world.map.*.ll
world.map.1.ll
14:define internal void @_ZN5world4main17he9131e46183fafc1E() unnamed_addr #0 {
43:define dso_local i32 @main(i32 %0, i8** %1) unnamed_addr #2 {
49: store void ()* @_ZN5world4main17he9131e46183fafc1E, void ()** %6, align 8
Can you try to put a function in one module and then call it from another module? This function should be inlined by LTO. If one of the world.map.*.ll files corresponding to an input cgu still contains the function, it is pre-lto llvm-ir. Otherwise it is post-thin-lto llvm-ir.
world.map.4.ll may be the allocator shim. Does it contain functions of the form __rust_*?
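The inlining check suggested above could be scripted roughly as follows. This is only a sketch over the textual output of llvm-dis: the function name is hypothetical, and it does a plain text match on "define" lines, so it would miss symbols that LLVM prints in quoted form.

```rust
/// Returns true if the textual IR contains a `define` (not just a `declare`)
/// for the given symbol name, e.g. a mangled Rust function name.
/// Limitation: symbols printed as @"quoted name" are not matched.
fn defines_function(ir: &str, symbol: &str) -> bool {
    let needle = format!("@{}(", symbol);
    ir.lines()
        .map(|line| line.trim_start())
        .any(|line| line.starts_with("define ") && line.contains(needle.as_str()))
}
```

If the callee is still defined in the module extracted for its codegen unit, that module is presumably pre-LTO IR; if it is gone, it was likely inlined or dropped during LTO.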
Looks like you may be right:
; ModuleID = 'world.map.4'
source_filename = "566azmeytlkxgdp0"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: uwtable
define dso_local i8* @__rust_alloc(i64 %0, i64 %1) local_unnamed_addr #0 {
%3 = tail call i8* @__rdl_alloc(i64 %0, i64 %1)
ret i8* %3
}
declare hidden i8* @__rdl_alloc(i64, i64) local_unnamed_addr
; Function Attrs: uwtable
define dso_local void @__rust_dealloc(i8* %0, i64 %1, i64 %2) local_unnamed_addr #0 {
tail call void @__rdl_dealloc(i8* %0, i64 %1, i64 %2)
ret void
}
declare hidden void @__rdl_dealloc(i8*, i64, i64) local_unnamed_addr
; Function Attrs: uwtable
define dso_local i8* @__rust_realloc(i8* %0, i64 %1, i64 %2, i64 %3) local_unnamed_addr #0 {
%5 = tail call i8* @__rdl_realloc(i8* %0, i64 %1, i64 %2, i64 %3)
ret i8* %5
}
...
I looked at the llvm ir of all the extracted bitcode files and it seems to be completely unoptimized.
Can you try to put a function in one module and then call it from another module? This function should be inlined by LTO. If one of the world.map.*.ll files corresponding to an input cgu still contains the function, they likely contain pre-lto llvm-ir.
I think you are also right about this. It looks like we are not seeing any post-LTO bitcode.
$ cat src/main.rs
mod other;
fn main() {
println!("Hello, world!");
other::f();
}
$ cat src/other.rs
pub fn f() {
println!("other module");
}
$ cargo clean
$ RUSTFLAGS="-Z print-link-args -C linker_plugin_lto -C linker=clang -C link_arg=-fuse-ld=lld -C link-arg=-Wl,--plugin-opt=-lto-embed-bitcode=optimized -C codegen-units=2" cargo build --release
... extract the modules into files...
$ ag 'Hello' *.ll
world.1.ll
8:@anon.fad58de7366495db4650cfefac2fcd61.0 = private unnamed_addr constant <{ [14 x i8] }> <{ [14 x i8] c"Hello, world!\0A" }>, align 1
$ ag 'other' *.ll
world.2.ll
11:@anon.6a27adbd289b8ad274e2ad25e8b2371f.1.llvm.8309556541991907897 = hidden unnamed_addr constant <{ [13 x i8] }> <{ [13 x i8] c"other module\0A" }>, align 1
61:define hidden void @_ZN5world5other1f17h4cc9722557ce956eE() unnamed_addr #1 {
The definition of the "other" string and its pointer is here in world.2.ll:
@anon.6a27adbd289b8ad274e2ad25e8b2371f.1.llvm.8309556541991907897 = hidden unnamed_addr constant <{ [13 x i8] }> <{ [13 x i8] c"other module\0A" }>, align 1
@anon.6a27adbd289b8ad274e2ad25e8b2371f.2.llvm.8309556541991907897 = hidden unnamed_addr constant <{ i8*, [8 x i8] }> <{ i8* getelementptr inbounds (<{ [13 x i8] }>, <{ [13 x i8] }>* @anon.6a27adbd289b8ad274e2ad25e8b2371f.1.llvm.8309556541991907897, i32 0, i32 0, i32 0), [8 x i8] c"\0D\00\00\00\00\00\00\00" }>, align 8
And its use is in a function that is not main (so f):
define hidden void @_ZN5world5other1f17h4cc9722557ce956eE() unnamed_addr #1 {
...
store [0 x { [0 x i8]*, i64 }]* bitcast (<{ i8*, [8 x i8] }>* @anon.6a27adbd289b8ad274e2ad25e8b2371f.2.llvm.8309556541991907897 to [0 x { [0 x i8]*, i64 }]*), [0 x { [0 x i8]*, i64 }]** %3, align 8, !alias.scope !8
...
So the linker plugin isn't doing anything?! Surely not?
I think the difference between clang and rustc is that rustc directly puts llvm bitcode into the "object" files while clang, I believe, wraps them in ELF files.
I don't think that's true. It's one of the things I investigated earlier:
$ clang -flto -c world.c
$ file world.o
world.o: LLVM IR bitcode
$ r2 world.o
-- Remember that word: C H A I R
[0x00000000]> /x 4243c0de
Searching 4 bytes in [0x0-0xc08]
hits: 1
0x00000000 hit0_0 4243c0de
Well, I'm baffled. I put prints all over libLTO (and an abort, as rustc consumes stderr otherwise):
static void codegen(const Config &Conf, TargetMachine *TM,
AddStreamFn AddStream, unsigned Task, Module &Mod,
const ModuleSummaryIndex &CombinedIndex) {
errs() << "FFFFFFFFFFFFFFFFFFFFFFFF\n";
if (Conf.PreCodeGenModuleHook && !Conf.PreCodeGenModuleHook(Task, Mod))
return;
errs() << "YYYYYYYYYYYYYYYYYYYY\n";
if (EmbedBitcode == LTOBitcodeEmbedding::EmbedOptimized) {
errs() << "XXXXXXXXXXXXXXXXXXXX\n";
llvm::EmbedBitcodeInModule(Mod, llvm::MemoryBufferRef(),
/*EmbedBitcode*/ true,
/*EmbedCmdline*/ false,
/*CmdArgs*/ std::vector<uint8_t>());
}
abort();
And we get:
FFFFFFFFFFFFFFFFFFFFFFFF
YYYYYYYYYYYYYYYYYYYY
XXXXXXXXXXXXXXXXXXXX
Which tells us that EmbedBitcode == LTOBitcodeEmbedding::EmbedOptimized
...
It occurs to me that the std rlibs have to be "hybrid" files: executable code, but with a .llvmbc section (as opposed to a pure bitcode file), because std has to be prepared to build with and without LTO (we don't know what the user will request ahead of time).
It might have made sense if the extra modules we were seeing were just the .llvmbc sections from rlib deps, but as we've seen before, inlining isn't happening even on the local crate. LTO isn't happening as far as I can see.
So all in all, I'm confused.
And looking at the binary, it seems f() was inlined:
[0x00045900 [xAdvc]0 17% 225 target/release/world]> pd $r @ sym.world::main::he9131e46183fafc1
;-- world::main::he9131e46183fafc1:
0x00045900 53 push rbx ; main.rs:3 fn main() { ; world::main::he9131e46183fafc1
0x00045901 4883ec30 sub rsp, 0x30
0x00045905 488d054c3500. lea rax, [0x00048e58] ; mod.rs:316 Arguments { pieces, fmt: None, args }
0x0004590c 48890424 mov qword [rsp], rax
0x00045910 48c744240801. mov qword [rsp + 8], 1
0x00045919 48c744241000. mov qword [rsp + 0x10], 0
0x00045922 488d057f61fc. lea rax, obj.anon.75e17c1ffad1640085e148809e4cb2ae.1.llvm.9665257743853010144 ; 0xbaa8 ; "other module\n"
0x00045929 4889442420 mov qword [rsp + 0x20], rax
0x0004592e 48c744242800. mov qword [rsp + 0x28], 0
0x00045937 488d1d52c1fe. lea rbx, sym.std::io::stdio::_print::h07b709dab9341524 ; main.rs:4 println!("Hello, world!"); ; 0x31a90 ; "UAWAVAUATSH\x81\xec\xb8"
0x0004593e 4889e7 mov rdi, rsp
0x00045941 ffd3 call rbx
0x00045943 488d054e3500. lea rax, obj.anon.75e17c1ffad1640085e148809e4cb2ae.2.llvm.9665257743853010144 ; mod.rs:316 Arguments { pieces, fmt: None, args } ; 0x48e98
0x0004594a 48890424 mov qword [rsp], rax
0x0004594e 48c744240801. mov qword [rsp + 8], 1
0x00045957 48c744241000. mov qword [rsp + 0x10], 0
0x00045960 488d055161fc. lea rax, obj.__rustc_debug_gdb_scripts_section ; obj.anon.75e17c1ffad1640085e148809e4cb2ae.3.llvm.9665257743853010144
; 0xbab8
0x00045967 4889442420 mov qword [rsp + 0x20], rax
0x0004596c 48c744242800. mov qword [rsp + 0x28], 0
0x00045975 4889e7 mov rdi, rsp
0x00045978 ffd3 call rbx ; other.rs:2 println!("other module");
0x0004597a 4883c430 add rsp, 0x30 ; main.rs:6 }
0x0004597e 5b pop rbx
0x0004597f c3 ret
Hi! Any update about this issue?
Hi,
I encountered the same issue, looked into it, and found a possible change to rustc that would allow "-lto-embed-bitcode=optimized" to work.
First of all, the reason it doesn't work is that rustc generates bitcode modules for thinLTO, which ld.lld in turn recognizes and passes to the thinLTO backend. The thinLTO backend calls the "codegen" function in libLTO once per thinLTO module, and in general there is no final merged bitcode module to embed in the final binary.
Basically, "lto-embed-bitcode" doesn't work with thinLTO, which is the default and only way rustc emits modules when passing them to the linker with -C linker-plugin-lto. Drilling down, it seems that in "compiler/rustc_codegen_llvm/src/back/write.rs", in the function codegen, rustc adds thinLTO info before emitting the bitcode module.
if config.bitcode_needed() {
let _timer = cgcx
.prof
.generic_activity_with_arg("LLVM_module_codegen_make_bitcode", &*module.name);
let thin = ThinBuffer::new(llmod); // <=== This line turns the BC module into thinLTO one
where ThinBuffer::new actually runs the following LLVM pass:
PM.add(createWriteThinLTOBitcodePass(OS));
Changing this (only for the call site in write.rs:codegen) to go through the "createBitcodeWriterPass(OS)" pass instead generated bitcode modules without thinLTO info. That allows the final link stage to use regular LTO, and now "lto-embed-bitcode=optimized" works.
I suggest adding a config option to rustc to select which pass to run, which would need to be used together with "lto-embed-bitcode=optimized". Just passing a boolean flag to ThinBuffer::new(llmod, target_thin_lto), and selecting the correct pass in the llvm wrapper, fixes the issue.
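A minimal sketch of what that flag could look like. This is a standalone mock, not rustc code: the enum and function names are hypothetical, and only the two LLVM pass names are taken from the comment above.

```rust
/// Hypothetical stand-in for the choice of LLVM bitcode writer pass.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BitcodeWriter {
    /// Bitcode carrying ThinLTO info (createWriteThinLTOBitcodePass).
    ThinLto,
    /// Plain bitcode without ThinLTO info (createBitcodeWriterPass).
    Regular,
}

impl BitcodeWriter {
    /// Name of the LLVM pass constructor this variant stands for.
    fn llvm_pass_name(self) -> &'static str {
        match self {
            BitcodeWriter::ThinLto => "createWriteThinLTOBitcodePass",
            BitcodeWriter::Regular => "createBitcodeWriterPass",
        }
    }
}

/// Mirrors the proposed boolean: true keeps today's behaviour, false emits
/// plain bitcode so the linker plugin falls back to regular LTO.
fn select_writer(target_thin_lto: bool) -> BitcodeWriter {
    if target_thin_lto {
        BitcodeWriter::ThinLto
    } else {
        BitcodeWriter::Regular
    }
}
```

In the real compiler the selection would presumably happen inside the LLVM wrapper that ThinBuffer::new calls into; how the option is surfaced to users is left open here.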
Hi everyone,
(This issue is based on this SO post)
libLTO has the option to embed the post-merged-and-optimised (i.e. final) bitcode into the end binary. This is done with the -lto-embed-bitcode=optimized option to lld.
Example of the use of this option with clang:
Equivalent in the Rust world:
If I search the dumped section for the bitcode magic header bytes, 0x4243c0de, there are lots of hits. If I add -C codegen-units=1 to RUSTFLAGS then there are fewer hits (exactly two).
It looks to me (although I can't be sure) like rustc is invoking the linker in such a way that the .llvmbc sections of the intermediate objects are not being discarded post-LTO, but instead being concatenated back to back (as linkers do) with the .llvmbc section of the post-LTO bitcode. So .llvmbc would contain many modules, but it should contain only the post-LTO bitcode module.
Assuming this is the case, this is problematic: it's not trivial to split the concatenated bitstreams apart (the magic bytes cannot be used as a reliable delimiter, as those bytes may appear in other unrelated contexts), and even if we could, we wouldn't know which of the resulting modules was the post-LTO one.