rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Passing `-lto-embed-bitcode=optimized` to the `lld` gives a corrupted `.llvmbc` section. #84395

Open vext01 opened 3 years ago

vext01 commented 3 years ago

Hi everyone,

(This issue is based on this SO post)

libLTO has the option to embed the post-merged-and-optimised (i.e. final) bitcode into the end binary. This is done with the -lto-embed-bitcode=optimized option to lld.

Example of the use of this option with clang:

$ clang -fuse-ld=lld -flto -Wl,--plugin-opt=-lto-embed-bitcode=optimized world.c
$ objcopy a.out --dump-section .llvmbc=llvm.bc
$ llvm-dis llvm.bc
$ head -5 llvm.ll
; ModuleID = 'llvm.bc'
source_filename = "ld-temp.o"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

Equivalent in the Rust world:

$ RUSTFLAGS="-C linker_plugin_lto -C linker=clang -C link-arg=-fuse-ld=lld -C link-arg=-Wl,--plugin-opt=-lto-embed-bitcode=optimized" cargo build --release
   Compiling world v0.1.0 (/tmp/world)
    Finished release [optimized] target(s) in 0.21s
$ objcopy target/release/world --dump-section .llvmbc=llvm.bc
$ llvm-dis llvm.bc
LLVM ERROR: Invalid encoding
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llvm-dis llvm.bc
 #0 0x000055ef6668578c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x1b578c)
 #1 0x000055ef666836e4 llvm::sys::RunSignalHandlers() (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x1b36e4)
 #2 0x000055ef66683843 SignalHandler(int) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x1b3843)
 #3 0x00007fbcf1776730 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12730)
 #4 0x00007fbcf105c7bb raise /build/glibc-vjB4T1/glibc-2.28/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #5 0x00007fbcf1047535 abort /build/glibc-vjB4T1/glibc-2.28/stdlib/abort.c:81:7
 #6 0x000055ef6665a753 llvm::report_fatal_error(llvm::Twine const&, bool) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x18a753)
 #7 0x000055ef6665a868 (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x18a868)
 #8 0x000055ef66692703 llvm::BitstreamCursor::ReadAbbrevRecord() (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x1c2703)
 #9 0x000055ef6652149d llvm::BitstreamCursor::advance(unsigned int) (.constprop.1679) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x5149d)
#10 0x000055ef6652fabd llvm::getBitcodeFileContents(llvm::MemoryBufferRef) (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x5fabd)
#11 0x000055ef66515159 main (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x45159)
#12 0x00007fbcf104909b __libc_start_main /build/glibc-vjB4T1/glibc-2.28/csu/../csu/libc-start.c:342:3
#13 0x000055ef6651a6ea _start (/home/vext01/research/yorick/llvm-project/inst/bin/llvm-dis+0x4a6ea)
Aborted

If I search the dumped section for the bitcode magic header bytes, 0x4243c0de, there are lots of hits. If I add -C codegen-units=1 to RUSTFLAGS, there are fewer hits (exactly two).
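
The magic-byte search can be reproduced without a hex editor. Here is a minimal sketch in Python, assuming the section has already been dumped to a file with objcopy as above. The name `find_bitcode_magic` is made up for illustration. The hits are only candidate module boundaries, since the same four bytes can also occur inside a module's payload.

```python
BITCODE_MAGIC = bytes.fromhex("4243c0de")  # 'B', 'C', 0xC0, 0xDE


def find_bitcode_magic(data: bytes) -> list[int]:
    """Return every offset at which the bitcode magic bytes occur."""
    offsets = []
    pos = data.find(BITCODE_MAGIC)
    while pos != -1:
        offsets.append(pos)
        # Resume one byte later so that no occurrence is skipped.
        pos = data.find(BITCODE_MAGIC, pos + 1)
    return offsets
```

Something like `find_bitcode_magic(open("llvm.bc", "rb").read())` would then list the candidate offsets for the dumped section.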

It looks to me (although I can't be sure) like rustc is invoking the linker in such a way that the .llvmbc sections of the intermediate objects are not being discarded post-LTO, but instead being concatenated back to back (as linkers do) with the .llvmbc section of the post-LTO bitcode. So .llvmbc would contain many modules, but it should contain only the post-LTO bitcode module.

Assuming this is the case, this is problematic: it's not trivial to split the concatenated bitstreams apart (the magic bytes cannot be used as a reliable delimiter, as those bytes may appear in other unrelated contexts), and even if we could, we wouldn't know which of the resulting modules was the post-LTO one.

vext01 commented 3 years ago

By the way, I'm happy to fix this and raise a PR if we can figure out how to fix it :)

vext01 commented 3 years ago

Looked into this a little more this morning.

If these byte sequences are indeed LLVM bitcode module boundaries, then we have 4 modules in the .llvmbc section:

$ r2 world.map
 -- Can you stand on your head?
[0x00000000]> /x 4243c0de
Searching 4 bytes in [0x0-0x2e34]
hits: 4
0x00000000 hit0_0 4243c0de
0x00000cec hit0_1 4243c0de
0x00001678 hit0_2 4243c0de
0x00002538 hit0_3 4243c0de

I used dd to get these four parts of the dumped section into different files, e.g.:

$ dd if=world.map of=world.map.3 bs=1 skip=5752 count=3776

I then ran llvm-dis on the four resulting files. All four disassembled successfully.
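
The dd arguments follow directly from the hit offsets: each chunk starts at one hit and runs to the next hit, or to the end of the file for the last one. Here is a small sketch in Python, assuming the file size is the 0x2e34 upper bound that r2 reported and that every hit really is a module boundary. `dd_ranges` is a hypothetical helper, not an existing tool.

```python
def dd_ranges(offsets: list[int], total_size: int) -> list[tuple[int, int]]:
    """Turn sorted start offsets into (skip, count) pairs for dd with bs=1."""
    ends = offsets[1:] + [total_size]
    return [(start, end - start) for start, end in zip(offsets, ends)]


hits = [0x0000, 0x0CEC, 0x1678, 0x2538]  # offsets from the r2 search above
for i, (skip, count) in enumerate(dd_ranges(hits, 0x2E34), start=1):
    print(f"dd if=world.map of=world.map.{i} bs=1 skip={skip} count={count}")
```

The third pair comes out as skip=5752 count=3776, which matches the dd command shown above.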

So this confirms that rustc is encoding many modules of bitcode into the binary.

$ ag source_filename *.ll
world.map.1.ll
2:source_filename = "world.8tv0bkhe-cgu.0"

world.map.2.ll
2:source_filename = "world.8tv0bkhe-cgu.1"

world.map.3.ll
2:source_filename = "world.8tv0bkhe-cgu.2"

world.map.4.ll
2:source_filename = "566azmeytlkxgdp0"

My guess is that world.map.4.ll is the post-lto bitcode. The others are intermediate bitcode that shouldn't be there.

Here are the link args rustc used:

$ RUSTFLAGS="-Z print-link-args -C linker_plugin_lto -C linker=clang \
    -C link_arg=-fuse-ld=lld -C link-arg=-Wl,--plugin-opt=-lto-embed-bitcode=optimized" \
    cargo build --release 
   Compiling world v0.1.0 (/tmp/world)
"clang" "-m64" "-Wl,--eh-frame-hdr" "-Wl,-znoexecstack" "-Wl,--as-needed" \
    "-Wl,-plugin-opt=O3" "-Wl,-plugin-opt=mcpu=x86-64" \
    "-L" "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib" \
    "/tmp/world/target/release/deps/world-7ae10c96f87cede3.world.8tv0bkhe-cgu.0.rcgu.o" \
    "/tmp/world/target/release/deps/world-7ae10c96f87cede3.world.8tv0bkhe-cgu.1.rcgu.o" \
    "/tmp/world/target/release/deps/world-7ae10c96f87cede3.world.8tv0bkhe-cgu.2.rcgu.o" \
    "-o" "/tmp/world/target/release/deps/world-7ae10c96f87cede3" \
    "/tmp/world/target/release/deps/world-7ae10c96f87cede3.566azmeytlkxgdp0.rcgu.o" \
    "-Wl,--gc-sections" "-pie" "-Wl,-zrelro" "-Wl,-znow" "-Wl,-O1" "-nodefaultlibs" \
    "-L" "/tmp/world/target/release/deps" \
    "-L" "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib" \
    "-Wl,--start-group" "-Wl,-Bstatic" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-2a50117481c8f2aa.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libpanic_unwind-dbea9235d0389335.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libminiz_oxide-f8fc3a1fd01a99fc.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libadler-60fdb364b9bcdfb1.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libobject-b72528d6aa948810.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libaddr2line-17ceec21e62ba944.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libgimli-a618f40af8a64e78.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd_detect-4f1f1a8ea88df8ed.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_demangle-407e9bbfdf8e96b6.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libhashbrown-a791cad3fe2b88d2.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_alloc-f9c8522e7861970c.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libunwind-6ea30e7b99c281a2.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcfg_if-dbd65dd9774f2a51.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/liblibc-dcb2f8ac1eb14dfb.rlib" \
    "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc-419b1f5927c75ef4.rlib"\
     "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_core-49204354bdca6a99.rlib" "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-89b26516c417e255.rlib" \
    "-Wl,--end-group" "/home/vext01/research/yorick/ykrustc/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-891aaf048eccfa9f.rlib" \
    "-Wl,-Bdynamic" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-fuse-ld=lld" \
    "-Wl,--plugin-opt=-lto-embed-bitcode=optimized"

Notice this part:

    "/tmp/world/target/release/deps/world-7ae10c96f87cede3.world.8tv0bkhe-cgu.0.rcgu.o" \
    "/tmp/world/target/release/deps/world-7ae10c96f87cede3.world.8tv0bkhe-cgu.1.rcgu.o" \
    "/tmp/world/target/release/deps/world-7ae10c96f87cede3.world.8tv0bkhe-cgu.2.rcgu.o" \

The three amigos...

It's still not clear to me why their bitcode is retained.

bjorn3 commented 3 years ago

I got it locally setup using:

$ ln -s $(rustc +nightly --print sysroot)/lib/rustlib/x86_64-unknown-linux-gnu/bin/rust-lld ld.lld
$ echo 'fn main() {}' | RUSTC_LOG=info rustc +nightly - -C linker_plugin_lto -C linker=clang -C link-arg=-B. -C link-arg=-fuse-ld=lld -C link-arg=-Wl,--plugin-opt=-lto-embed-bitcode=optimized

I tried passing -Clink-arg=-flto=full, but that didn't have any effect.

Looking at https://reviews.llvm.org/D68213, which implemented this option: if I read it correctly, EmitBitcodeSection is called before the importer pass of LTO (to be precise, at line 338 of LTOBackend.cpp).

vext01 commented 3 years ago

@bjorn3 In the above do you get a "multi-module" bitcode section too?

> if I read it correctly it seems to be the case that EmitBitcodeSection is called before the importer pass of LTO.

I've been poring over this code recently too (but it's changed a little since the review you linked).

The linker plugin supports three embedding modes: DoNotEmbed, EmbedOptimized and EmbedPostMergePreOptimized.

For EmbedOptimized, the bitcode is embedded in codegen(), which is called after opt(), which is where all of the LTO stuff happens. That's exactly what I want: bitcode which is as faithful to the end binary as possible.

bjorn3 commented 3 years ago

> In the above do you get a "multi-module" bitcode section too?

Yes

> For EmbedOptimized, the bitcode is embedded in codegen(), which is called after opt(), which is where all of the LTO stuff happens. That's exactly what I want: bitcode which is as faithful to the end binary as possible.

In codegen() after EmitBitcodeSection there is the following code:

FunctionImporter Importer(CombinedIndex, ModuleLoader);
  if (Error Err = Importer.importFunctions(Mod, ImportList).takeError())
    return Err;

Or at least there was as of https://reviews.llvm.org/D68213.

vext01 commented 3 years ago

It's moved around a little, but it looks like the ordering is:

I also just checked what building C code with different compilation units does. There is still a single module in the output binary's bitcode:

$ clang -flto -O3 -c f.c
$ clang -flto -O3 -c world.c
$ clang -fuse-ld=lld -flto -Wl,--plugin-opt=-lto-embed-bitcode=optimized world.o f.o -O3 -g -o world
$ objcopy world --dump-section .llvmbc=bc.bc
$ llvm-dis bc.bc
$ 

So we don't see 3 modules here, only the one for the post-LTO bitcode. Good.

So why is Rust keeping the intermediate bitcodes? I wonder if it is adding section flags that cause them to be retained or something like that?

vext01 commented 3 years ago

> My guess is that world.map.4.ll is the post-lto bitcode. The others are intermediate bitcode that shouldn't be there.

Just to add, this may not be correct, since world.map.4.ll doesn't contain a main().

$ ag main world.map.*.ll
world.map.1.ll
14:define internal void @_ZN5world4main17he9131e46183fafc1E() unnamed_addr #0 {
43:define dso_local i32 @main(i32 %0, i8** %1) unnamed_addr #2 {
49:  store void ()* @_ZN5world4main17he9131e46183fafc1E, void ()** %6, align 8

bjorn3 commented 3 years ago

Can you try putting a function in one module and then calling it from another module? This function should be inlined by LTO. If one of the world.map.*.ll files corresponding to an input cgu still contains the function, it is pre-lto llvm-ir. Otherwise it is post-thin-lto llvm-ir.
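
One way to run this check on the disassembled files, as a rough Python sketch: list the functions each .ll file defines and see whether the callee is still among them. The regex only handles unquoted symbol names on `define` lines, and `defined_functions` is an illustrative name, not part of any LLVM tooling.

```python
import re

# Matches e.g. "define hidden void @name(" and captures "name".
_DEFINE_RE = re.compile(r"^define\b[^@\n]*@([\w.$]+)\(", re.MULTILINE)


def defined_functions(ll_text: str) -> set[str]:
    """Return the names of functions defined (not just declared) in LLVM IR text."""
    return set(_DEFINE_RE.findall(ll_text))
```

If the mangled symbol of the callee still shows up in the set for the module of its codegen unit, that module is most likely pre-LTO.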

bjorn3 commented 3 years ago

world.map.4.ll may be the allocator shim. Does it contain functions of form __rust_*?

vext01 commented 3 years ago

Looks like you may be right:

; ModuleID = 'world.map.4'
source_filename = "566azmeytlkxgdp0"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: uwtable
define dso_local i8* @__rust_alloc(i64 %0, i64 %1) local_unnamed_addr #0 {
  %3 = tail call i8* @__rdl_alloc(i64 %0, i64 %1)
  ret i8* %3
}

declare hidden i8* @__rdl_alloc(i64, i64) local_unnamed_addr

; Function Attrs: uwtable
define dso_local void @__rust_dealloc(i8* %0, i64 %1, i64 %2) local_unnamed_addr #0 {
  tail call void @__rdl_dealloc(i8* %0, i64 %1, i64 %2)
  ret void
}

declare hidden void @__rdl_dealloc(i8*, i64, i64) local_unnamed_addr

; Function Attrs: uwtable
define dso_local i8* @__rust_realloc(i8* %0, i64 %1, i64 %2, i64 %3) local_unnamed_addr #0 {
  %5 = tail call i8* @__rdl_realloc(i8* %0, i64 %1, i64 %2, i64 %3)
  ret i8* %5
}
...

bjorn3 commented 3 years ago

I looked at the llvm ir of all the extracted bitcode files and it seems to be completely unoptimized.

vext01 commented 3 years ago

> Can you try putting a function in one module and then calling it from another module? This function should be inlined by LTO. If one of the world.map.*.ll files corresponding to an input cgu still contains the function, they likely contain pre-lto llvm-ir.

I think you are also right about this. It looks like we are not seeing any post-LTO bitcode.

$ cat src/main.rs 
mod other;

fn main() {
    println!("Hello, world!");
    other::f();
}
$ cat src/other.rs 
pub fn f() {
    println!("other module");
}
$ cargo clean
$ RUSTFLAGS="-Z print-link-args -C linker_plugin_lto -C linker=clang -C link_arg=-fuse-ld=lld -C link-arg=-Wl,--plugin-opt=-lto-embed-bitcode=optimized -C codegen-units=2" cargo build --release
... extract the modules into files...
$ ag 'Hello' *.ll
world.1.ll
8:@anon.fad58de7366495db4650cfefac2fcd61.0 = private unnamed_addr constant <{ [14 x i8] }> <{ [14 x i8] c"Hello, world!\0A" }>, align 1
$ ag 'other' *.ll
world.2.ll
11:@anon.6a27adbd289b8ad274e2ad25e8b2371f.1.llvm.8309556541991907897 = hidden unnamed_addr constant <{ [13 x i8] }> <{ [13 x i8] c"other module\0A" }>, align 1
61:define hidden void @_ZN5world5other1f17h4cc9722557ce956eE() unnamed_addr #1 {

The definition of the "other" string and its pointer is here in world.2.ll:

@anon.6a27adbd289b8ad274e2ad25e8b2371f.1.llvm.8309556541991907897 = hidden unnamed_addr constant <{ [13 x i8] }> <{ [13 x i8] c"other module\0A" }>, align 1
@anon.6a27adbd289b8ad274e2ad25e8b2371f.2.llvm.8309556541991907897 = hidden unnamed_addr constant <{ i8*, [8 x i8] }> <{ i8* getelementptr inbounds (<{ [13 x i8] }>, <{ [13 x i8] }>* @anon.6a27adbd289b8ad274e2ad25e8b2371f.1.llvm.8309556541991907897, i32 0, i32 0, i32 0), [8 x i8] c"\0D\00\00\00\00\00\00\00" }>, align 8

And it is used in a function other than main (namely f):

define hidden void @_ZN5world5other1f17h4cc9722557ce956eE() unnamed_addr #1 {
    ...
    store [0 x { [0 x i8]*, i64 }]* bitcast (<{ i8*, [8 x i8] }>* @anon.6a27adbd289b8ad274e2ad25e8b2371f.2.llvm.8309556541991907897 to [0 x { [0 x i8]*, i64 }]*), [0 x { [0 x i8]*, i64 }]** %3, align 8, !alias.scope !8
    ...

So the linker plugin isn't doing anything?! Surely not?

bjorn3 commented 3 years ago

I think the difference between clang and rustc is that rustc directly puts LLVM bitcode into the "object" files, while clang, I believe, wraps it in ELF files.

vext01 commented 3 years ago

I don't think that's true. It's one of the things I investigated earlier:

$ clang -flto -c world.c
$ file world.o
world.o: LLVM IR bitcode
$ r2 world.o
 -- Remember that word: C H A I R
[0x00000000]> /x 4243c0de
Searching 4 bytes in [0x0-0xc08]
hits: 1
0x00000000 hit0_0 4243c0de

vext01 commented 3 years ago

Well, I'm baffled. I put prints all over libLTO (and an abort, as rustc consumes stderr otherwise):

static void codegen(const Config &Conf, TargetMachine *TM,
                    AddStreamFn AddStream, unsigned Task, Module &Mod,
                    const ModuleSummaryIndex &CombinedIndex) {
  errs() << "FFFFFFFFFFFFFFFFFFFFFFFF\n";
  if (Conf.PreCodeGenModuleHook && !Conf.PreCodeGenModuleHook(Task, Mod))
    return;

  errs() << "YYYYYYYYYYYYYYYYYYYY\n";
  if (EmbedBitcode == LTOBitcodeEmbedding::EmbedOptimized) {
    errs() << "XXXXXXXXXXXXXXXXXXXX\n";
    llvm::EmbedBitcodeInModule(Mod, llvm::MemoryBufferRef(),
                               /*EmbedBitcode*/ true,
                               /*EmbedCmdline*/ false,
                               /*CmdArgs*/ std::vector<uint8_t>());
  }
  abort();

And we get:

          FFFFFFFFFFFFFFFFFFFFFFFF
          YYYYYYYYYYYYYYYYYYYY
          XXXXXXXXXXXXXXXXXXXX

Which tells us that EmbedBitcode == LTOBitcodeEmbedding::EmbedOptimized...

It occurs to me that the std rlibs have to be "hybrid" files: executable code, but with a .llvmbc section (as opposed to a pure bitcode file). Because std has to be prepared to build with and without LTO (we don't know what the user will request ahead of time).
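
Whether a given linker input is pure bitcode or a native object that merely carries a .llvmbc section can be told apart from its first bytes. Here is a hedged Python sketch. The magic values are the standard ones for raw LLVM bitcode, ELF, and ar archives (rlibs are ar archives), and `classify_input` is a name chosen for illustration.

```python
def classify_input(header: bytes) -> str:
    """Classify a linker input from its leading bytes."""
    if header.startswith(b"BC\xc0\xde"):
        return "raw LLVM bitcode"
    if header.startswith(b"\x7fELF"):
        return "ELF object (may carry a .llvmbc section)"
    if header.startswith(b"!<arch>\n"):
        return "ar archive (e.g. an rlib)"
    return "unknown"
```

Calling it as `classify_input(open(path, "rb").read(8))` on each input from the link line would show which ones the linker plugin sees as bitcode and which as native objects.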

It might have made sense if the extra modules we were seeing were just the .llvmbc sections from rlib deps, but as we've seen before, inlining isn't happening on the local crate even. LTO isn't happening as far as I can see.

So all in all, I'm confused.

vext01 commented 3 years ago

And looking at the binary, f() was inlined it seems:

[0x00045900 [xAdvc]0 17% 225 target/release/world]> pd $r @ sym.world::main::he9131e46183fafc1                                                                                                                      
            ;-- world::main::he9131e46183fafc1:                                                                                                                                                                     
            0x00045900      53             push rbx                    ; main.rs:3 fn main() {    ; world::main::he9131e46183fafc1                                                                                  
            0x00045901      4883ec30       sub rsp, 0x30                                                                                                                                                            
            0x00045905      488d054c3500.  lea rax, [0x00048e58]       ; mod.rs:316         Arguments { pieces, fmt: None, args }                                                                                   
            0x0004590c      48890424       mov qword [rsp], rax                                                                                                                                                     
            0x00045910      48c744240801.  mov qword [rsp + 8], 1                                                                                                                                                   
            0x00045919      48c744241000.  mov qword [rsp + 0x10], 0                                                                                                                                                
            0x00045922      488d057f61fc.  lea rax, obj.anon.75e17c1ffad1640085e148809e4cb2ae.1.llvm.9665257743853010144    ; 0xbaa8 ; "other module\n"                                                             
            0x00045929      4889442420     mov qword [rsp + 0x20], rax                                                                                                                                              
            0x0004592e      48c744242800.  mov qword [rsp + 0x28], 0                                                                                                                                                
            0x00045937      488d1d52c1fe.  lea rbx, sym.std::io::stdio::_print::h07b709dab9341524 ; main.rs:4     println!("Hello, world!");    ; 0x31a90 ; "UAWAVAUATSH\x81\xec\xb8"                               
            0x0004593e      4889e7         mov rdi, rsp                                                                                                                                                             
            0x00045941      ffd3           call rbx                                                                                                                                                                 
            0x00045943      488d054e3500.  lea rax, obj.anon.75e17c1ffad1640085e148809e4cb2ae.2.llvm.9665257743853010144 ; mod.rs:316         Arguments { pieces, fmt: None, args }    ; 0x48e98                    
            0x0004594a      48890424       mov qword [rsp], rax                                                                                                                                                     
            0x0004594e      48c744240801.  mov qword [rsp + 8], 1                                                                                                                                                   
            0x00045957      48c744241000.  mov qword [rsp + 0x10], 0                                                                                                                                                
            0x00045960      488d055161fc.  lea rax, obj.__rustc_debug_gdb_scripts_section    ; obj.anon.75e17c1ffad1640085e148809e4cb2ae.3.llvm.9665257743853010144                                                 
                                                                       ; 0xbab8                                                                                                                                     
            0x00045967      4889442420     mov qword [rsp + 0x20], rax                                                                                                                                              
            0x0004596c      48c744242800.  mov qword [rsp + 0x28], 0                                                                                                                                                
            0x00045975      4889e7         mov rdi, rsp                                                                                                                                                             
            0x00045978      ffd3           call rbx                    ; other.rs:2     println!("other module");                                                                                                   
            0x0004597a      4883c430       add rsp, 0x30               ; main.rs:6 }                                                                                                                                
            0x0004597e      5b             pop rbx                                                                                                                                                                  
            0x0004597f      c3             ret

tpisto commented 3 years ago

Hi! Any update about this issue?

darkness-ai commented 2 years ago

Hi,

I ran into the same issue, looked into it, and found a possible change to rustc that would allow "-lto-embed-bitcode=optimized" to work.

First of all, the reason it doesn't work is that rustc generates bitcode modules for thinLTO, which ld.lld in turn recognizes and passes through the thinLTO backend. The thinLTO backend calls the "codegen" function in libLTO once per thinLTO module, so in general there is no final merged bitcode module to embed in the final binary.

Basically, "lto-embed-bitcode" doesn't work with thinLTO, which is the default and only way rustc handles modules when -Clinker-plugin-lto is used. Drilling down, it seems that in "compiler/rustc_codegen_llvm/src/back/write.rs", in the function codegen, rustc adds thinLTO info before emitting the bitcode module.

if config.bitcode_needed() {
    let _timer = cgcx
        .prof
        .generic_activity_with_arg("LLVM_module_codegen_make_bitcode", &*module.name);
    let thin = ThinBuffer::new(llmod); // <=== This line turns the BC module into a thinLTO one

Here ThinBuffer actually runs the following LLVM pass:

PM.add(createWriteThinLTOBitcodePass(OS));

Changing this (only for the call site in write.rs:codegen) to go through the "createBitcodeWriterPass(OS)" pass instead generates bitcode modules without thinLTO info. This allows the final link stage to use regular LTO, and "lto-embed-bitcode=optimized" now works.

I suggest adding a config option to rustc to select which pass to run; it would need to be set when using "lto-embed-bitcode=optimized". Just passing a boolean flag to ThinBuffer::new(llmod, target_thin_lto) and selecting the correct pass in the LLVM wrapper fixes the issue.