rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
94.88k stars 12.23k forks source link

Codegen transformation of weak symbol to undefined across CGUs affects linking semantics #120842

Open zmodem opened 4 months ago

zmodem commented 4 months ago

LLVM's code coverage instrumentation has a mode called "runtime counter relocation" (see https://clang.llvm.org/docs/SourceBasedCodeCoverage.html). The details are not important, but one effect is that it inserts a weak symbol definition into each instrumented module called __llvm_profile_counter_bias.

We can see it in action with this sample code (foo.rs):

#[no_mangle]
pub extern "C" fn foo() { bar::bar() }

mod bar {
    #[inline(never)]
    pub fn bar() {}
}
$ rustc foo.rs -Ccodegen-units=2 -Copt-level=0 --crate-type=rlib -Cembed-bitcode=no -Cinstrument-coverage -Cllvm-args=-runtime-counter-relocation && ar x libfoo.rlib && nm foo*.rcgu.o

foo.foo.730f9a7e513a85b2-cgu.0.rcgu.o:
                 U _RNvNtCs9SsW9UZxs52_3foo3bar3bar
0000000000000000 V __covrec_5CF8C24CDB18BDACu
0000000000000000 V __llvm_profile_counter_bias
0000000000000000 R __llvm_profile_filename
0000000000000000 n __profc_foo
0000000000000000 T foo

foo.foo.730f9a7e513a85b2-cgu.1.rcgu.o:
0000000000000000 T _RNvNtCs9SsW9UZxs52_3foo3bar3bar
0000000000000000 V __covrec_BC9F6BE8A43E337Cu
0000000000000000 V __llvm_profile_counter_bias
0000000000000000 R __llvm_profile_filename
0000000000000000 n __profc__RNvNtCs9SsW9UZxs52_3foo3bar3bar

Note the weak definitions (V) of __llvm_profile_counter_bias in both CGUs.

Now, if we increase the optimization level:

$ rm -f libfoo.rlib foo*.rcgu.o
$ rustc foo.rs -Ccodegen-units=2 -Copt-level=1 --crate-type=rlib -Cembed-bitcode=no -Cinstrument-coverage -Cllvm-args=-runtime-counter-relocation && ar x libfoo.rlib && nm foo*.rcgu.o

foo.foo.730f9a7e513a85b2-cgu.0.rcgu.o:
                 U _RNvNtCs9SsW9UZxs52_3foo3bar3bar
0000000000000000 V __covrec_5CF8C24CDB18BDACu
                 U __llvm_profile_counter_bias
0000000000000000 n __profc_foo
0000000000000000 T foo

foo.foo.730f9a7e513a85b2-cgu.1.rcgu.o:
0000000000000000 T _RNvNtCs9SsW9UZxs52_3foo3bar3bar
0000000000000000 V __covrec_BC9F6BE8A43E337Cu
0000000000000000 V __llvm_profile_counter_bias
0000000000000000 R __llvm_profile_filename
0000000000000000 n __profc__RNvNtCs9SsW9UZxs52_3foo3bar3bar

Note that __llvm_profile_counter_bias is now an undefined (U) symbol in the first CGU, and defined in the other.

It's not deterministic in which CGU the definition ends up in. Running the command repeatedly will show it either in 0.rcgu.o or 1.rcgu.o.

This seems to be due to an optimization figuring that since the weak definitions are all the same and will get linked together, just keeping one across the CGUs is enough. I haven't been able to find the code that does this. Does anyone know where this transformation is implemented?

That optimization seems reasonable, however the introduction of the undefined symbol can affect how a program gets linked. Consider:

$ echo 'extern void foo(); int main() { foo(); }' > main1.c
$ echo 'int main() {}' > main2.c

$ clang -c main1.c
$ clang -c main2.c -fprofile-instr-generate -fcoverage-mapping -mllvm -runtime-counter-relocation && ar r main2.a main2.o

$ clang -fuse-ld=lld main1.o main2.a libfoo.rlib
ld.lld: error: duplicate symbol: main
>>> defined at main1.c
>>>            main1.o:(main)
>>> defined at main2.c
>>>            main2.o:(.text+0x0) in archive main2.a
clang: error: linker command failed with exit code 1 (use -v to see invocation)

What's happening here is that because main1.o references foo, 0.rcgu.o gets linked in. And because that has an undefined reference to __llvm_profile_counter_bias, main2.o, which has the first definition of that symbol that the linker saw, gets linked in --- which it wouldn't have been otherwise because it's in a static library.

In other words, because rustc turned __llvm_profile_counter_bias into an undefined symbol in the first CGU, the linker linked different code than it otherwise would have. If rustc hadn't done that (e.g. with -Copt-level=0) main2.o would not have been linked in, and the link would have succeeded. If the undefined symbol had gone in the second CGU (the one which doesn't define foo), which it does sometimes, the link would also have succeeded.

In this case, the transformation caused the link to fail, but it seems possible to construct a case where it changes program behavior.

This makes me think that the transformation, while it seems valid locally when considering just the CGUs, is not valid in the wider context of linking a whole program or library.

I was using:

$ rustc --version --verbose
rustc 1.76.0 (07dca489a 2024-02-04)
binary: rustc
commit-hash: 07dca489ac2d933c78d3c5158e3f43beefeb02ce
commit-date: 2024-02-04
host: x86_64-unknown-linux-gnu
release: 1.76.0
LLVM version: 17.0.6

but I believe this also happens with the latest versions (of both rustc and llvm). We first hit this in Chromium: https://crbug.com/324126269

zmodem commented 4 months ago

+@nnethercote and @nikic in case you're familiar with this kind of thing.

nnethercote commented 4 months ago

I'm no expert on weak symbols, but I would start looking here, which is the starting point for the CGU formation. And the first 600 lines of code in that file is the heart of it, which may sound like a lot, but it's pretty well structured and commented. There is also some coverage-specific code in there, just grep for "coverage", which may be relevant.

zmodem commented 4 months ago

I believe the transformation of these weak symbols is a ThinLTO thing, "Weak Symbol Resolution". In rustc it appears to happen here:

https://github.com/rust-lang/rust/blob/bc1b9e0e9a813d27a09708b293dc2d41c472f0d0/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp#L1305-L1309

It looks like this function plays an important part in deciding whether a CGU gets ThinLTO'd or not: https://github.com/rust-lang/rust/blob/bc1b9e0e9a813d27a09708b293dc2d41c472f0d0/compiler/rustc_codegen_ssa/src/back/write.rs#L796

I'll keep digging, but now I'm wondering:

bjorn3 commented 4 months ago

Does rustc need to do this "automatic thinlto"?

Rustc by default does thin local lto when optimizations are enabled. This is basically thin lto except that only the codegen units within a single crate participate. This is done to allow parallelizing optimizations within a single crate without the runtime performance hit of using multiple codegen units. You can disable it using -Clto=off or -Ccodegen-units=1. The former comes at a runtime perf hit, while the latter comes at a build perf hit as optimizations are no longer done in parallel.

zmodem commented 4 months ago

Does rustc need to do this "automatic thinlto"? (In Chromium's case this was unexpected, maybe we could turn it off).

Aha, that seems to get decided here:

https://github.com/rust-lang/rust/blob/bc1b9e0e9a813d27a09708b293dc2d41c472f0d0/compiler/rustc_session/src/session.rs#L663

Sounds like we could pass -Clto=no if we want.

(Sorry @bjorn3, I hadn't seen your comment when I posted. Thanks for explaining.)

zmodem commented 4 months ago
  • Is the non-determinism, i.e. which CGU ends up with the symbol definition, on the rustc or LLVM side?

I think the non-determinism is on the rustc side. With lld's thinlto, it seems the definition of the weak variable always ends up in the first .o file:

$ echo 'int __attribute__((weak)) x = 42;' > a.c
$ echo 'int __attribute__((weak)) x = 42;' > b.c
$ echo 'extern int x; int main() { return x; }' > main.c

$ rm -f /tmp/*.bc && clang main.c a.c b.c -fuse-ld=lld -flto=thin -Wl,-save-temps

$ llvm-dis -o - /tmp/a-*.o.1.promote.bc && llvm-dis -o - /tmp/b-*.o.1.promote.bc
; ModuleID = '/tmp/a-16bfe3.o.1.promote.bc'
source_filename = "a.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@x = weak dso_local global i32 42, align 4

...
; ModuleID = '/tmp/b-977289.o.1.promote.bc'
source_filename = "b.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@x = external global i32, align 4

...
  • Is this a problem with ThinLTO in general? Can we come up with an example where ThinLTO's weak symbol resolution breaks linking a C program?

I'm starting to think the answer is no, at least not a realistic one.

Here's an attempt:

$ echo 'extern int foo(); int main() { return foo(); }' > main1.c
$ echo 'int __attribute__((weak)) x = 42; int main() { return 42; }' > main2.c

$ echo 'int __attribute__((weak)) x = 42;' > a.c
$ echo 'int __attribute__((weak)) x = 42; int foo() { return x; }' > b.c

$ clang -c main1.c
$ clang -c main2.c && ar r main2.a main2.o
$ clang main1.o main2.a a.c b.c -fuse-ld=lld -flto=thin -Wl,-save-temps
(links successfully)

The definition of x will go in a.c's object file, and it will be undefined in b.c's object file:

$ nm /tmp/a.out.lto.*.o

a.out.lto.a-a27e14.o:
0000000000000000 V x

a.out.lto.b-faa6cd.o:
0000000000000000 T foo
                 U x

Since the object file with the definition comes first, main2.a does not get linked in.

The reproducer in the first post relies on the linker hitting the undefined symbol first. When lld does thinlto, that never happens since it puts the definition in the first object file. With rustc, however, the objects get put in a static library, which means the linker may not see the object with the symbol definition first.

Note that even if rustc always put the definition in the first object, that wouldn't help since the object file gets put in an rlib and static library linking semantics means that object file is not necessarily linked first. It would only help if we also used --whole-archive, but that doesn't seem practical.