rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.54k stars 12.74k forks source link

non-pub function no longer compiled in debug profile, causing link errors on thumbv7em-none-eabihf with defmt #131164

Closed jonathanpallant closed 4 weeks ago

jonathanpallant commented 1 month ago

Code

I tried this code:

$ git clone https://github.com/ferrous-systems/rust-exercises.git
$ cd rust-exercises
$ git checkout v1.17.0
$ cd nrf52_code/radio_app
$ rustup target add thumbv7em-none-eabihf --toolchain=nightly
$ cargo install flip-link (or change `.cargo/config.toml` to not use flip-link - either is fine)
$ cargo +nightly build

I expected to see this happen:

The binaries build like, they do on stable.

Instead, this happened:

  = note: rust-lld: error: /Users/jonathan/Documents/clients/training/open-rust-embedded-2024-10-02/rust-exercises-v1.17.0/nrf52-code/radio-app/target/thumbv7em-none-eabihf/debug/build/defmt-36a2ab5d209daca3/out/defmt.x:7: symbol not found: __defmt_default_panic
          rust-lld: error: /Users/jonathan/Documents/clients/training/open-rust-embedded-2024-10-02/rust-exercises-v1.17.0/nrf52-code/radio-app/target/thumbv7em-none-eabihf/debug/build/defmt-36a2ab5d209daca3/out/defmt.x:7: symbol not found: __defmt_default_panic
          rust-lld: error: /Users/jonathan/Documents/clients/training/open-rust-embedded-2024-10-02/rust-exercises-v1.17.0/nrf52-code/radio-app/target/thumbv7em-none-eabihf/debug/build/defmt-36a2ab5d209daca3/out/defmt.x:7: symbol not found: __defmt_default_panic
          rust-lld: error: /Users/jonathan/Documents/clients/training/open-rust-embedded-2024-10-02/rust-exercises-v1.17.0/nrf52-code/radio-app/target/thumbv7em-none-eabihf/debug/build/defmt-36a2ab5d209daca3/out/defmt.x:7: symbol not found: __defmt_default_panic

That symbol is defined here: https://github.com/knurling-rs/defmt/blob/b7a89f56059fc66ac3c7cc3445d092829599e699/defmt/src/lib.rs#L365

Bisect

@Dirbaio said:

searched nightlies: from nightly-2024-07-30 to nightly-2024-10-02 regressed nightly: nightly-2024-08-01 searched commit range: https://github.com/rust-lang/rust/compare/f8060d282d42770fadd73905e3eefb85660d3278...28a58f2fa7f0c46b8fab8237c02471a915924fe5 regressed commit: https://github.com/rust-lang/rust/commit/0b5eb7ba7bd796fb39c8bb6acd9ef6c140f28b65

That's the LLVM 19 upgrade.

Dirbaio commented 1 month ago

it regressed both with and without --release.

jonathanpallant commented 1 month ago

I also ran a bisect:

searched nightlies: from nightly-2024-07-01 to nightly-2024-10-02 regressed nightly: nightly-2024-08-01 searched commit range: https://github.com/rust-lang/rust/compare/f8060d282d42770fadd73905e3eefb85660d3278...28a58f2fa7f0c46b8fab8237c02471a915924fe5 regressed commit: https://github.com/rust-lang/rust/commit/0b5eb7ba7bd796fb39c8bb6acd9ef6c140f28b65

apiraino commented 1 month ago

WG-prioritization assigning priority (Zulip discussion).

I can also reproduce on current beta so tagging accordingly

@rustbot label -I-prioritize +P-medium +regression-from-stable-to-beta

Dirbaio commented 1 month ago

minimized, zero deps -> https://github.com/Dirbaio/defmt-repro

[dirbaio@mars defmt-repro]$ rustc --version
rustc 1.83.0-nightly (06bb8364a 2024-10-01)
[dirbaio@mars defmt-repro]$ cargo build
   Compiling radio_app v0.0.0 (/home/dirbaio/defmt-repro)
error: linking with `rust-lld` failed: exit status: 1
  |
  = note: LC_ALL="C" PATH="/home/dirb(snip)47a" "--gc-sections" "-Tlol.x"
  = note: rust-lld: error: lol.x:1: symbol not found: __defmt_default_panic
          rust-lld: error: lol.x:1: symbol not found: __defmt_default_panic
          rust-lld: error: lol.x:1: symbol not found: __defmt_default_panic

error: could not compile `radio_app` (bin "radio_app") due to 1 previous error
[dirbaio@mars defmt-repro]$ rustc +stable --version
rustc 1.81.0 (eeb90cda1 2024-09-04)
[dirbaio@mars defmt-repro]$ cargo +stable build
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.06s
Dirbaio commented 1 month ago

maybe related: https://github.com/llvm/llvm-project/commit/ebb326a51fec37b5a47e5702e8ea157cd4f835cd

jannic commented 1 month ago

It seems to all depend on the linker version, not the compiler itself. Using ldd from beta/nightly with a stable compiler triggers this issue. Using ldd from stable on a beta/nightly compiler doesn't.

So I compiled lld from llvm myself, and can confirm: With lld built from commit ebb326a51fec37b5a47e5702e8ea157cd4f835cd the build of Dirbaio's minimized example fails, whereas with b6dfaf4c291ee186481f6c1dcab03874d931c307 it succeeds.

Specifically, it seems to be this change: https://github.com/llvm/llvm-project/commit/ebb326a51fec37b5a47e5702e8ea157cd4f835cd#diff-efb0f6c06c654ed1044098de2bd75495d2fecd1730054dd26d41f0c9c93996e8R1582

If I revert that change (by simply changeing the if condition to if (false && activeProvideSym)) the example also succeeds to build again.

jannic commented 1 month ago

This seems to be enough to "fix" this issue:

--- a/lld/ELF/ScriptParser.cpp
+++ b/lld/ELF/ScriptParser.cpp
@@ -1662,7 +1662,7 @@ Expr ScriptParser::readPrimary() {
     tok = unquote(tok);
   else if (!isValidSymbolName(tok))
     setError("malformed number: " + tok);
-  if (activeProvideSym)
+  if (false && activeProvideSym)
     ctx.script->provideMap[*activeProvideSym].push_back(tok);
   else
     ctx.script->referencedSymbols.push_back(tok);

I put "fix" in quotes because is not meant as a solution: I didn't even try to understand what the if/else branches are meant to do. So it's quite likely that I broke something else with this change.

(Verified on current LLVM main branch.)

jonathanpallant commented 1 month ago

the next stable release is in 9 days, at which point I believe the Ferrous Systems Embedded Rust exercises will stop compiling.

DianQK commented 1 month ago

Upstream issue: https://github.com/llvm/llvm-project/issues/111478.

Urhengulas commented 1 month ago

Is anyone aware of a workaround we could apply for defmt? We would like to avoid defmt being broken on stable with the Rust release next week.

I tried marking the defmt::default_panic function as pub, but this does not make a difference.

Is there something we can add to the linker script? Something along the lines of PLEASE_DONT_GARBAGE_COLLECT(__defmt_default_panic).

Or can we pretend that this function is in fact used?

Any pointer is much appreciated.

skade commented 1 month ago

the next stable release is in 9 days, at which point I believe the Ferrous Systems Embedded Rust exercises will stop compiling.

Note that while the exercises are relatively small impact (as they are under our control), defmt is a widely popular solution in the embedded space.

DianQK commented 1 month ago

I have read the related code, maybe we need to add this symbol to the symbol table. Or change the code like this:

diff --git a/lld/ELF/ScriptParser.cpp b/lld/ELF/ScriptParser.cpp
index 3febcfb87da4..44185c282ad7 100644
--- a/lld/ELF/ScriptParser.cpp
+++ b/lld/ELF/ScriptParser.cpp
@@ -1660,10 +1660,11 @@ Expr ScriptParser::readPrimary() {
     tok = unquote(tok);
   else if (!isValidSymbolName(tok))
     setError("malformed number: " + tok);
-  if (activeProvideSym)
+  if (activeProvideSym) {
     ctx.script->provideMap[*activeProvideSym].push_back(tok);
-  else
-    ctx.script->referencedSymbols.push_back(tok);
+    return [] { return 0; };
+  }
+  ctx.script->referencedSymbols.push_back(tok);
   return [=, s = ctx.script] { return s->getSymbolValue(tok, location); };
 }

However, I'm not familiar with this part of the code.

the next stable release is in 9 days, at which point I believe the Ferrous Systems Embedded Rust exercises will stop compiling.

Note that while the exercises are relatively small impact (as they are under our control), defmt is a widely popular solution in the embedded space.

In any case, I believe the upstream backport is likely to miss our backport process. I'm reverting this commit to our fork LLVM.

Urhengulas commented 1 month ago

@DianQK said:

In any case, I believe the upstream backport is likely to miss our backport process. I'm reverting this commit to our fork LLVM.

Would this mean this issue will not land on stable? That would be great, then we do not need to find a workaround for defmt.

apiraino commented 1 month ago

Note that while the exercises are relatively small impact (as they are under our control), defmt is a widely popular solution in the embedded space.

@rustbot label -P-medium +P-high

DianQK commented 1 month ago

@DianQK said:

In any case, I believe the upstream backport is likely to miss our backport process. I'm reverting this commit to our fork LLVM.

Would this mean this issue will not land on stable? That would be great, then we do not need to find a workaround for defmt.

Yes! I think so.

Dirbaio commented 1 month ago

Impact of the regression is this breaks all projects that

A possible workaround defmt could do is adding EXTERN(_defmt_panic) to the linker script. The side effect is it'd prevent the panic handler from being optimized out when it's truly unused, I think.

Urhengulas commented 1 month ago

A possible workaround defmt could do is adding EXTERN(_defmt_panic) to the linker script. The side effect is it'd prevent the panic handler from being optimized out when it's truly unused, I think.

In case the fix (https://github.com/rust-lang/llvm-project/pull/178) does not make it into the next stable I'd think this is an okay compromise to make until it does get fixed. But let's hope it won't be necessary.

DianQK commented 1 month ago

@DianQK said:

In any case, I believe the upstream backport is likely to miss our backport process. I'm reverting this commit to our fork LLVM.

Would this mean this issue will not land on stable? That would be great, then we do not need to find a workaround for defmt.

I have submitted PR #131448. I apologize that I may have misspoken. I can't guarantee that this issue won't land on stable, and even reverting this change carries some risk. :)

DianQK commented 1 month ago

Impact of the regression is this breaks all projects that

* Use defmt

* and do define a `#[defmt::panic_handler]`

* and don't ever use `defmt::panic!()` or the other macros that panic (assert, todo, unreachable, etc.)

A possible workaround defmt could do is adding EXTERN(_defmt_panic) to the linker script. The side effect is it'd prevent the panic handler from being optimized out when it's truly unused, I think.

Would it be possible to provide any actual project? :3 I think this will be helpful in determining whether to backport.

jannic commented 1 month ago

Would it be possible to provide any actual project? :3 I think this will be helpful in determining whether to backport.

One example is mentioned in the initial report, https://github.com/ferrous-systems/rust-exercises.git

If you think that doesn't count, as it's training material and may not be representative for real-world code, another example would be https://github.com/team-arrow-racing/steering-wheel-controller (I'm not affiliated with that project in any way. Found it with a code search on github.)

$ cargo +beta build
[...]
  = note: rust-lld: error: /tmp/steering-wheel-controller/target/thumbv7em-none-eabihf/debug/build/defmt-76d6e60563c62b4a/out/defmt.x:7: symbol not found: __defmt_default_panic
          rust-lld: error: /tmp/steering-wheel-controller/target/thumbv7em-none-eabihf/debug/build/defmt-76d6e60563c62b4a/out/defmt.x:7: symbol not found: __defmt_default_panic
          rust-lld: error: /tmp/steering-wheel-controller/target/thumbv7em-none-eabihf/debug/build/defmt-76d6e60563c62b4a/out/defmt.x:7: symbol not found: __defmt_default_panic
          rust-lld: error: /tmp/steering-wheel-controller/target/thumbv7em-none-eabihf/debug/build/defmt-76d6e60563c62b4a/out/defmt.x:7: symbol not found: __defmt_default_panic

warning: `steering-wheel-controller` (bin "steering-wheel-controller") generated 7 warnings
error: could not compile `steering-wheel-controller` (bin "steering-wheel-controller") due to 1 previous error; 7 warnings emitted
skade commented 1 month ago

Most concrete binary projects of defmt are not public, but it has 415 reverse dependencies. https://crates.io/crates/defmt/reverse_dependencies. Multiple of those have about 1mio downloads. It has consistently over 10k daily downloads, showing active usage e.g. in CI/CD pipelines.

It is very widely used in embedded software.

DianQK commented 1 month ago

The minimal reproduction is as follows:

main.rs:

#![no_main]
#![no_std]

#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[no_mangle]
fn foo() {}

#[no_mangle]
fn bar() {}

script.t:

PROVIDE(foo = bar);
$ rustc +beta main.rs --target thumbv7em-none-eabihf -Clink-arg=-Tscript.t
error: linking with `rust-lld` failed: exit status: 1
  ...
  = note: rust-lld: error: script.t:1: symbol not found: bar
          rust-lld: error: script.t:1: symbol not found: bar
          rust-lld: error: script.t:1: symbol not found: bar
error: aborting due to 1 previous error

With small modifications, it can also be reproduced on x86.

@rustbot label +S-has-mcve +A-linkage

apiraino commented 4 weeks ago

I think this can be now closed, since #131448 has been backported and should have made it to the stable 1.82 release of tomorrrow.

Am I right @jonathanpallant ? Can you check if we're all good here? thanks :)

Dirbaio commented 4 weeks ago

I can confirm nightly-2024-10-13 fixes it. nightly-2024-10-12 is the last one failing.

Beta works as well.

DianQK commented 4 weeks ago

Upstream LLVM will not backport the relevant fixes, so I will add a test case.

Urhengulas commented 4 weeks ago

Upstream LLVM will not backport the relevant fixes, so I will add a test case.

Do you think there is a chance that this "bug" will come back to us? We are currently evaluating if we should apply a fix/safeguard in defmt.

DianQK commented 4 weeks ago

Upstream LLVM will not backport the relevant fixes, so I will add a test case.

Do you think there is a chance that this "bug" will come back to us? We are currently evaluating if we should apply a fix/safeguard in defmt.

No. The main branch of LLVM has fixed. The next major update will include it.

Urhengulas commented 4 weeks ago

Upstream LLVM will not backport the relevant fixes, so I will add a test case.

Do you think there is a chance that this "bug" will come back to us? We are currently evaluating if we should apply a fix/safeguard in defmt.

No. The main branch of LLVM has fixed. The next major update will include it.

Perfect!

thejpster commented 4 weeks ago

Lgtm

jannic commented 4 weeks ago

Also tried with the stable prerelease from https://internals.rust-lang.org/t/rust-1-82-0-pre-release-testing/21705. Problem is solved.