Unsupported flags for mold during lto

zhongyi51 commented 10 months ago

Greetings. According to this doc, I tried to perform cross-language LTO across C/C++ and Rust with mold as the linker. However, I found that the current rustc only supports lld for this.

The command I ran is:

RUSTFLAGS="-Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=mold" cargo build --release

However, an error was thrown:

mold: unknown command line option: -plugin-opt=O0

After some investigation, I found that this problem is caused by the code here:

        self.linker_args(&[
            &format!("-plugin-opt={opt_level}"),
            &format!("-plugin-opt=mcpu={}", self.target_cpu),
        ]);

These flags assume that lld is the linker. However, mold does not support these flags, see here.

This problem also happens on macOS, see here.

Could we support more linkers with linker-plugin-lto? Or could we let users override these flags?
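To make the override request concrete, here is a minimal sketch of how the two `-plugin-opt` flags could be placed behind a user-selectable policy. This is not rustc's actual code: `PluginOptPolicy` and `plugin_lto_args` are hypothetical names, and only the two flag formats are taken from the snippet quoted above.

```rust
/// Hypothetical policy for the `-plugin-opt` flags; not part of rustc.
#[derive(Debug, Clone, PartialEq)]
pub enum PluginOptPolicy {
    /// Emit the two flags from the quoted rustc snippet.
    Default,
    /// Emit nothing, for linkers that reject `-plugin-opt=...`.
    Omit,
    /// Emit exactly the arguments the user supplied.
    Custom(Vec<String>),
}

/// Build the extra linker arguments for linker-plugin LTO under the given policy.
pub fn plugin_lto_args(
    policy: &PluginOptPolicy,
    opt_level: &str,
    target_cpu: &str,
) -> Vec<String> {
    match policy {
        PluginOptPolicy::Default => vec![
            format!("-plugin-opt={opt_level}"),
            format!("-plugin-opt=mcpu={target_cpu}"),
        ],
        PluginOptPolicy::Omit => Vec::new(),
        PluginOptPolicy::Custom(args) => args.clone(),
    }
}
```

With `PluginOptPolicy::Omit`, the linker would never see the `-plugin-opt=O0` argument from the error message. Whether rustc should expose such a switch, and under what name, is exactly the open question of this issue.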

polarathene commented 8 months ago

I have been able to build without that error using the rust alpine Docker image and extra packages to configure clang + mold.

I used Rust 1.75, clang 17 and mold 2.4. The failure occurs with the RUSTFLAGS env (even with --target, which was a fix for -C target-feature=+crt-static), so instead I had to use .cargo/config.toml with my target.

However, doing so seems to duplicate the LTO work. This is most notable with lto="fat" in Cargo.toml, which increased the final build step to 3 mins (vs 50 sec), plus 4 mins extra for the mold process via linker-plugin-lto.

If I build without the .cargo/config.toml and use the RUSTFLAGS env instead, it builds in roughly the same time with lld. Without the env either (and without --target), it builds in 4.5 minutes.

With lto="thin" (default) in the cargo release profile, that same project builds in 50s when repeating the build command. Using the RUSTFLAGS added another 30s; however, the lld process was active for approx 50s, and the rest was spent before that in the rustc process.

It seems you can use incremental = true in Cargo.toml to cache the thinlto build, and the clang lld linker has a similar setting you can use when linker-plugin-lto is enabled. Together these reduce the time of each process to approx 6s, so 12s each time I run that build command.

However, incremental = true has no effect when lto = "fat", while the clang lld linker equivalent still seems to create and leverage its thinlto cache.

polarathene commented 8 months ago

You need to include -C link-arg=-flto in your RUSTFLAGS env; it avoids the failure you experienced with mold.

I have noticed that -C link-arg=-flto=full, as the equivalent of fat LTO, doesn't seem to do anything.

I'm not sure what the relevance of that is, but this extra arg will let you use mold, and with lld I have noticed no difference in timing with or without it.

The mold process invoked still uses what cargo was configured for, so thin LTO still uses multiple threads.

Additionally, when delegating to clang, the parallelism is reduced to the default of 1 thread per physical core; you'd need to set the jobs to all to use a thread per logical core and match full CPU usage. It doesn't improve the time that much though, so this still ends up slower due to whatever extra overhead is involved.

zhongyi51 commented 8 months ago

@polarathene Thanks for your reply! My mold version is mold 1.0.2, on macOS 12.5. I tried adding your parameter -C link-arg=-flto to RUSTFLAGS, but the problem still exists. According to the error message, I still think this issue is caused by the pre-set arguments in the rustc compiler...

polarathene commented 8 months ago

> My mold version is mold 1.0.2

That does not have LTO support. LTO with Mold came with the 1.1 release.
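A build wrapper could check this up front. The sketch below is a hypothetical helper that parses a version string and compares it against 1.1, the release named above. It assumes the version text looks like "mold 1.0.2", as quoted in this thread; the exact output format of the mold binary is not verified here.

```rust
/// Parse a version such as "mold 1.0.2" (or a bare "1.0.2") into (major, minor).
/// The "mold <version> ..." layout is an assumption about the tool's output.
pub fn parse_mold_version(text: &str) -> Option<(u32, u32)> {
    // Take the first whitespace-separated token that starts with a digit.
    let token = text
        .split_whitespace()
        .find(|t| t.chars().next().map_or(false, |c| c.is_ascii_digit()))?;
    let mut parts = token.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True if the version is at least 1.1, the release this thread says added LTO.
/// Unparseable input is treated as "not supported".
pub fn mold_supports_lto(text: &str) -> bool {
    matches!(parse_mold_version(text), Some(v) if v >= (1, 1))
}
```

For the version reported above, `mold_supports_lto("mold 1.0.2")` is false, which matches the diagnosis that 1.0.2 is too old for LTO.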

You can download Mold from github and build a newer version locally, then provide an explicit path to the built binary like this:

-C link-arg=-fuse-ld=/path/to/mold
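Putting the flags from this thread together, the following sketch assembles a RUSTFLAGS value in Rust, for example for a small build wrapper. The function name `rustflags_for_mold_lto` and its `mold_path` parameter are hypothetical; the flags themselves are the ones quoted in the comments above, written in the compact `-Clink-arg=...` form.

```rust
/// Assemble the RUSTFLAGS value discussed in this thread.
/// `mold_path` is a hypothetical parameter: `None` selects `-fuse-ld=mold`,
/// `Some(path)` points clang at a locally built mold binary.
pub fn rustflags_for_mold_lto(mold_path: Option<&str>) -> String {
    let fuse_ld = mold_path.unwrap_or("mold");
    [
        "-Clinker-plugin-lto".to_string(),
        "-Clinker=clang".to_string(),
        format!("-Clink-arg=-fuse-ld={fuse_ld}"),
        // The extra argument suggested above to avoid the mold failure.
        "-Clink-arg=-flto".to_string(),
    ]
    .join(" ")
}
```

The result could be handed to cargo through `std::process::Command::env("RUSTFLAGS", ...)`. Because RUSTFLAGS is split on spaces, this sketch assumes the mold path contains none.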

That should work, but you'll also need to ensure that your Clang version is compatible with the Rust toolchain you use, otherwise that'll be the next failure you encounter.

I've got a relevant comment on configuring this via .cargo/config.toml (or the equivalent in the RUSTFLAGS env), with some extra insights documented here: https://github.com/Swatinem/rust-cache/issues/43#issuecomment-1972164225

You can try this with the official Rust Docker image. rust:alpine will let you get clang and mold set up easily with apk add clang mold, but build times will be slower than with the Debian image rust:latest due to the system memory allocator. You can use the mimalloc package with the LD_PRELOAD env to work around that, but it's a tad awkward 😅

The Debian image won't work out of the box. While I think the Mold package it ships is 1.x with LTO support, the Clang and Rust toolchain versions are mismatched, requiring extra work for compatibility there AFAIK.

rust-lang / rust

Unsupported flags for mold during lto #119332