rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.9k stars 12.67k forks source link

Tracking Issue for `optimize_for_size` standard library feature #125612

Open Kobzol opened 4 months ago

Kobzol commented 4 months ago

https://github.com/rust-lang/rust/pull/125011 has added a optimize_for_size feature to the standard library, which is designed for special casing some algorithms in core/alloc/std to provide an implementation that tries to minimize its effect on binary size.

To use this feature, the standard library has to be compiled using -Zbuild-std:

$ cargo +nightly build -Z build-std -Z build-std-features="optimize_for_size" ...

Optimizations that leverage this flag:

clarfonthey commented 4 months ago

Small aside, but do you think that code using optimize_for_size should be always optimized for size, or only in release mode? Since a lot of the submitted PRs use cfg!(feature = "optimize_for_size") to test, and I'm not sure that dead code removal will always work in these cases without optimizations enabled.

Obviously, to get the best benefits, you're going to want optimizations anyway, but I'm thinking that we probably do want to ensure that code actually isn't compiled in all cases using conditional compilation, rather than just conditions. This also might help building on memory-constrained environments since the code won't have to be kept in memory before it's removed.

I also think that longer-term, there should be an easier flag to enable this feature that also takes into account optimization options (making sure loops aren't unrolled, etc.) but at least for now, while it's just a Cargo feature, it makes sense to at least figure out what cases we should optimize for.

Kobzol commented 4 months ago

I would be extremely surprised if any cfg-ed out code survived even in debug mode without optimizations. Even something like this:

fn main() {
    #[cfg(feature = "foo")]
    println!("hellorust");
}

Doesn't contain the hellorust string in the final binary in a debug build when the feature is disabled.

folkertdev commented 4 months ago

in practice we've been using the cfg!(feature = "optimize_for_size") macro a bunch, which would rely on the optimizer evaluating and simplifying boolean conditions and then removing branches or blocks. I suspect in practice this is reliable enough. Certainly on small examples in godbolt this seems to work fine.

More generally, the point of this flag is really --release mode. The size of a debug binary is just not representative at all. Especially on embedded, --release is effectively standard: you don't want to wait for your debug binary to be pushed through a wire to your device. Because optimization passes in general don't really give guarantees I think it's fine that the optimizer may miss some opportunities in a debug build.

I say that in part because using proper conditional compilation would make the code harder to maintain, to the point where we may require completely separate functions for when the flag is enabled or not. We've been trying to prevent completely separate implementations from happening so far: that won't always work but when we can I think the result is better.

clarfonthey commented 4 months ago

I would be extremely surprised if any cfg-ed out code survived even in debug mode without optimizations. Even something like this:

fn main() {
    #[cfg(feature = "foo")]
    println!("hellorust");
}

Doesn't contain the hellorust string in the final binary in a debug build when the feature is disabled.

Right, note that this is different from:

fn main() {
    if cfg!(feature = "foo") {
        println!("hellorust");
    }
}

since #[cfg(...)] will prevent the code from even compiling, whereas this will compile the code but leave it inside a dead branch under if false.

I say that in part because using proper conditional compilation would make the code harder to maintain, to the point where we may require completely separate functions for when the flag is enabled or not. We've been trying to prevent completely separate implementations from happening so far: that won't always work but when we can I think the result is better.

I agree here, mostly just wanted to ask because options do exist (like the currently unstable cfg_match macro). And while you might not explicitly push debug code to a device, you might run tests on it off the device in debug mode, and it's important to ensure that these tests are accurate. I can imagine also wanting to compare generated assembly between debug and release mode to see what optimisations are done, without having the non-size-optimised code getting in the way.

Basically agree with the current implementation, but think it's important to properly clarify so folks are aware of the best way to implement things.

Kobzol commented 4 months ago

Ah, sorry, I didn't realize that with cfg! it is different. Well, I would be also extremely surprised if if false survived even in debug mode :laughing: But happy to see a counterexample.

dragonnn commented 4 months ago

Hi! I tested this option on my embedded project using embassy on a STM32F401 and my code size did go up from 335kB to 360kB. I can do some debugging/investigation why it happens if giving some instructions what to test for. Thanks!

Kobzol commented 4 months ago

Did it go up vs just using -Zbuild-std without the optimize_for_size feature, or versus a normal cargo build with a pre-built libstd?

dragonnn commented 4 months ago

Ah, I think I found the issue, adding -Z build-std-features="std/optimize_for_size" to cargo build overrides build-std-features from cargo config.toml and that disabled panic_immediate_abort when building my project and bumped the code size. When I add -Zbuild-std-features="std/optimize_for_size panic_immediate_abort" my code size goes down: with "std/optimize_for_size"

   text    data     bss     dec     hex filename
 339432    2800   35452  377684   5c354 init-stm32

without:

   text    data     bss     dec     hex filename
 340176    2800   35452  378428   5c63c init-stm32

Not a lot but it is something. Thanks! Btw. small nitpick, why the reason for the std/ prefix? Since options like panic_immediate_abort don't have it, sounds kind redundant when it is already in build-std-features

Kobzol commented 4 months ago

Hmm, that's weird, on CI we enable it with just optimize_for_size, and in Cargo.toml of stdlib it seems to be defined in the same way as panic_immediate_abort (https://github.com/rust-lang/rust/pull/125011/files).

I think that optimize_for_size is enough. This works for me:

$ cargo +nightly build -Zbuild-std -Zbuild-std-features="optimize_for_size,panic_immediate_abort" --target x86_64-unknown-linux-gnu
diondokter commented 4 months ago

Not a lot but it is something. Thanks!

With what's currently merged, the max you can get is ~1.5kb savings on opt-level z. There's certainly more to be done for sure :)

dragonnn commented 4 months ago

Hmm, that's weird, on CI we enable it with just optimize_for_size, and in Cargo.toml of stdlib it seems to be defined in the same way as panic_immediate_abort (https://github.com/rust-lang/rust/pull/125011/files).

I think that optimize_for_size is enough. This works for me:

$ cargo +nightly build -Zbuild-std -Zbuild-std-features="optimize_for_size,panic_immediate_abort" --target x86_64-unknown-linux-gnu

Oh, right it works both way, doesn't metter if it starts with std/ or not. I assumed from the first posts it needs to start with std/. Thanks!

Kobzol commented 4 months ago

Sorry, that was an error on my part, fixed the code example. Thanks!

clarfonthey commented 4 months ago

This definitely feels like something that could use a proper guide mentioning these sorts of caveats-- the unstable book would probably be a good place to put it for now. That way, we could update it as more cases like these pop up, and it would also help with the bit I mentioned about being able to have a general flag for "use size-optimised libstd" that makes sure the correct optimisation settings, etc. are applied to make it work. (Having a full list of the things people have to do to make it work, means we can take that into account when making a flag that "just works" for this case.)

Perhaps in the distant future, we could even see official builds of "size-optimised" libstd for certain architectures on rustup!

And we can also link that in the original comment so that folks can see the up-to-date version.

johnthagen commented 4 months ago

This has been added to min-sized-rust to increase visibility. Thanks @Kobzol