rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Using profile-overrides results in both optimized and unoptimized versions of the same crate in linked executable #63484

Closed aclysma closed 10 months ago

aclysma commented 5 years ago

I'm trying to use the "profile-overrides" feature in cargo. The tracking issue for this feature is here: https://github.com/rust-lang/rust/issues/48683

The documentation for this is here: https://doc.rust-lang.org/cargo/reference/unstable.html#profile-overrides

I'm trying to build my crate without optimizations, and upstream crates with optimizations. (The application runs too slowly to properly test if the upstream crates are not optimized.)

My .toml looks like this:

[profile.dev]
opt-level = 0

[profile.dev.overrides."*"]
opt-level = 3

I have a minimum reproducible example here: https://github.com/aclysma/mre-optimize-dependencies-only

The "slow" crate is nphysics. The minimum reproducible example contains a "main" root crate and a shim crate. Both have nearly the same code, but behave differently:

Expected behavior: the linked binary has a single implementation of each nphysics function, and all call sites jump to that one address.

Observed behavior: my linked executable appears to contain both an unoptimized and an optimized version of nphysics, and depending on whether the caller is optimized, the jump goes to a different address.

matklad commented 5 years ago

I think this is actually expected behavior, at least with the current compilation model. Almost all functions in nphysics are generic, so the compilation to the actual machine code happens not in nphysics, but in the crate that fully specifies generic parameters (instantiates templates, in C++ parlance).

Because both the main crate and the shim crate fully specify nphysics types, you get two copies. Because the optimization flags are different, the linker doesn't eliminate either copy.
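A minimal sketch of the mechanism described above, assuming a single file in which two modules stand in for the two crates. The names `sum_of_squares`, `shim`, and `app` are hypothetical and do not come from nphysics or the reproduction repository. The generic function has no machine code of its own until a caller fixes `T`; each caller that does so triggers an instantiation.

```rust
use std::ops::{Add, Mul};

// Stand-in for a generic function in an upstream crate (hypothetical).
// No machine code is emitted for it until `T` is known at a call site.
pub fn sum_of_squares<T>(values: &[T]) -> T
where
    T: Copy + Default + Add<Output = T> + Mul<Output = T>,
{
    values.iter().fold(T::default(), |acc, &v| acc + v * v)
}

// Stand-in for the shim crate, which would be built with opt-level = 3.
mod shim {
    pub fn run() -> f64 {
        // `T = f64` is fixed here, so this caller requests an instantiation.
        super::sum_of_squares(&[1.0_f64, 2.0, 3.0])
    }
}

// Stand-in for the root crate, which would be built with opt-level = 0.
mod app {
    pub fn run() -> f64 {
        // Same generic arguments, requested from a different caller.
        super::sum_of_squares(&[1.0_f64, 2.0, 3.0])
    }
}

fn main() {
    println!("{} {}", shim::run(), app::run());
}
```

Inside one crate, as here, both call sites share a single instantiation. The duplication discussed in this issue only appears when `shim` and `app` are separate crates compiled with different optimization flags, so this sketch shows where instantiation is requested rather than reproducing the two copies.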

cc @ehuss for profile overrides. It might be a good idea to mention that profile overrides can inflate the binary size, because duplicate monomorphizations are not merged. I hadn't considered this side effect of overrides before.

ehuss commented 5 years ago

Ah, I had been trying to think of how this might happen, and that sounds like a plausible explanation! I've been having a similar problem trying to override the profile for std, but I am getting linker errors due to symbol name mangling. I wonder if it is the same issue!

ehuss commented 5 years ago

I just learned about the -Zshare-generics=yes flag (#48779), which will export/link the monomorphization from the optimized crate instead of instantiating it in the local one. It still acts a little wonky (it isn't as fast as being fully optimized), but it is more consistent.

This has the counter-intuitive result that if you change the override from opt-level = 3 to opt-level = 1, it's actually faster, presumably because it enables this mode.

Enselic commented 10 months ago

Triage: Can someone help me move this issue to cargo please? I don't think there is anything actionable by T-compiler here? Bugs with -Zshare-generics=yes would have to be reported separately, I think.

ehuss commented 10 months ago

@Enselic, I don't think there is anything to do here, due to the way generics are instantiated. Cargo certainly can't do anything about it. This is documented at https://doc.rust-lang.org/cargo/reference/profiles.html#overrides-and-generics, so I'm going to close since I don't think there is anything else to do here.

aclysma commented 10 months ago

I opened this issue because I'm concerned there could be unsoundness if a function is generated differently and each generated impl references a structure where members may have been compiled out for one of them (say, a string member that is stripped in release only). Maybe this is a non-issue because the structs would be seen as different types? But it would be quite subtle if you were serializing that struct with bincode or something like that.

aclysma commented 10 months ago

(Oh I guess it was originally the surprise in performance differences, this was 4 years ago. However I do think whether this can lead to unsound behavior should be carefully considered.)