aclysma closed this issue 10 months ago
I think this is actually expected behavior, at least with the current compilation model. Almost all functions in nphysics are generic, so the compilation to the actual machine code happens not in nphysics, but in the crate that fully specifies generic parameters (instantiates templates, in C++ parlance).
Because both the main crate and the shim crate fully specify nphysics types, you get two copies. Because the optimization flags are different, the linker doesn't eliminate either one.
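To make the explanation above concrete, here is a minimal single-file sketch. The `physics`, `main_crate` and `shim_crate` modules are hypothetical stand-ins for nphysics, the root crate and the shim crate, and `integrate` is an invented function, not part of the nphysics API. In a real multi-crate build, each crate that names the concrete type would, as I understand it, generate its own copy of `integrate::<f64>` using its own opt-level.

```rust
// Hypothetical stand-in for a generic library such as nphysics.
mod physics {
    use std::ops::{Add, Mul};

    // Minimal numeric bound so the function below can be generic.
    pub trait Scalar: Copy + Add<Output = Self> + Mul<Output = Self> {}
    impl Scalar for f32 {}
    impl Scalar for f64 {}

    // Generic function: machine code is normally generated only once a
    // caller picks a concrete T, and it is generated in that caller's crate.
    pub fn integrate<T: Scalar>(position: T, velocity: T, dt: T) -> T {
        position + velocity * dt
    }
}

// Hypothetical stand-in for the unoptimized root crate.
mod main_crate {
    // Instantiates integrate::<f64>; in a separate crate this would be
    // compiled with the root crate's opt-level.
    pub fn step(position: f64) -> f64 {
        super::physics::integrate::<f64>(position, 2.0, 0.5)
    }
}

// Hypothetical stand-in for the optimized shim crate.
mod shim_crate {
    // Same instantiation; in a separate crate this would be a second copy
    // compiled with the shim crate's opt-level.
    pub fn step(position: f64) -> f64 {
        super::physics::integrate::<f64>(position, 2.0, 0.5)
    }
}
```

Both `main_crate::step(1.0)` and `shim_crate::step(1.0)` return the same value, which is the point: the two copies are semantically identical and differ only in how well they are optimized. Within a single file like this the compiler only sees one crate, so the duplication itself only shows up once the three modules are split into separate crates with different profile settings.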
cc @ehuss for profile overrides. It might be a good idea to mention that profile overrides could inflate the binary size due to duplicated (non-deduplicated) monomorphizations. I hadn't considered this side effect of overrides before.
Ah, I had been trying to think of how this might happen, that sounds like a plausible explanation! I've been having a similar problem trying to override the profile for std, but I am getting linker errors due to symbol name mangling. I wonder if it is the same issue!
I just learned about the -Zshare-generics=yes flag (#48779), which will export/link the monomorphization from the optimized crate instead of instantiating it in the local one. It still acts a little wonky (it isn't as fast as being fully optimized), but it is more consistent.
This has the counter-intuitive result that if you change the override from opt-level = 3 to opt-level = 1, it's actually faster, presumably because it enables this mode.
Triage: Can someone help me move this issue to cargo please? I don't think there is anything actionable by T-compiler here? Bugs with -Zshare-generics=yes would have to be reported separately, I think.
@Enselic, I don't think there is anything to do here, due to the way generics are instantiated. Cargo certainly can't do anything about it. This is documented at https://doc.rust-lang.org/cargo/reference/profiles.html#overrides-and-generics, so I'm going to close since I don't think there is anything else to do here.
I opened this issue because I'm concerned there could be unsoundness if a function is generated differently and each generated impl references a structure where members may have been compiled out for one of them (say a string member is stripped in release only.) Maybe this is a non-issue because the structs would be seen as different types? But it would be quite subtle if you were serializing that struct with bincode or something like that.
(Oh I guess it was originally the surprise in performance differences, this was 4 years ago. However I do think whether this can lead to unsound behavior should be carefully considered.)
I'm trying to use the "profile-overrides" feature in cargo. The tracking issue for this feature is here: https://github.com/rust-lang/rust/issues/48683
The documentation for this is here: https://doc.rust-lang.org/cargo/reference/unstable.html#profile-overrides
I'm trying to build my crate without optimizations, and upstream crates with optimizations. (The application runs too slowly to properly test if the upstream crates are not optimized.)
My .toml looks like this:
I have a minimum reproducible example here: https://github.com/aclysma/mre-optimize-dependencies-only
The "slow" crate is nphysics. The minimum reproducible example contains a "main" root crate and a shim crate. Both have nearly the same code, but behave differently:
Expected Behavior: The linked binary would have a single implementation for any functions in nphysics, and all call sites would jump to that one address.
Observed Behavior: My linked executable appears to have both an unoptimized and an optimized version of nphysics, and depending on whether the caller is optimized or not, the jump goes to a different address.