rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.73k stars 12.5k forks source link

codegen-units + ThinLTO is not as good as codegen-units = 1 #47745

Open nagisa opened 6 years ago

nagisa commented 6 years ago

We recently had a fair amount of reports about code generation quality drop. One of the recent causes for the quality drop is the enablement of codegen-units and ThinLTO.

It seems that ThinLTO is not capable of producing results matching those obtained by compiling without codegen-units in the first place.

The list of known reports follows:

Improvements to ThinLTO quality are inbound with the soon-to-happen LLVM upgrade(s), however those do not help sufficiently, it would be nice to figure out why ThinLTO is not doing good enough job.

cc @alexcrichton @nikomatsakis

nikomatsakis commented 6 years ago

Thanks for filing this @nagisa =)

alexcrichton commented 6 years ago

Indeed thanks! I'll try to take a closer look at this when we've upgraded LLVM

matthiaskrgr commented 6 years ago

By the way there is a great talk about how thinlto is designed here: https://www.youtube.com/watch?v=p9nH2vZ2mNo in case people are curious. :)

robsmith11 commented 6 years ago

Matrix multiplication is slower with thinlto + multiple codege-units using https://github.com/bluss/matrixmultiply .

I can create a minimal example if needed.

johnthagen commented 5 years ago

I've always thought that there should be another Cargo profile, something like:

# The publish profile, used for `cargo build --publish`.
[profile.publish]
# (...) everything else the same as profile.release except:
lto = true        # Enable full link-time optimization.
codegen-units = 1 # Use only 1 codegen-unit to enable full optimizations.

Because I feel like there should be a distinction between release builds the developer compiles on their local machine during development (not debug builds, but "fast" release builds) and truly publishable builds (like, for example the version of Firefox that is released for public consumption) in which case sacrificing build time once is more acceptable.

I realize the status-quo for C/C++ is to also not enable LTO by default, but it just seems strange to me to have to opt into these kinds of performance enhancements when the cost (for published binaries) is a one-time compile time cost.

HadrienG2 commented 5 years ago

I think "publish" is uncomfortably close to "release". But I could get behind a "debug/optimize/release" terminology proposal.

forrestthewoods commented 5 years ago

Historically I've used debug, internal, release, retail.

Plus a few variations with "add-ons" such as "Retail-Logging" or "Retail-Instrumented".

For Rust instead of 'Retail' I'd propose MaxSpeed. Whatever it's called, a profile with lto=true and codegen-units=1 is definitely a good idea!

brson commented 5 years ago

@johnthagen I agree that today's 'release' profile seems to have two use cases that want different configurations. Is it possible to create custom cargo profiles? Is there an upstream cargo issue for this?

johnthagen commented 5 years ago

@brson It looks like it's not yet implemented, but it is has been discussed for several years.

Perhaps @matklad has some more up-to-date information on this?

matklad commented 5 years ago

My understanding is that "custom profiles" are pretty far-away at this moment (we need to do profile overrides first), however we do have config profiles nightly features, which allows overriding profile via .cargo/config. This might be used, for example, to specify codegen-units=1 on the build-server which produces release artifacts.

brson commented 5 years ago

Thanks @johnthagen @matklad for the leads!

pnkfelix commented 2 years ago

Visiting for T-compiler backlog bonanza, since it was tagged as C-tracking-issue (perhaps erroneously)

pnkfelix commented 2 years ago

@rustbot label: -C-tracking-issue

pnkfelix commented 2 years ago

Also, given that the original point of the issue was to determine why ThinLTO didn't seem to do a good enough job, that seems like a question that is well-suited for wg-llvm.

@rustbot label: A-llvm