rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

compile-time: massive regression during 2015 #25069

Closed nwin closed 8 years ago

nwin commented 9 years ago

I noticed earlier that the compile time of `image` increased massively at the beginning of 2015. Unfortunately, this regression was hard to quantify, since we also changed a lot of the code during that time.

Fortunately, I just found a good example that shows a 20-fold increase in compile time. It is a single file with no other dependencies (if somebody wants to try: the original version is even much older). The two revisions contain exactly the same code; the newer one has just been updated to compile on a recent rustc. Note that this is rustc only; no time is spent in LLVM.

$ sudo rustc --version
rustc 0.13.0-nightly (7608dbad6 2014-12-31 10:06:21 -0800)
$ time sudo rustc --crate-type=lib -Z no-trans inflate.rs

real    0m1.176s
user    0m1.059s
sys 0m0.113s

(note that something is wrong with the old .pkg file I used to install rustc; I need to use sudo to run it)

$ rustc --version
rustc 1.1.0-nightly (c4b23aec4 2015-04-29) (built 2015-04-28)
$ time rustc --crate-type=lib -Z no-trans inflate.rs

real    0m25.208s
user    0m23.919s
sys 0m1.213s

The time profiles show that `driver::phase_1_parse_input` and `driver::phase_2_configure_and_expand` stayed almost constant, while the time spent in `driver::phase_3_run_analysis_passes` exploded.

bluss commented 9 years ago

If you can, benchmark after the merge of aecf3d8 / PR #24965 as well. It was merged Fri May 1 08:33:08 2015 +0000, so it should be in the nightlies. Oh, and please share the output of `-Z time-passes` too!

nwin commented 9 years ago

That improved things marginally (edit: I checked that this is after the merge).

$ rustc --version
rustc 1.1.0-nightly (c42c1e7a6 2015-05-02) (built 2015-05-01)
$ time rustc --crate-type=lib -Z no-trans inflate.rs

real    0m19.984s
user    0m18.948s
sys 0m1.033s
$ rustc --crate-type=lib -Z no-trans -Z time-passes inflate.rs
time: 0.007     parsing
time: 0.000     recursion limit
time: 0.001     configuration 1
time: 0.000     gated macro checking
time: 0.000     crate injection
time: 0.009     macro loading
time: 0.000     plugin loading
time: 0.000     plugin registration
time: 0.169     expansion
time: 0.001     complete gated feature checking 1
time: 0.018     configuration 2
time: 0.009     maybe building test harness
time: 0.008     prelude injection
time: 0.001     checking that all macro invocations are gone
time: 0.001     complete gated feature checking 2
time: 0.010     assigning node ids and indexing ast
time: 0.001     external crate/lib resolution
time: 0.002     language item collection
time: 0.037     resolution
time: 0.002     lifetime resolution
time: 0.000     looking for entry point
time: 0.000     looking for plugin registrar
time: 0.012     region resolution
time: 0.001     loop checking
time: 0.001     static item recursion checking
time: 0.002     type collecting
time: 0.001     variance inference
time: 0.129     coherence checking
time: 18.775    type checking
time: 0.048     const checking
time: 0.009     privacy checking
time: 0.000     stability index
time: 0.003     intrinsic checking
time: 0.003     effect checking
time: 0.006     match checking
time: 0.194     liveness checking
time: 0.438     borrow checking
time: 0.035     rvalue checking
time: 0.000     reachability checking
time: 0.007     death checking
time: 0.079     stability checking
time: 0.000     unused lib feature checking
time: 0.116     lint checking
arielb1 commented 9 years ago

Your code has a >10kLOC method. Couldn't you split it?

arielb1 commented 9 years ago

Performance does seem O(n^2) in the number of len! calls.
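For context, the pattern being discussed is many macro calls that expand to arithmetic on unsuffixed integer literals. A minimal stand-in (assumption: the thread never shows the real `len!` from inflate.rs; this toy macro only reproduces the literal-heavy shape that stresses literal inference):

```rust
// Hypothetical stand-in for the `len!` macro from the original code,
// chosen only to generate many unsuffixed integer literals.
macro_rules! len {
    ($n:expr) => { $n + 1 };
}

fn main() {
    // Each unsuffixed literal gets its own integer inference variable
    // (IntVid); at the time of this issue, every binary op between such
    // variables also produced a trait obligation that the checker
    // re-examined repeatedly, giving quadratic type-checking time.
    let total = len!(0) + len!(1) + len!(2) + len!(3);
    assert_eq!(total, 10);
}
```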

nwin commented 9 years ago

Maybe it is a duplicate of #23977?

bluss commented 9 years ago

Tested current nightly after the PR to inline hashing better. Only reduces run time of the type checking pass by 6%.

pnkfelix commented 9 years ago

I did a binary search over the nightlies (thanks to multirust); the problem was introduced between nightly-2015-03-30 and nightly-2015-03-31.

In other words, this version is fast:

rustc 1.0.0-nightly (d8be84eb4 2015-03-29) (built 2015-03-29)

and this version is slow:

rustc 1.0.0-nightly (6cf3b0b74 2015-03-30) (built 2015-03-30)

Update: scanning over git log d8be84eb4..6cf3b0b74, I wonder whether #23673 is a likely culprit.

bluss commented 9 years ago

@pnkfelix That's what niko is indicating with the other issue too https://github.com/rust-lang/rust/issues/23977#issuecomment-89035822

Edit: Awesome job bisecting though! That's a great trick to use.

Manishearth commented 9 years ago

I have noticed a similar effect in Servo across a particular Rust upgrade (forgot which).

Can we give this priority now that 1.0 is out?

arielb1 commented 9 years ago

@Manishearth

Are you sure that this is the issue?

Manishearth commented 9 years ago

No, I'm not, but regardless, we should be fixing this with priority.

arielb1 commented 9 years ago

The problem is that literal inference creates tons of literal variables (`IntVid`/`FloatVid`) and obligations around them, and we uselessly try to resolve these obligations.

I can think of three ways to solve this:

1) Make the operator impls built-in, essentially `impl<intvar T> Add<T> for T`. This would avoid creating bogus obligations.

2) Dually, make `check_binop` not try to `select_obligations_where_possible` when it ends up with a float/int var. This would prevent the O(n^2)-ness, but if there are other places that call `select_obligations_where_possible` we would still have O(n·m).

3) Quickly ignore obligations that didn't make progress during selection. I think this would improve selection performance, but it requires more complexity.

cc @nikomatsakis
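The quadratic cost being described can be illustrated with a toy model (an intentionally simplified sketch, not actual compiler code): if every newly created obligation triggers a rescan of everything still pending, and none of the pending obligations can make progress until literal defaulting kicks in, the total work is 1 + 2 + … + n.

```rust
// Toy model of repeated obligation selection. Assumption: this drastically
// simplifies the real trait system; "obligations" are just integers here.
fn main() {
    let n: u32 = 1000;
    let mut pending: Vec<u32> = Vec::new();
    let mut work: u64 = 0;

    for i in 0..n {
        pending.push(i); // one fresh obligation per binary op on literals
        // A `select_obligations_where_possible`-style pass revisits every
        // pending obligation, but none can make progress yet.
        for _ob in &pending {
            work += 1;
        }
    }

    // Total work is 1 + 2 + ... + n = n * (n + 1) / 2, i.e. O(n^2).
    assert_eq!(work, (n as u64) * (n as u64 + 1) / 2);
}
```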

pnkfelix commented 8 years ago

Nominating for prioritization. I suspect at this point it's merely P-medium, but it's best to discuss. (Plus, for all I know it's actually been resolved, since #25916 was resolved by PR #31680; but then again, that same PR notes that the quadratic behavior is still present, it just has less impact...)

nikomatsakis commented 8 years ago

In compiler team meeting: we believe this to have been fixed. Would be good to verify whether the perf gap has been corrected in the meantime!

nikomatsakis commented 8 years ago

triage: P-medium

DemiMarie commented 8 years ago

Has anyone tried benchmarking the compiler?

Mark-Simulacrum commented 8 years ago

I believe @nnethercote has been working on profiling and benchmarking the compiler, and there is also http://perf.rust-lang.org/, which attempts to keep track of current compile times.

Ignore the 18000% regression on perf.rlo right now; we recently changed what we were benchmarking for syntex, which caused that jump.

nnethercote commented 8 years ago

Yes, I have been doing some benchmarking and profiling. So far I've landed #36734 and #36592, and I just opened #36917. I think there is still some low- to middle-hanging fruit to be picked.

I have also done some work on https://github.com/rust-lang-nursery/rustc-benchmarks. I added a script that lets you compare the speed of two different compilers. I also fixed it so that the syntex benchmark now measures the right thing, which explains the huge regression that @Mark-Simulacrum mentioned.

pnkfelix commented 8 years ago

I just took the gist linked earlier and re-ran it on today's compiler.

(I had to add some extra `#![feature]` and `#![allow]` attributes to get it to compile without error, but that process is pretty mechanical and uninteresting.)

The performance of the compiler has not returned to where it was circa March 29th. In particular:

Nightly             Commit      Time
nightly-2015-03-30  d8be84eb4   real 0m6.208s
nightly-2015-03-31  6cf3b0b74   real 0m18.546s
nightly-2015-07-30  4d52d7c85   real 0m13.056s
nightly-2015-08-30  f3f23bf9c   real 0m12.320s
nightly-2015-09-30  65d5c0833   real 0m12.639s
nightly-2016-01-30  303892ee1   real 0m11.610s
nightly-2016-02-14  fae516277   real 0m12.203s
nightly-2016-02-17  57c357d89   real 0m2.163s
nightly-2016-02-28  54fdae3d6   real 0m2.181s
nightly-2016-09-30  289f3a4ca   real 0m2.273s

so we've managed to recover from part of the regression, but not all of it.


Update: hold on, I just realized that I was looking at September of 2015 in those runs. Let me do a few more.

Update 2: Okay, now that I actually looked at runs from 2016, it looks like we have completely recovered all of the lost time, at least on the isolated benchmark in the gist.

pnkfelix commented 8 years ago

I'm going to close this ticket since I believe we have addressed the specific compile-time regression described here.

nnethercote commented 8 years ago

I just took the gist linked earlier and re-ran it on today's compiler.

FWIW: that gist contains inflate.rs. A slightly different version of that file is in rustc-benchmarks, which means it's likely to get some attention. In fact, I've already done some profiling on it and found that it is totally dominated by operations involving `ObligationForest`. (It's the only benchmark I've seen for which this is true.) I will try to get around to looking at it soon, though I don't know anything about `ObligationForest`, so if someone more knowledgeable wants to look in the meantime, that would be fine with me :)

nikomatsakis commented 8 years ago

@nnethercote feel free to ping me sometime if you want to look at it a bit together

arielb1 commented 8 years ago

@nnethercote

That would be expected. The test case triggers an O(n^2) case in `ObligationForest`. We have mitigated that case so that it takes a reasonable amount of time on reasonable examples.

It is possible that this could be made O(n), but the ways to do so seem to have bad overheads in the common case.