Closed: nwin closed this issue 8 years ago
If you can, bench after the merge of aecf3d8 / PR #24965 as well. Merged Fri May 1 08:33:08 2015 +0000 so it should be in the nightlies. Oh and share output of -Z time-passes too!
That improved things marginally. (Edit: I checked; this build is after the merge.)
$ rustc --version
rustc 1.1.0-nightly (c42c1e7a6 2015-05-02) (built 2015-05-01)
$ time rustc --crate-type=lib -Z no-trans inflate.rs
real 0m19.984s
user 0m18.948s
sys 0m1.033s
$ rustc --crate-type=lib -Z no-trans -Z time-passes inflate.rs
time: 0.007 parsing
time: 0.000 recursion limit
time: 0.001 configuration 1
time: 0.000 gated macro checking
time: 0.000 crate injection
time: 0.009 macro loading
time: 0.000 plugin loading
time: 0.000 plugin registration
time: 0.169 expansion
time: 0.001 complete gated feature checking 1
time: 0.018 configuration 2
time: 0.009 maybe building test harness
time: 0.008 prelude injection
time: 0.001 checking that all macro invocations are gone
time: 0.001 complete gated feature checking 2
time: 0.010 assigning node ids and indexing ast
time: 0.001 external crate/lib resolution
time: 0.002 language item collection
time: 0.037 resolution
time: 0.002 lifetime resolution
time: 0.000 looking for entry point
time: 0.000 looking for plugin registrar
time: 0.012 region resolution
time: 0.001 loop checking
time: 0.001 static item recursion checking
time: 0.002 type collecting
time: 0.001 variance inference
time: 0.129 coherence checking
time: 18.775 type checking
time: 0.048 const checking
time: 0.009 privacy checking
time: 0.000 stability index
time: 0.003 intrinsic checking
time: 0.003 effect checking
time: 0.006 match checking
time: 0.194 liveness checking
time: 0.438 borrow checking
time: 0.035 rvalue checking
time: 0.000 reachability checking
time: 0.007 death checking
time: 0.079 stability checking
time: 0.000 unused lib feature checking
time: 0.116 lint checking
Your code has a >10kLOC method. Couldn't you split it?
Performance does seem O(n^2) in the number of len! calls.
Maybe it is a duplicate of #23977?
Tested current nightly after the PR to inline hashing better. Only reduces run time of the type checking pass by 6%.
Did a binary search over the nightlies (thanks to multirust); the problem was introduced between nightly-2015-03-30 and nightly-2015-03-31.
In other words, this version is fast:
rustc 1.0.0-nightly (d8be84eb4 2015-03-29) (built 2015-03-29)
and this version is slow:
rustc 1.0.0-nightly (6cf3b0b74 2015-03-30) (built 2015-03-30)
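The bisection described above can be sketched generically: given a chronologically ordered list of nightlies and a predicate that reports whether a build is slow, a binary search pinpoints the first slow build in O(log n) compile runs. This is a minimal sketch, not the actual procedure used; `is_slow` is a hypothetical stand-in for actually timing `rustc -Z no-trans inflate.rs` under each toolchain.

```rust
// Minimal bisection sketch; `is_slow` stands in for timing a real compile
// under each nightly toolchain (e.g. via multirust, as in the comment above).
fn first_slow(nightlies: &[&str], is_slow: impl Fn(&str) -> bool) -> Option<usize> {
    // Invariant: everything before `lo` is fast, everything from `hi` on is slow.
    let (mut lo, mut hi) = (0, nightlies.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if is_slow(nightlies[mid]) {
            hi = mid; // regression is at `mid` or earlier
        } else {
            lo = mid + 1; // regression is after `mid`
        }
    }
    if hi < nightlies.len() { Some(hi) } else { None }
}

fn main() {
    let nightlies = ["2015-03-28", "2015-03-29", "2015-03-30", "2015-03-31", "2015-04-01"];
    // Pretend builds from 2015-03-31 onward are slow, matching the comment above.
    let first = first_slow(&nightlies, |n| n >= "2015-03-31");
    assert_eq!(first, Some(3));
    println!("regression injected at nightly-{}", nightlies[first.unwrap()]);
}
```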
Update: scanning over git log d8be84eb4..6cf3b0b74, I wonder whether #23673 is a likely culprit.
@pnkfelix That's what niko is indicating with the other issue too https://github.com/rust-lang/rust/issues/23977#issuecomment-89035822
Edit: Awesome job bisecting though! That's a great trick to use.
I have noticed a similar effect in Servo across a particular Rust upgrade (forgot which).
Can we give this priority now that 1.0 is out?
@Manishearth
Are you sure that this is the issue?
No, I'm not, but regardless, we should be fixing this with priority.
The problem is that literal inference creates tons of literal variables (IntVid/FloatVid) and obligations around them, and we uselessly try to resolve these obligations.
I can think of three ways to solve this:
1) Make the operator impls built-in, essentially impl<intvar T> Add<T> for T. This would avoid creating bogus obligations.
2) Dually, make check_binop not try select_obligations_where_possible when it ends up with a float/int var. This would prevent the O(n^2)-ness, but if there are other places that call select_obligations_where_possible we would still be O(n m).
3) Quickly skip obligations that didn't make progress during selection. I think this would improve selection performance, but it requires more complexity.
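A hedged illustration of the pattern being diagnosed above (a toy reduction, not the actual inflate.rs code): every unsuffixed literal below starts life as an integer inference variable, and each arithmetic operator registers a trait obligation (Add, Mul, Shl, ...) that the checker may repeatedly revisit while the variables are unresolved. With thousands of such expressions in one huge function body, re-checking all pending obligations at each step is what gives the quadratic behavior.

```rust
// Toy reduction (not the real inflate.rs): each unsuffixed literal is an
// integer inference variable until unified, and every `+`/`*`/`<<` here
// produces an operator-trait obligation the checker must resolve.
fn checksum(data: &[u8]) -> u32 {
    let mut acc = 0; // `0` starts as an integer inference variable, later unified with u32
    for &b in data {
        // Each binary op on still-unresolved literals adds an obligation.
        acc = (acc << 1) + (b as u32) * 3 + 7;
    }
    acc & 0xffff
}

fn main() {
    assert_eq!(checksum(&[]), 0);
    assert_eq!(checksum(&[1]), 10);
    assert_eq!(checksum(&[1, 2]), 33);
}
```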
cc @nikomatsakis
Nominating for prioritization. I suspect at this point it's merely P-medium, but best to discuss. (Plus, for all I know it's actually been resolved, since #25916 was resolved by PR #31680; but then again that same PR notes that the quadratic behavior is still present, it just has less impact...)
In compiler team meeting: we believe this to have been fixed. Would be good to verify whether the perf gap has been corrected in the meantime!
triage: P-medium
Has anyone tried benchmarking the compiler?
I believe @nnethercote has been working on profiling and benchmarking the compiler, and there is also http://perf.rust-lang.org/ which keeps track of current compile times.
Ignore the 18000% regression on perf.rlo right now, we recently changed what we were benchmarking for syntex which caused that jump.
Yes, I have been doing some benchmarking and profiling. So far I've landed #36734 and #36592, and I just opened #36917. I think there is still some low- to middle-hanging fruit to be picked.
I have also done some work on https://github.com/rust-lang-nursery/rustc-benchmarks. I added a script that lets you compare the speed of two different compilers. I also fixed it so that the syntex benchmark now measures the right thing, which explains the huge regression that @Mark-Simulacrum mentioned.
I just took the gist linked earlier and re-ran it on today's compiler.
(I had to add some extra #![feature] and #![allow] attributes to get it to compile without error, but that process is pretty mechanical and uninteresting.)
The performance of the compiler has not returned to where it was circa March 29th. In particular:
Nightly | Commit | Time |
---|---|---|
nightly-2015-03-30 | d8be84eb4 | real 0m6.208s |
nightly-2015-03-31 | 6cf3b0b74 | real 0m18.546s |
nightly-2015-07-30 | 4d52d7c85 | real 0m13.056s |
nightly-2015-08-30 | f3f23bf9c | real 0m12.320s |
nightly-2015-09-30 | 65d5c0833 | real 0m12.639s |
nightly-2016-01-30 | 303892ee1 | real 0m11.610s |
nightly-2016-02-14 | fae516277 | real 0m12.203s |
nightly-2016-02-17 | 57c357d89 | real 0m2.163s |
nightly-2016-02-28 | 54fdae3d6 | real 0m2.181s |
nightly-2016-09-30 | 289f3a4ca | real 0m2.273s |
so we've managed to recover from part of the regression, but not all of it.
Update: hold on, I just realized that I was looking at September of 2015 in those runs. Let me do a few more.
Update 2: Okay, now that I actually looked at runs from 2016, it looks like we have completely recovered all of the lost time, at least on the isolated benchmark in the gist.
I'm going to close this ticket since I believe we have addressed the specific compile-time regression described here.
I just took the gist linked earlier and re-ran it on today's compiler.
FWIW: that gist contains inflate.rs. A slightly different version of that file is in rustc-benchmarks, which means it's likely to get some attention. In fact, I've already done some profiling for it and found that it is totally dominated by operations involving ObligationForest. (It's the only benchmark I've seen for which this is true.) I will try to get around to looking at that one soon, though I don't know anything about ObligationForest, so if someone more knowledgeable wants to look in the meantime, that would be fine with me :)
@nnethercote feel free to ping me sometime if you want to look at it a bit together
@nnethercote That would be expected. The test case triggers an O(n^2) case in ObligationForest. We have mitigated that case to take a reasonable amount of time on reasonable examples.
It is possible that this could be made O(n), but the ways seem to have bad overheads in the common case.
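The quadratic pattern can be sketched abstractly (a hypothetical simplification, not the real ObligationForest API): if every processing round re-walks all pending obligations and each round retires only a constant number of them, total work is O(n^2); only touching live obligations (compaction) keeps the total linear. The functions below count slot visits under each strategy.

```rust
// Hypothetical simplification of the O(n^2) pattern: `rounds_naive` retires
// one pending obligation per round but re-scans the whole Vec each round,
// so total visits are 1 + 2 + ... + n = n(n+1)/2.
fn rounds_naive(n: usize) -> usize {
    let mut pending: Vec<bool> = vec![true; n]; // true = still pending
    let mut visits = 0;
    while pending.iter().any(|&p| p) {
        for slot in pending.iter_mut() {
            visits += 1; // every round walks slots from the start
            if *slot {
                *slot = false; // retire exactly one obligation, then restart
                break;
            }
        }
    }
    visits
}

// With compaction, completed obligations are removed immediately, so each
// obligation is visited exactly once: O(n) total.
fn rounds_compacted(n: usize) -> usize {
    let mut pending: Vec<usize> = (0..n).collect();
    let mut visits = 0;
    while pending.pop().is_some() {
        visits += 1; // only live obligations are ever touched
    }
    visits
}

fn main() {
    assert_eq!(rounds_naive(100), 100 * 101 / 2); // quadratic: 5050 visits
    assert_eq!(rounds_compacted(100), 100); // linear: 100 visits
}
```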
I noticed earlier that the compile time of image increased massively during the beginning of 2015. Unfortunately it was hard to quantify this regression, since we also changed a lot during this time. Fortunately, I just found a good example that shows a 20-fold increase in compile time. It is a single file with no other dependencies (if somebody wants to try: the original version is even much older). The two revisions are exactly the same code; the newer one has just been updated to compile on a recent rustc. Note that this is rustc only, no time is spent in LLVM. (Also note that something is wrong with the old .pkg file I used to install rustc; I need to use sudo to run it.)
The time profiles show that driver::phase_1_parse_input and driver::phase_2_configure_and_expand almost stayed constant, while the time spent in driver::phase_3_run_analysis_passes exploded.