nikomatsakis closed this issue 6 years ago
Note that when doing profiling you should be using the in-tree inflate, not the crate off of crates.io, as their code is quite different.
```
[santiago@archlinux inflate (master)]$ perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 3
Matcher    : {do_mir_borrowck}
Matches    : 2290
Not Matches: 331
Percentage : 87%
Tree
| matched `{do_mir_borrowck}` (87% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (72% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check_internal (64% total, 0% self)
: : : | rustc_mir::borrow_check::nll::type_check::type_check::_$u7b$$u7b$closure$u7d$$u7d$::h5e644cf9693979bb (63% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (63% total, 59% self)
: : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve (6% total, 0% self)
: : : | rustc::util::common::time (6% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve_inner (6% total, 0% self)
: : : : : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::compute_region_values (6% total, 0% self)
: : : : : : | <rustc_data_structures::bitvec::SparseBitMatrix<R, C>>::merge (6% total, 0% self)
: : : : : : : | <alloc::btree::map::BTreeMap<K, V>>::entry (5% total, 1% self)
: : : : : : : : | alloc::btree::search::search_tree (3% total, 0% self)
: : : : : : : : : | alloc::btree::search::search_tree (3% total, 3% self)
: | rustc::mir::visit::Visitor::visit_mir (5% total, 3% self)
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'gcx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (4% total, 0% self)
: | rustc_mir::dataflow::do_dataflow (3% total, 0% self)
```
Speaking about the non-NLL case: `inflate` is really weird. I don't know why, but its profiles look entirely different from those of any of the other rustc-perf benchmarks. It spends an unusual amount of time in `obligation_forest` code. In particular, it makes a bazillion calls to `shallow_resolve`, which in turn makes a bazillion calls to `get_root_key` within `ena`. From Cachegrind, here is a key hotspot:
```
          .      fn process_obligation(&mut self,
          .                            pending_obligation: &mut Self::Obligation)
          .                            -> ProcessResult<Self::Obligation, Self::Error>
          .      {
          .          // if we were stalled on some unresolved variables, first check
          .          // whether any of them have been resolved; if not, don't bother
          .          // doing more work yet
 55,898,106          if !pending_obligation.stalled_on.is_empty() {
 59,971,481              if pending_obligation.stalled_on.iter().all(|&ty| {
179,914,443                  let resolved_ty = self.selcx.infcx().shallow_resolve(&ty);
          .                  resolved_ty == ty // nothing changed here
          .              }) {
          .                  debug!("process_predicate: pending obligation {:?} still stalled on {:?}",
          .                         self.selcx.infcx()
          .                             .resolve_type_vars_if_possible(&pending_obligation.obligation),
          .                         pending_obligation.stalled_on);
167,524,737                  return ProcessResult::Unchanged;
          .              }
     94,744              pending_obligation.stalled_on = vec![];
          .          }
```
`clap-rs` is the benchmark most similar to `inflate`, but it still spends much less time in those parts of the code. I don't know if those functions are used with NLL, but hopefully that's useful info.
@nnethercote pretty interesting. As of now, NLL time is 300% of normal time, which is still high, though not as high as it was to start. I guess it is worth profiling this separately, particularly once #51411 lands. It seems like https://github.com/rust-lang/rust/pull/51538 does not particularly help.
As usual, you really need to keep in mind that the `inflate` version in rustc-perf is ridiculously bloated, as it predates https://github.com/PistonDevelopers/inflate/pull/35 (the PR looks small, but try expanding the macros).

Could we please rename it to `inflate-old-and-bloated` (or a similar shorthand)? Ideally we could apply such a rename retroactively to past data as well.

Also, I'd rather tell people to update their `inflate` than block NLL. And we should ask @bvssvni to release a new `0.x.y` version for every `0.x` of `inflate`, with the de-bloating applied.
> `clap-rs` is the benchmark most similar to `inflate`

Funny you should mention that (it's also pre-debloat): https://github.com/rust-lang-nursery/rustc-perf/issues/183#issuecomment-367232425
As an aside, this sort of thing (one function with a lot of code in it) is why I started hacking on rustc many years ago. I was auto-generating a huge machine-code decoder function, and rustc was OOM-ing on it. IIRC, one of the culprits was borrowck, which failed trying to allocate roughly 700TB for one `Vec`.
@eddyb can you find somebody more familiar with `inflate` to make these releases?
@bvssvni Do you mean making the git commits, or also publishing? I don't know anyone other than you who can publish piston crates (I don't think I can, anyway, but I'm not sure how to check). cc @RazrFalcon for the former, but I might do it myself if I find time for it.
@eddyb all PistonCollaborators can publish. You're one of them.
@bvssvni Makes sense; is that automatic via GitHub? I'm not sure what the process is, and I can't find you on IRC. In the meanwhile, I realized that the past version bumps don't all appear to have been necessary, and I left a comment suggesting we just publish the latest 0.4 patch version under all previous minor versions: https://github.com/PistonDevelopers/inflate/pull/37#issuecomment-401116551.
I'm going to close this issue, as inflate is now at 278.32% of the non-NLL time. That's still slow, but it's a tiny edge-case test anyway, and it's no longer 10x.
Looking at http://perf.rust-lang.org/nll-dashboard.html, the inflate test is way out of line with the others. What is making it so slow? We need to do some profiles.

I wrote up a guide that describes how I do profiling.