rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

[nll] why is inflate so dang slow? #51377

Closed: nikomatsakis closed this issue 6 years ago

nikomatsakis commented 6 years ago

Looking at http://perf.rust-lang.org/nll-dashboard.html, the inflate test is way out of line with the others; what is making it so slow? We need to gather some profiles.

I wrote up a guide that describes how I do profiling.

Mark-Simulacrum commented 6 years ago

Note that when profiling you should use the in-tree inflate, not the crate from crates.io, as the two versions' code is quite different.

spastorino commented 6 years ago
```
[santiago@archlinux inflate (master)]$ perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 3
Matcher    : {do_mir_borrowck}
Matches    : 2290
Not Matches: 331
Percentage : 87%

Tree
| matched `{do_mir_borrowck}` (87% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (72% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check_internal (64% total, 0% self)
: : : | rustc_mir::borrow_check::nll::type_check::type_check::_$u7b$$u7b$closure$u7d$$u7d$::h5e644cf9693979bb (63% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (63% total, 59% self)
: : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve (6% total, 0% self)
: : : | rustc::util::common::time (6% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve_inner (6% total, 0% self)
: : : : : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::compute_region_values (6% total, 0% self)
: : : : : : | <rustc_data_structures::bitvec::SparseBitMatrix<R, C>>::merge (6% total, 0% self)
: : : : : : : | <alloc::btree::map::BTreeMap<K, V>>::entry (5% total, 1% self)
: : : : : : : : | alloc::btree::search::search_tree (3% total, 0% self)
: : : : : : : : : | alloc::btree::search::search_tree (3% total, 3% self)
: | rustc::mir::visit::Visitor::visit_mir (5% total, 3% self)
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'gcx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (4% total, 0% self)
: | rustc_mir::dataflow::do_dataflow (3% total, 0% self)
```
nnethercote commented 6 years ago

Speaking of the non-NLL case: inflate is really weird. I don't know why, but the profiles for it look entirely different from those for any of the other rustc-perf benchmarks. It spends an unusual amount of time in obligation_forest code. In particular, it makes a bazillion calls to shallow_resolve, which in turn makes a bazillion calls to get_root_key within ena. From Cachegrind, here is a key hotspot:

```
          .      fn process_obligation(&mut self,
          .                            pending_obligation: &mut Self::Obligation)
          .                            -> ProcessResult<Self::Obligation, Self::Error>
          .      {
          .          // if we were stalled on some unresolved variables, first check
          .          // whether any of them have been resolved; if not, don't bother
          .          // doing more work yet
 55,898,106          if !pending_obligation.stalled_on.is_empty() {
 59,971,481              if pending_obligation.stalled_on.iter().all(|&ty| {
179,914,443                  let resolved_ty = self.selcx.infcx().shallow_resolve(&ty);
          .                  resolved_ty == ty // nothing changed here
          .              }) {
          .                  debug!("process_predicate: pending obligation {:?} still stalled on {:?}",
          .                         self.selcx.infcx()
          .                             .resolve_type_vars_if_possible(&pending_obligation.obligation),
          .                         pending_obligation.stalled_on);
167,524,737                  return ProcessResult::Unchanged;
          .              }
     94,744              pending_obligation.stalled_on = vec![];
          .          }
```

#51411 improves some of that.

clap-rs is the benchmark most similar to inflate, but it still spends much less time in those parts of the code.

I don't know if those functions are used with NLL, but hopefully that's useful info.

nikomatsakis commented 6 years ago

@nnethercote, pretty interesting. As of now NLL time is 300% of normal time, which is still high, though not as high as it was to start. I guess it is worth profiling this separately, particularly once #51411 lands... it seems like https://github.com/rust-lang/rust/pull/51538 does not particularly help.

eddyb commented 6 years ago

As usual, you really need to keep in mind that the inflate version that's in rustc-perf is ridiculously bloated, as it predates https://github.com/PistonDevelopers/inflate/pull/35 (the PR looks small but try expanding the macros).

Could we please rename it to inflate-old-and-bloated (or a similar shorthand)? Ideally we could apply such a rename retroactively to past data as well.

Also, I'd rather tell people to update their inflate than block NLL. And we should ask @bvssvni to release a new 0.x.y version for every 0.x of inflate, with the de-bloating applied.

> clap-rs is the benchmark most similar to inflate

Funny you should mention that (it also predates the de-bloating): https://github.com/rust-lang-nursery/rustc-perf/issues/183#issuecomment-367232425

As an aside, this sort of thing (one function with a lot of code in it) is why I started hacking on rustc many years ago. I was auto-generating a huge machine-code decoder function and rustc was OOM-ing on it. IIRC, one of the culprits was borrowck failing to allocate roughly 700TB for one Vec.

bvssvni commented 6 years ago

@eddyb can you find somebody more familiar with inflate to make these releases?

eddyb commented 6 years ago

@bvssvni Do you mean making the git commits, or also publishing? I don't know anyone other than you who can publish piston crates (I don't think I can, anyway, but I'm not sure how to check). cc @RazrFalcon for the former, but I might do it myself if I find time for it.

bvssvni commented 6 years ago

@eddyb all PistonCollaborators can publish. You're one of them.

eddyb commented 6 years ago

@bvssvni Makes sense; is that automatic via GitHub? I'm not sure what the process is, and I can't find you on IRC. In the meantime, I realized that all the past version bumps don't appear to be necessary, and left a comment suggesting we just publish the latest 0.4 patch version under all previous minor versions: https://github.com/PistonDevelopers/inflate/pull/37#issuecomment-401116551.

nikomatsakis commented 6 years ago

I'm going to close this issue, as inflate is now at 278.32%. That's still slow, but it's a tiny edge-case test anyway, and it's not 10x anymore.