rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.87k stars 12.52k forks source link

Nonlinear compile time blow-up with deeply nested types #38528

Closed sfackler closed 6 years ago

sfackler commented 7 years ago

Compiling the postgres-tokio crate at https://github.com/sfackler/rust-postgres/commit/d27518ba76d76ccaa59b3ccd63e981bd8bd0ef33 goes from 5 seconds to 45 seconds on nightly if the two .boxed() calls in the middle of this call chain are removed: https://github.com/sfackler/rust-postgres/blob/d27518ba76d76ccaa59b3ccd63e981bd8bd0ef33/postgres-tokio/src/lib.rs#L342-L408.

Looks like 15 seconds is spent in translation item collection, and 39 seconds is spent in translation:

time: 0.004 parsing
time: 0.005 recursion limit
time: 0.000 crate injection
time: 0.001 plugin loading
time: 0.000 plugin registration
time: 0.238 expansion
time: 0.000 maybe building test harness
time: 0.000 maybe creating a macro crate
time: 0.000 checking for inline asm in case the target doesn't support it
time: 0.000 complete gated feature checking
time: 0.008 early lint checks
time: 0.000 AST validation
time: 0.005 name resolution
time: 0.004 lowering ast -> hir
time: 0.002 indexing hir
time: 0.000 attribute checking
time: 0.001 language item collection
time: 0.001 lifetime resolution
time: 0.000 looking for entry point
time: 0.000 looking for plugin registrar
time: 0.001 region resolution
time: 0.000 loop checking
time: 0.000 static item recursion checking
time: 0.005 compute_incremental_hashes_map
time: 0.000 load_dep_graph
time: 0.000 stability index
time: 0.001 stability checking
time: 0.011 type collecting
time: 0.000 variance inference
time: 0.000 impl wf inference
time: 0.021 coherence checking
time: 0.008 wf checking
time: 0.003 item-types checking
time: 0.500 item-bodies checking
time: 0.000 drop-impl checking
time: 0.003 const checking
time: 0.001 privacy checking
time: 0.000 intrinsic checking
time: 0.000 effect checking
time: 0.001 match checking
time: 0.000 liveness checking
time: 0.002 rvalue checking
time: 0.014 MIR dump
  time: 0.001   SimplifyCfg
  time: 0.002   QualifyAndPromoteConstants
  time: 0.002   TypeckMir
  time: 0.000   SimplifyBranches
  time: 0.001   SimplifyCfg
time: 0.006 MIR cleanup and validation
time: 0.006 borrow checking
time: 0.000 reachability checking
time: 0.000 death checking
time: 0.000 unused lib feature checking
time: 1.366 lint checking
time: 0.000 resolving dependency formats
  time: 0.000   NoLandingPads
  time: 0.001   SimplifyCfg
  time: 0.001   EraseRegions
  time: 0.000   AddCallGuards
  time: 0.008   ElaborateDrops
  time: 0.000   NoLandingPads
  time: 0.001   SimplifyCfg
  time: 0.001   InstCombine
  time: 0.000   Deaggregator
  time: 0.000   CopyPropagation
  time: 0.001   SimplifyLocals
  time: 0.000   AddCallGuards
  time: 0.000   PreTrans
time: 0.015 MIR optimisations
  time: 0.001   write metadata
  time: 15.086  translation item collection
  time: 0.020   codegen unit partitioning
  time: 0.022   internalize symbols
time: 39.376    translation
time: 0.000 assert dep graph
time: 0.000 serialize dep graph
  time: 0.089   llvm function passes [0]
  time: 0.037   llvm module passes [0]
  time: 2.148   codegen passes [0]
  time: 0.001   codegen passes [0]
time: 2.432 LLVM passes
time: 0.000 serialize work products
time: 0.084 linking

Things are significantly worse on 1.13 - 2 minutes in translation!

Some discussion in IRC: https://botbot.me/mozilla/rust-internals/2016-12-22/?msg=78294648&page=1

cc @aturon


UPDATE: #40280 was closed as a duplicate of this. It had the following sample code:

let &(first, B, C, D, E, F, G, H) = self;
let iter = first.buffers_list();
let iter = iter.chain(B.buffers_list());
let iter = iter.chain(C.buffers_list());
let iter = iter.chain(D.buffers_list());
let iter = iter.chain(E.buffers_list());
let iter = iter.chain(F.buffers_list());
let iter = iter.chain(G.buffers_list());
let iter = iter.chain(H.buffers_list());
Box::new(iter)

--nmatsakis

nikomatsakis commented 7 years ago

Things are significantly worse on 1.13 - 2 minutes in translation!

This may be the effect of the projection cache.

aturon commented 7 years ago

cc @alexcrichton This is obviously a show-stopper for futures.

nikomatsakis commented 7 years ago

I can do some profiling. I've had some plans for improving collection/trans that I think may be related. One question to try and answer is what %age of this is just "we are making more code" vs "we are wasting time doing things in trait selection that could be cached". I have observed the latter from time to time and had some thoughts on how to fix it.

alexcrichton commented 7 years ago

I've also seen this before, with tokio-socks5 as well. Removing just a handful of the trait objects in that file makes the compile time of the crate shoot from 2.34s to 89.52s (!!)

dwrensha commented 7 years ago

Here's a simple example of something that takes a very long time to compile:

future::ok::<(),()>(()).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(()));
nikomatsakis commented 7 years ago

Closed #40280 as a duplicate, moved some example code into the issue header.

vorner commented 7 years ago

Just a stupid question ‒ the „translation item collection“, which is one of the two culprits here, is listed in the „MIR optimisations“. Does it really have to happen at all on non-optimised debug build?

bluss commented 7 years ago

It's natural to read it that way, but the headers are actual after their group, so it's part of translation

vorner commented 7 years ago

This seems to be hard enough to take some time fixing. So I thought of a workaround, if someone is also interested. It uses the trick with placing trait objects to split the chains of modifiers, but only on testing/debug builds where the compilation speed matters, while it keeps the complex but hopefully faster concrete types in release build:

This one is for streams (that's what I needed), but can obviously work for futures or other things as well:

#[cfg(debug_assertions)]
fn test_boxed<T, E, S>(s: S) -> Box<Stream<Item = T, Error = E>>
    where S: Stream<Item = T, Error = E> + 'static
{
    Box::new(s)
}
#[cfg(not(debug_assertions))]
fn test_boxed<S>(s: S) -> S {
    s
}

(I didn't find any better config option than the debug_assertions, but I hope that one is good enough)

Mark-Simulacrum commented 7 years ago

Nominating in place of https://github.com/rust-lang/rust/issues/42941. cc @eddyb (nominator)

jonhoo commented 7 years ago

Repeating another bad case from #42941:

extern crate futures;
use futures::{future, IntoFuture, Future};
fn main() {
    let t: std::result::Result<(), ()> = Ok(());
    let f = t
            .into_future()
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()));
    f.wait();
}

The code above takes ~750s to compile on my laptop (you can make it shorter/longer by removing/adding .and_then calls):

...
  time: 0.000; rss: 108MB       write metadata
  time: 472.840; rss: 110MB     translation item collection
  time: 0.001; rss: 112MB       codegen unit partitioning
  time: 0.001; rss: 133MB       internalize symbols
time: 732.741; rss: 133MB       translation
...

The extra ~250s is spent between codegen unit partitioning and internalize symbols.

jonhoo commented 7 years ago

Another observation from #42941 about why this is becoming even more important is that the newly-released hyper 0.11 uses futures everywhere, which will likely cause many more programs to use chains of futures like this going forward.

nikomatsakis commented 7 years ago

triage: P-high

We are raising to "high priority" to at least do some investigation and try to determine whether revised trait solving strategies will be of use here.

arielb1 commented 7 years ago

status: waiting for niko

ishitatsuyuki commented 7 years ago

Bump. This really hurts impl Trait (boxing hides this issue).

hemangandhi commented 6 years ago

I'm not sure if this is fixed... I'm on nightly, version 1.23.0-nightly on Arch Linux...

I get the following in -Ztime-passes:

  time: 0.001; rss: 57MB    parsing
  time: 0.000; rss: 57MB    recursion limit
  time: 0.000; rss: 57MB    crate injection
  time: 0.000; rss: 57MB    plugin loading
  time: 0.000; rss: 57MB    plugin registration
  time: 0.054; rss: 122MB   expansion
  time: 0.000; rss: 122MB   maybe building test harness
  time: 0.000; rss: 122MB   maybe creating a macro crate
  time: 0.000; rss: 122MB   creating allocators
  time: 0.000; rss: 122MB   checking for inline asm in case the target doesn't support it
  time: 0.000; rss: 122MB   AST validation
  time: 0.006; rss: 128MB   name resolution
  time: 0.000; rss: 128MB   complete gated feature checking
  time: 0.000; rss: 128MB   lowering ast -> hir
  time: 0.000; rss: 128MB   early lint checks
  time: 0.001; rss: 128MB   indexing hir
  time: 0.000; rss: 128MB   attribute checking
  time: 0.000; rss: 128MB   lifetime resolution
  time: 0.000; rss: 128MB   looking for entry point
  time: 0.000; rss: 128MB   looking for plugin registrar
  time: 0.000; rss: 128MB   loop checking
  time: 0.000; rss: 128MB   static item recursion checking
  time: 0.000; rss: 132MB   stability checking
  time: 0.001; rss: 135MB   type collecting
  time: 0.000; rss: 135MB   outlives testing
  time: 0.000; rss: 135MB   impl wf inference
  time: 0.000; rss: 135MB   coherence checking
  time: 0.000; rss: 135MB   variance testing
  time: 0.001; rss: 139MB   wf checking
  time: 0.000; rss: 139MB   item-types checking
  time: 0.080; rss: 173MB   item-bodies checking
  time: 0.009; rss: 177MB   const checking
  time: 0.001; rss: 177MB   privacy checking
  time: 0.000; rss: 177MB   intrinsic checking
  time: 0.000; rss: 177MB   match checking
  time: 0.000; rss: 177MB   liveness checking
  time: 0.679; rss: 177MB   borrow checking
  time: 0.000; rss: 177MB   MIR borrow checking
  time: 0.000; rss: 177MB   MIR effect checking
  time: 0.000; rss: 177MB   death checking
  time: 0.000; rss: 177MB   unused lib feature checking
  time: 0.009; rss: 183MB   lint checking
  time: 0.000; rss: 183MB   resolving dependency formats
    time: 0.000; rss: 183MB write metadata

so this looks very similar. In fact, I too have code with a bunch of:

      future_maker()
           .and_then( futurey_stuff())
           .and_then( futurey_stuff())
           .and_then( futurey_stuff())
           .and_then( futurey_stuff())
           .and_then( futurey_stuff())
           .and_then( futurey_stuff())

Sorry if it's a bother... just wasn't sure where to ask and this seems to be the most recent reference.

Does it matter that some of the and_thens have move closures instead?

ishitatsuyuki commented 6 years ago

Of course not. Particularly for the futures case generators will help, but @nikomatsakis doesn't seem to be working on this.

hemangandhi commented 6 years ago

Just to double-check, that'd be std::ops::Generator?

ishitatsuyuki commented 6 years ago

The doc is here and the futures helper is here.

Ralith commented 6 years ago

Why would generators help? It's a trait, it doesn't do away with the deeply nested types.

rpjohnst commented 6 years ago

Generators help because they are an alternative to combinators. When you write a generator, you get a single (anonymous) type for the entire state machine; when you use combinators, you build the state machine out of nested types.

nikomatsakis commented 6 years ago

triage: P-medium

I'm going to lower this from P-high to reflect reality. I'm still eager to investigate but haven't had time, and making it P-high is not helping =)

nikomatsakis commented 6 years ago

So I finally did some work here. In particular, I extracted out a self-contained example that seems to reproduce the problem, which let me do a bit of experimentation. The gist contains two variants of the example; one that models Future quite closely (though without IntoFuture), and the other which removes the Output type. The timing results are quite interesting. See if you can spot the outlier. ;)

             size
variant      4      8      12      16
full         .19s   .34s   2.5s    36.9s
no-output    .19s   .26s    .9s     9.8s

I haven't had time to dig deeper than that yet.

hcpl commented 6 years ago

These can probably be considered even more self-contained than what @nikomatsakis have made in this post, and IMO they show the issue more consistently (reproducible exponential blowups by a factor of approximately 2 for each line of .chain(empty()) in every compilation step I specifically highlighted below) so I'll post them:

Generic version

Code ```rust pub fn foo() -> Box> { use std::iter::empty; Box::new(empty() .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) // 10th .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) // 17th .chain(empty()) ) } ```
$ rustc +nightly --crate-type=lib foo_generic.rs -Z time-passes
...
  time: 34.801; rss: 92MB   item-bodies checking
...

All other timings are <0.3s.

Full output ``` time: 0.001; rss: 43MB parsing time: 0.000; rss: 43MB recursion limit time: 0.000; rss: 43MB crate injection time: 0.000; rss: 43MB plugin loading time: 0.000; rss: 43MB plugin registration time: 0.068; rss: 68MB expansion time: 0.000; rss: 68MB maybe building test harness time: 0.000; rss: 68MB maybe creating a macro crate time: 0.000; rss: 68MB creating allocators time: 0.000; rss: 68MB checking for inline asm in case the target doesn't support it time: 0.000; rss: 68MB AST validation time: 0.006; rss: 70MB name resolution time: 0.000; rss: 70MB complete gated feature checking time: 0.000; rss: 71MB lowering ast -> hir time: 0.000; rss: 71MB early lint checks time: 0.000; rss: 71MB indexing hir time: 0.000; rss: 71MB attribute checking time: 0.000; rss: 71MB load query result cache time: 0.000; rss: 71MB lifetime resolution time: 0.000; rss: 71MB looking for entry point time: 0.000; rss: 71MB looking for plugin registrar time: 0.000; rss: 71MB loop checking time: 0.000; rss: 71MB static item recursion checking time: 0.000; rss: 71MB stability checking time: 0.001; rss: 75MB type collecting time: 0.000; rss: 75MB outlives testing time: 0.000; rss: 75MB impl wf inference time: 0.000; rss: 75MB coherence checking time: 0.000; rss: 75MB variance testing time: 0.005; rss: 80MB wf checking time: 0.000; rss: 80MB item-types checking time: 34.801; rss: 92MB item-bodies checking time: 0.012; rss: 95MB const checking time: 0.001; rss: 95MB privacy checking time: 0.000; rss: 95MB intrinsic checking time: 0.000; rss: 95MB match checking time: 0.000; rss: 95MB liveness checking time: 0.003; rss: 95MB borrow checking time: 0.000; rss: 95MB MIR borrow checking time: 0.000; rss: 95MB MIR effect checking time: 0.000; rss: 95MB death checking time: 0.000; rss: 95MB unused lib feature checking time: 0.018; rss: 100MB lint checking time: 0.000; rss: 100MB resolving dependency formats time: 0.002; rss: 100MB write metadata time: 0.000; rss: 100MB translation item collection time: 0.000; rss: 100MB codegen unit partitioning time: 0.000; rss: 114MB translate to LLVM IR time: 0.000; rss: 114MB assert dep graph time: 0.000; rss: 114MB serialize dep graph time: 0.267; rss: 114MB translation time: 0.000; rss: 114MB llvm function passes [lib0] time: 0.000; rss: 114MB llvm module passes [lib0] time: 0.001; rss: 115MB codegen passes [lib0] time: 0.005; rss: 115MB LLVM passes time: 0.000; rss: 113MB serialize work products time: 0.001; rss: 113MB linking ```

Specialized version

Code ```rust pub fn foo() -> Box> { use std::iter::empty; Box::new(empty() .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) // 10th .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) .chain(empty()) // 17th .chain(empty()) ) } ```
$ rustc +nightly --crate-type=lib foo_specialized.rs -Z time-passes
...
  time: 43.160; rss: 89MB   item-bodies checking
...
    time: 34.286; rss: 99MB translation item collection
...
    time: 129.115; rss: 122MB   translate to LLVM IR
...
  time: 163.680; rss: 122MB translation
...
  time: 28.048; rss: 122MB  LLVM passes
...

All other timings are <0.2s.

Full output ``` time: 0.001; rss: 41MB parsing time: 0.000; rss: 41MB recursion limit time: 0.000; rss: 41MB crate injection time: 0.000; rss: 41MB plugin loading time: 0.000; rss: 41MB plugin registration time: 0.057; rss: 66MB expansion time: 0.000; rss: 66MB maybe building test harness time: 0.000; rss: 66MB maybe creating a macro crate time: 0.000; rss: 66MB creating allocators time: 0.000; rss: 66MB checking for inline asm in case the target doesn't support it time: 0.000; rss: 66MB AST validation time: 0.006; rss: 68MB name resolution time: 0.000; rss: 68MB complete gated feature checking time: 0.000; rss: 69MB lowering ast -> hir time: 0.000; rss: 69MB early lint checks time: 0.000; rss: 69MB indexing hir time: 0.000; rss: 69MB attribute checking time: 0.000; rss: 69MB load query result cache time: 0.000; rss: 69MB lifetime resolution time: 0.000; rss: 69MB looking for entry point time: 0.000; rss: 69MB looking for plugin registrar time: 0.000; rss: 69MB loop checking time: 0.000; rss: 69MB static item recursion checking time: 0.000; rss: 69MB stability checking time: 0.001; rss: 73MB type collecting time: 0.000; rss: 73MB outlives testing time: 0.000; rss: 73MB impl wf inference time: 0.000; rss: 73MB coherence checking time: 0.000; rss: 73MB variance testing time: 0.002; rss: 75MB wf checking time: 0.000; rss: 75MB item-types checking time: 43.160; rss: 89MB item-bodies checking time: 0.014; rss: 91MB const checking time: 0.001; rss: 91MB privacy checking time: 0.000; rss: 91MB intrinsic checking time: 0.000; rss: 91MB match checking time: 0.000; rss: 91MB liveness checking time: 0.003; rss: 94MB borrow checking time: 0.000; rss: 94MB MIR borrow checking time: 0.000; rss: 94MB MIR effect checking time: 0.000; rss: 94MB death checking time: 0.000; rss: 94MB unused lib feature checking time: 0.018; rss: 96MB lint checking time: 0.000; rss: 96MB resolving dependency formats time: 0.001; rss: 96MB write metadata time: 34.286; rss: 99MB translation item collection time: 0.001; rss: 99MB codegen unit partitioning time: 0.000; rss: 114MB llvm function passes [lib0] time: 0.000; rss: 114MB llvm module passes [lib0] time: 0.006; rss: 114MB llvm function passes [lib4] time: 0.000; rss: 115MB llvm function passes [lib2] time: 0.000; rss: 115MB llvm module passes [lib2] time: 0.002; rss: 115MB llvm module passes [lib4] time: 0.001; rss: 117MB llvm function passes [lib5] time: 0.000; rss: 118MB llvm module passes [lib5] time: 0.000; rss: 117MB llvm function passes [lib3] time: 0.000; rss: 118MB llvm module passes [lib3] time: 0.011; rss: 118MB codegen passes [lib2] time: 0.011; rss: 117MB codegen passes [lib0] time: 0.000; rss: 120MB llvm function passes [lib8] time: 0.000; rss: 121MB llvm module passes [lib8] time: 0.012; rss: 122MB codegen passes [lib5] time: 0.004; rss: 120MB codegen passes [lib3] time: 0.008; rss: 122MB codegen passes [lib8] time: 0.134; rss: 122MB codegen passes [lib4] time: 0.001; rss: 122MB llvm function passes [lib1] time: 0.000; rss: 122MB llvm function passes [lib6] time: 0.001; rss: 122MB llvm module passes [lib1] time: 0.000; rss: 122MB llvm module passes [lib6] time: 0.000; rss: 122MB llvm function passes [lib7] time: 0.000; rss: 122MB llvm module passes [lib7] time: 129.115; rss: 122MB translate to LLVM IR time: 0.000; rss: 122MB assert dep graph time: 0.000; rss: 122MB serialize dep graph time: 163.680; rss: 122MB translation time: 0.000; rss: 121MB llvm function passes [lib9] time: 0.000; rss: 121MB llvm module passes [lib9] time: 0.003; rss: 121MB codegen passes [lib9] time: 0.008; rss: 121MB codegen passes [lib7] time: 0.009; rss: 122MB codegen passes [lib1] time: 0.011; rss: 122MB codegen passes [lib6] time: 28.048; rss: 122MB LLVM passes time: 0.000; rss: 122MB serialize work products time: 0.003; rss: 122MB linking ```

Rustc version:

$ rustc +nightly --version
rustc 1.23.0-nightly (827cb0d61 2017-11-26)
ishitatsuyuki commented 6 years ago

I'm digging on this. Some easy profiling:

ishitatsuyuki commented 6 years ago

After some digging it seems that https://github.com/rust-lang/rust/blob/ebddfcb0b19471d2277e05cfbc2ad8caa81637b7/src/librustc/ty/subst.rs#L324 may be a cause.

I only have little knowledge regarding the compiler, but if I'm correct this relates to @nikomatsakis experiment on the number of output types. This looks like a place we can do a exponential blowup, by the factor of parameters.

tanriol commented 6 years ago

I've tried profiling rustc and getting a backtrace for the generic version by @hcpl above at some random moment of time.

Likely relevant lines from the backtrace ``` #16 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #17 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013 #18 rustc::infer::InferCtxt::probe (self=, f=...) at librustc/infer/mod.rs:905 #19 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=, obligation_trait_ref=, candidate_set=0x7fffeca9f3d0) at librustc/traits/project.rs:1012 #20 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:817 -- #49 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #50 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013 #51 rustc::infer::InferCtxt::probe (self=, f=...) at librustc/infer/mod.rs:905 #52 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=, obligation_trait_ref=, candidate_set=0x7fffecaa1550) at librustc/traits/project.rs:1012 #53 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:817 -- #82 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #83 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecaa3650, obligation_trait_ref=) at librustc/traits/project.rs:1174 #84 rustc::traits::project::confirm_candidate (obligation=0x7fffecaa3650, obligation_trait_ref=, selcx=, candidate=...) at librustc/traits/project.rs:1161 #85 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:878 #86 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=) at librustc/traits/project.rs:541 -- #114 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #115 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013 #116 rustc::infer::InferCtxt::probe (self=, f=...) at librustc/infer/mod.rs:905 #117 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=, obligation_trait_ref=, candidate_set=0x7fffecaa55f0) at librustc/traits/project.rs:1012 #118 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:817 -- #147 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #148 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013 #149 rustc::infer::InferCtxt::probe (self=, f=...) at librustc/infer/mod.rs:905 #150 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=, obligation_trait_ref=, candidate_set=0x7fffecaa7770) at librustc/traits/project.rs:1012 #151 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:817 -- #180 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #181 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013 #182 rustc::infer::InferCtxt::probe (self=, f=...) at librustc/infer/mod.rs:905 #183 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=, obligation_trait_ref=, candidate_set=0x7fffecaa98f0) at librustc/traits/project.rs:1012 #184 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:817 -- #213 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #214 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecaab9f0, obligation_trait_ref=) at librustc/traits/project.rs:1174 #215 rustc::traits::project::confirm_candidate (obligation=0x7fffecaab9f0, obligation_trait_ref=, selcx=, candidate=...) at librustc/traits/project.rs:1161 #216 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:878 #217 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=) at librustc/traits/project.rs:541 -- #245 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #246 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecaad910, obligation_trait_ref=) at librustc/traits/project.rs:1174 #247 rustc::traits::project::confirm_candidate (obligation=0x7fffecaad910, obligation_trait_ref=, selcx=, candidate=...) at librustc/traits/project.rs:1161 #248 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:878 #249 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=) at librustc/traits/project.rs:541 -- #277 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #278 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecaaf830, obligation_trait_ref=) at librustc/traits/project.rs:1174 #279 rustc::traits::project::confirm_candidate (obligation=0x7fffecaaf830, obligation_trait_ref=, selcx=, candidate=...) at librustc/traits/project.rs:1161 #280 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:878 #281 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=) at librustc/traits/project.rs:541 -- #309 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #310 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013 #311 rustc::infer::InferCtxt::probe (self=, f=...) at librustc/infer/mod.rs:905 #312 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=, obligation_trait_ref=, candidate_set=0x7fffecab17d0) at librustc/traits/project.rs:1012 #313 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:817 -- #342 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #343 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecab38d0, obligation_trait_ref=) at librustc/traits/project.rs:1174 #344 rustc::traits::project::confirm_candidate (obligation=0x7fffecab38d0, obligation_trait_ref=, selcx=, candidate=...) at librustc/traits/project.rs:1161 #345 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:878 #346 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=) at librustc/traits/project.rs:541 -- #374 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #375 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecab57f0, obligation_trait_ref=) at librustc/traits/project.rs:1174 #376 rustc::traits::project::confirm_candidate (obligation=0x7fffecab57f0, obligation_trait_ref=, selcx=, candidate=...) at librustc/traits/project.rs:1161 #377 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:878 #378 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=) at librustc/traits/project.rs:541 -- #406 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #407 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecab7710, obligation_trait_ref=) at librustc/traits/project.rs:1174 #408 rustc::traits::project::confirm_candidate (obligation=0x7fffecab7710, obligation_trait_ref=, selcx=, candidate=...) at librustc/traits/project.rs:1161 #409 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:878 #410 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=) at librustc/traits/project.rs:541 -- #438 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #439 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecab9630, obligation_trait_ref=) at librustc/traits/project.rs:1174 #440 rustc::traits::project::confirm_candidate (obligation=0x7fffecab9630, obligation_trait_ref=, selcx=, candidate=...) at librustc/traits/project.rs:1161 #441 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:878 #442 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=) at librustc/traits/project.rs:541 -- #470 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #471 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013 #472 rustc::infer::InferCtxt::probe (self=, f=...) at librustc/infer/mod.rs:905 #473 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=, obligation_trait_ref=, candidate_set=0x7fffecabb5d0) at librustc/traits/project.rs:1012 #474 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:817 -- #503 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526 #504 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013 #505 rustc::infer::InferCtxt::probe (self=, f=...) at librustc/infer/mod.rs:905 #506 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=, obligation_trait_ref=, candidate_set=0x7fffecabd750) at librustc/traits/project.rs:1012 #507 rustc::traits::project::project_type (selcx=, obligation=) at librustc/traits/project.rs:817 ```

The cause in this case seems to be that at every Chain the rustc::traits::project::project_type function calls rustc::traits::select::SelectionContext::select twice: the first time from assemble_candidates_from_impls via InferCtxt::probe, the second time from rustc::traits::project::confirm_candidate. Both calls recurse on the next level, causing the exponential blowup.

nikomatsakis commented 6 years ago

Thanks for the profiles everyone, btw, I've been spending a bit more time investigating too but don't yet have any insights to share. =)

ishitatsuyuki commented 6 years ago

@nikomatsakis can you share what problem have you stumbled over? Thanks to @tanriol's profile I think the root cause is quite clear; we just need to cache the call to eliminate the blowup. What I'm worrying is that SelectionContext::select does have side effects, and as I don't have much knowledge on projection I don't know if the second (redundant) execution can be easily removed.

jonhoo commented 6 years ago

This came up again in https://www.reddit.com/r/rust/comments/7v4kfj/impl_trait_vs_boxtrait_or_how_i_learned_to_stop/

ishitatsuyuki commented 6 years ago

Yeah, so this is related to deeply nested types itself, and it doesn't matter where it appear. As argument, return type, or even just creating on stack triggers this issue. #39684 #43757 should be duplicates.

ishitatsuyuki commented 6 years ago

Finding as of today:

The blowup roots from confirm_impl_candidate in select module (not project). Confirming candidate is a non-cachable operation. It then attempts to normalize the type (which doesn't seem to cache too) inside vtable_impl and recurs to next level.

cc @tanriol @nikomatsakis Please check if this clears up some obstacles. Primarily I'm wondering why normalization is continuously performed (maybe due to snapshot?).

ishitatsuyuki commented 6 years ago

Okay, I think I've successfully understood the overall picture of this blowup.

First, it's true that we're calling SelectionContext::select twice per iteration. This is what we probably need to reduce to fix the issue.

Second, the blowup occurs in operations that are not currently cached. In confirm_impl_candidate, we create an snapshot and then calls vtable_impl, then impl_or_trait_obligations. It's the place we recur onto next level, and because we are in a snapshot, the unification results are completely discarded after the operation. This is the second cause making the operation non-cachable: unification relies on side effects so it doesn't execute twice, but we revert the changes so it's computing every time.

Adding to that, impl_or_trait_obligations tries to normalize the inner type (?) for each predicate, which sounds like a multiplicative behavior here. However, this is ran without clearing the snapshot, which may make the situation better (it still sounds like an issue to me, though). Also, the reproduction example only used Iterator obligation which is the only one, thus the multiplicative behavior here hasn't been tested yet.

To sum up, we didn't get any short circuits here due to the two factors preventing caching from happening. Next, I'm proposing some solutions below, focusing on fixing the "calling twice" problem:

  1. Try to preserve the snapshot for the impl candidate we resolve. We can reuse it later which eliminates the need of calling confirm_candidate the second time. However, as far I understood I'm not sure if snapshot is designed to be saved beyond scope (it always take a closure for pub APIs).

  2. Do not call confirm_candidates the first time, instead call it only when we're really confirming it. My concern is that this may breaks code that relies on side effects (unification) which I don't understand yet.

  3. This is rather focused on solving the normalization cache, and the idea is quite abstract. We can try to cache normalizations, but I have no idea what we should take as a key.

I need your expertise on this, so we can reach a solution that resolves the issue. Please give me your opinion!

ishitatsuyuki commented 6 years ago

Another update: the projection cache actually exists, but we're discarding them every time by wiping the InferCtxt (with snapshots). I have created a diagram to show how this recurs: Diagram

So basically we need to implement this FIXME: https://github.com/rust-lang/rust/blob/554fe71ac6172ce9c0baeabbfd692280dcc893fe/src/librustc/traits/project.rs#L1548-L1550

ishitatsuyuki commented 6 years ago

@arielb1 You seem to have created this FIXME. Can you give me some hints on implementing cross-infcx cache?

ishitatsuyuki commented 6 years ago

I have launched a PoC in #48296 and I confirm that the exponential part should be resolved. I still observe some degree of polynomial time algorithms, which makes rustc chokes at about 50 chain().

ishitatsuyuki commented 6 years ago

@hcpl I suspect you copied the wrong code for the specialized version above, as I observe the same time-passes behavior and the code lacks main() despite you specify --crate-type bin for the specialized version.

Can you check again, and if possible, provide the correct specialized version code?

ishitatsuyuki commented 6 years ago

Update: although I have fixed the normalization part, the typeck still seems to be exponential on time, and very memory consuming. I'm investigating the cause of this second issue.

ishitatsuyuki commented 6 years ago

I have implemented a fix for the typeck part too, and I believe there's no more exponential algorithms in rustc anymore.

hcpl commented 6 years ago

@ishitatsuyuki the code is correct, but both rustc invocations should have been with --crate-type=lib src/lib.rs (which were assumed to be in different crates). Sorry about that.

I've changed filenames to foo_generic.rs and foo_specialized.rs respectively to better reflect their purpose.