rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.61k stars 12.62k forks source link

Exponential compile times for nested struct with associated type and generics #105900

Open alansartorio opened 1 year ago

alansartorio commented 1 year ago

I tried this code:

use std::marker::PhantomData;

pub trait Wrappable<I>: Sized {
    type Type;
    fn wrap(self) -> Wrapper<Self, Self::Type> {
        Wrapper(self, PhantomData)
    }
}

pub struct Wrapper<A, E>(A, PhantomData<E>);
impl<I, E, A: Wrappable<I, Type = E>> Wrappable<I> for Wrapper<A, E> {
    type Type = E;
}

impl Wrappable<u8> for u8 {
    type Type = u8;
}

pub fn function() {
    0u8
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap()
        .wrap();
}

And for each wrap I add, compilation times and memory usage double unexpectedly.

When I run RUSTFLAGS="-Z time-passes" cargo check I get this:

    Blocking waiting for file lock on build directory
    Checking slow_compile v0.1.0 (/####/slow_compile)
time:   0.000; rss:   36MB ->   39MB (   +3MB)        parse_crate
time:   0.001; rss:   41MB ->   58MB (  +16MB)        expand_crate
time:   0.001; rss:   41MB ->   58MB (  +16MB)        macro_expand_crate
time:   0.000; rss:   58MB ->   61MB (   +3MB)        late_resolve_crate
time:   0.000; rss:   58MB ->   61MB (   +3MB)        resolve_crate
time:   0.002; rss:   41MB ->   61MB (  +19MB)        configure_and_expand
time:   0.000; rss:   62MB ->   65MB (   +3MB)        misc_checking_1
time:   0.000; rss:   65MB ->   69MB (   +4MB)        type_collecting
time:  13.023; rss:   69MB -> 2788MB (+2719MB)        item_bodies_checking
time:  13.023; rss:   65MB -> 2788MB (+2722MB)        type_check_crate
time:   0.002; rss: 2788MB -> 2790MB (   +3MB)        MIR_borrow_checking
time:   0.004; rss: 2790MB -> 2811MB (  +21MB)        crate_lints
time:   0.004; rss: 2790MB -> 2811MB (  +21MB)        lint_checking
time:   0.004; rss: 2790MB -> 2811MB (  +21MB)        misc_checking_3
time:   0.046; rss: 2811MB -> 2811MB (   +0MB)        codegen_crate
time:   0.001; rss: 2811MB -> 2815MB (   +4MB)        incr_comp_persist_result_cache
time:   0.001; rss: 2811MB -> 2815MB (   +4MB)        serialize_dep_graph
time:   0.042; rss: 2815MB -> 1809MB (-1006MB)        free_global_ctxt
time:  13.200; rss:   28MB ->   95MB (  +67MB)        total
    Finished dev [unoptimized + debuginfo] target(s) in 13.86s

Here's a playground where you can reproduce it: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=db81287cfac9166b22476eb739703812

I got this code by reducing a parser I was writing using the chumsky crate:

use chumsky::prelude::*;

pub fn parser() -> impl Parser<char, (), Error = Simple<char>> {
    // Time to compile gets quicker when I replace line 7 with:
    //just::<char, _, _>("a")

    just("a")
        .then(just("a"))
        .then(just("a"))
        .then(just("a"))
        .then(just("a"))
        .then(just("a"))
        .then(just("a"))
        .then(just("a"))
        .then(just("a"))
        .then(just("a"))
        .then(just("a"))
        .then(just("a"))
        .then(just("a"))
        // Each call I add increases cargo check time by a factor of x10
        //  and memory usage by x2
        //.then(just("a"))
        .ignored()
}

I checked some similar issues but I couldn't find the same problem I have:

Meta

rustc --version --verbose:

rustc 1.68.0-nightly (d0dc9efff 2022-12-18)
binary: rustc
commit-hash: d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0
commit-date: 2022-12-18
host: x86_64-unknown-linux-gnu
release: 1.68.0-nightly
LLVM version: 15.0.6

jruderman commented 1 year ago

@rustbot label +I-compiletime

Shfty commented 1 year ago

I'm seeing similar when using a lazy state monad implemented at the type level, where each step in the binding chain introduces multiple layers of nesting to the final thunk struct.

This quickly adds up: The pure functional equivalent of binding 3 ZST-tagged variables, feeding them to a function, and storing back its output compiles in ~0.5sec, but balloons up to ~150sec with a chain of twice that length.