rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.38k stars 12.72k forks source link

Exponential compile time for linear amount of "simple" trait bounds #106498

Open simon-auch opened 1 year ago

simon-auch commented 1 year ago

I tried this code:

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=118ed704fee0b6c3337d9ba7ee38defb

I expected to see this happen: The code should compile successfully, ideally with all the trait bounds or at least more than the current maximum for the playground (from my testing it was killed when going from 8 to 9). I would especially expect a linear runtime in this particular case since all the trait bounds can "trivially" be checked on their own.

Disclaimer: I could not reproduce this without the feature generic_const_exprs enabled.

Instead, this happened: With 9 bounds on the trait the playground kills the compiler, with all of the trait bounds the compiler reaches a maximum recursion depth.

Meta

Nightly version of the playground:

Build using the Nightly version: 1.68.0-nightly

(2023-01-03 c7572670a1302f5c7e24)

compiler-errors commented 1 year ago

The linked code seems wrong.

clubby789 commented 1 year ago

To share Playground code, click 'share' then copy the link under 'permalink to the playground', rather than copying from the address bar.

simon-auch commented 1 year ago

Link fixed in the description. Sorry for the inconvenience.

simon-auch commented 1 year ago

I just tested the compile time on my laptop for an increasing set of bounds: image The vertical axis is logarithmic For eleven bounds it took 1024s!

I will update the title of the post to indicate the exponential behavior observed.

simon-auch commented 1 year ago

Timing output with eleven bounds:

time:   0.000; rss:   47MB ->   52MB (   +4MB)  incr_comp_prepare_session_directory
time:   0.002; rss:   54MB ->   68MB (  +15MB)  expand_crate
time:   0.002; rss:   54MB ->   68MB (  +15MB)  macro_expand_crate
time:   0.000; rss:   68MB ->   73MB (   +4MB)  finalize_macro_resolutions
time:   0.001; rss:   68MB ->   73MB (   +4MB)  resolve_crate
time:   0.003; rss:   54MB ->   73MB (  +19MB)  configure_and_expand
time:   0.000; rss:   73MB ->   77MB (   +4MB)  looking_for_derive_registrar
time:   0.001; rss:   73MB ->   77MB (   +4MB)  misc_checking_1
time:   0.001; rss:   77MB ->   81MB (   +4MB)  type_collecting
time:   0.003; rss:   81MB ->   91MB (  +10MB)  wf_checking
time: 1024.595; rss:   91MB ->   99MB (   +7MB) item_types_checking
time: 1024.600; rss:   77MB ->   99MB (  +22MB) type_check_crate
time:   0.003; rss:   99MB ->  116MB (  +17MB)  crate_lints
time:   0.004; rss:   99MB ->  116MB (  +17MB)  lint_checking
time:   0.004; rss:   99MB ->  116MB (  +17MB)  misc_checking_3
time:   0.000; rss:  116MB ->  135MB (  +19MB)  codegen_to_LLVM_IR
time:   0.001; rss:  116MB ->  135MB (  +19MB)  codegen_crate
time:   0.000; rss:  135MB ->  137MB (   +2MB)  encode_query_results_for(rustc_query_impl::queries::mir_for_ctfe)
time:   0.000; rss:  137MB ->  139MB (   +2MB)  encode_query_results_for(rustc_query_impl::queries::symbol_name)
time:   0.001; rss:  135MB ->  139MB (   +4MB)  encode_query_results
time:   0.001; rss:  135MB ->  141MB (   +6MB)  incr_comp_serialize_result_cache
time:   0.001; rss:  135MB ->  141MB (   +6MB)  incr_comp_persist_result_cache
time:   0.001; rss:  135MB ->  141MB (   +6MB)  serialize_dep_graph
time:   0.002; rss:  133MB ->  118MB (  -15MB)  LLVM_passes
time:   0.001; rss:  141MB ->  101MB (  -40MB)  free_global_ctxt
time:   0.000; rss:  101MB ->  101MB (   +0MB)  link_rlib
time:   0.000; rss:  101MB ->  101MB (   +0MB)  link_binary
time:   0.000; rss:  101MB ->  101MB (   +0MB)  link_crate
time:   0.001; rss:  101MB ->  101MB (   +0MB)  link
time: 1024.619; rss:   34MB ->   96MB (  +62MB) total
simon-auch commented 1 year ago

Boiled the code down to the following:

#![feature(generic_const_exprs)]
trait Foo {}

impl Foo for () where Bar<0>: {}

struct Bar<const T: usize>
where
    [(); T + 0 ]:,
    [(); T + 0 ]:,
    [(); T + 0 ]:,
    [(); T + 0 ]:,
    [(); T + 0 ]:,
    [(); T + 0 ]:,
    [(); T + 0 ]:,
    [(); T + 0 ]:,
    [(); T + 0 ]:, {}

Still shows the same exponential behavior, interestingly the time spent before was in item_types_checking but with the new version is in wf_checking. This happened after removing some of the trait/struct indirection present before.

Timings of this version for 9 bounds:

time:   0.000; rss:   54MB ->   57MB (   +3MB)  parse_crate
time:   0.002; rss:   63MB ->   77MB (  +13MB)  expand_crate
time:   0.002; rss:   63MB ->   77MB (  +13MB)  macro_expand_crate
time:   0.000; rss:   77MB ->   81MB (   +4MB)  finalize_macro_resolutions
time:   0.000; rss:   77MB ->   81MB (   +4MB)  resolve_crate
time:   0.003; rss:   59MB ->   81MB (  +22MB)  configure_and_expand
time:   0.001; rss:   81MB ->   85MB (   +4MB)  misc_checking_1
time:   0.001; rss:   85MB ->   90MB (   +4MB)  type_collecting
time:   7.556; rss:   90MB ->  104MB (  +14MB)  wf_checking
time:   0.001; rss:  104MB ->  108MB (   +4MB)  item_types_checking
time:   7.557; rss:   85MB ->  108MB (  +23MB)  type_check_crate
time:   0.004; rss:  110MB ->  126MB (  +16MB)  crate_lints
time:   0.004; rss:  110MB ->  126MB (  +16MB)  lint_checking
time:   0.005; rss:  108MB ->  126MB (  +18MB)  misc_checking_3
time:   0.001; rss:  126MB ->  131MB (   +4MB)  generate_crate_metadata
time:   0.000; rss:  133MB ->  137MB (   +4MB)  codegen_to_LLVM_IR
time:   0.001; rss:  131MB ->  137MB (   +6MB)  codegen_crate
time:   0.000; rss:  139MB ->  141MB (   +2MB)  encode_query_results_for(rustc_query_impl::queries::def_span)
time:   0.001; rss:  137MB ->  141MB (   +4MB)  encode_query_results
time:   0.001; rss:  137MB ->  141MB (   +4MB)  incr_comp_serialize_result_cache
time:   0.001; rss:  137MB ->  141MB (   +4MB)  incr_comp_persist_result_cache
time:   0.001; rss:  137MB ->  141MB (   +4MB)  serialize_dep_graph
time:   0.002; rss:  133MB ->  120MB (  -13MB)  LLVM_passes
time:   0.001; rss:  141MB ->   99MB (  -43MB)  free_global_ctxt
time:   0.000; rss:   99MB ->   99MB (   +0MB)  link_rlib
time:   0.000; rss:   99MB ->   99MB (   +0MB)  link_binary
time:   0.000; rss:   99MB ->   99MB (   +0MB)  link_crate
time:   0.001; rss:   99MB ->   99MB (   +0MB)  link
time:   7.576; rss:   43MB ->   94MB (  +50MB)  total
simon-auch commented 1 year ago

I had a look into the compiler to try and understand what is going on in this case. I am missing general knowledge about how the wf checks in the compiler work to really say where specifically the issue lies.

So far my understanding what happens:

Since the obligations for any [(); T + 0 ] are all the obligations for the trait itself this results on its own in a linear recursion depth (this is fine I guess). To estimate the asymptotic runtime it is important to the following. Given the first call to evaluate_predicates_recursively with n obligations of form [(); T + 0 ] of which none is on the stack each of the n obligations is checked separately (using their own stack) -> no results from checking the previous obligations are reused -> this is basically a factor of O(n) This results I think in a recursive runtime for which the following function would be a lower bound (i represents the number of unchecked obligations, n is the total number of obligations:

T(n) = H(n,n)
H(n,0) = 1
H(n,i) = n * ((n-i)/2 + H(n, i - 1))

I already merged all calls that check all obligations passed to each evaluate_predicates_recursively call (the factor n) and (n-i)/2 represents the average stack size that is searched for already checked obligations. We can simplify this by just ignoring the search on the stack and it boils down to n!.

Please correct me if I made a mistake here.

Two things that are still unclear to me:

simon-auch commented 1 year ago

A workaround is to refactor the code so that the const is computed as an associated const of a trait: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=8ac4727f7e41741e206e4a12f2bd6c9d

This pattern does not have exponential compile time but also does not "bubble up" the requirements on the const generics so that the associated const is valid. Another disadvantage is that AnyP in this example cannot be used in a generic context without additional where bounds that assert the associated constants are wf.

simon-auch commented 1 year ago

Continuing with the slow example.

Debugging around and looking through lots of logs I found the following:

wf::obligations(Bar<0>, body_id=HirId(DefId(0:0 ~ lib[b3d6]).0)) =
[
    Obligation(predicate=Binder(WellFormed([(); _]), []), depth=2),
    Obligation(predicate=Binder(WellFormed([(); _]), []), depth=2),
    Obligation(predicate=Binder(WellFormed([(); _]), []), depth=2),
    Obligation(predicate=Binder(WellFormed([(); _]), []), depth=2),
    Obligation(predicate=Binder(WellFormed([(); _]), []), depth=2),
    Obligation(predicate=Binder(WellFormed([(); _]), []), depth=2),
    Obligation(predicate=Binder(WellFormed([(); _]), []), depth=2),
    Obligation(predicate=Binder(WellFormed([(); _]), []), depth=2),
    Obligation(predicate=Binder(WellFormed([(); _]), []), depth=2),
    Obligation(
        predicate=Binder(
            ConstEvaluatable(
                Const {
                    ty: usize,
                    kind: Unevaluated(
                        UnevaluatedConst {
                            def: WithOptConstParam {
                                did: DefId(0:8 ~ lib[b3d6]::Bar::{constant#0}),
                                const_param_did: None
                            },
                            substs: [Const { ty: usize, kind: Value(Leaf(0x0000000000000000)) }]
                        }
                    )
                }
            ),
            []
        ),
        depth=2
    ),
    Obligation(predicate=Binder(ConstEvaluatable(Const { ty: usize, kind: Unevaluated(UnevaluatedConst { def: WithOptConstParam { did: DefId(0:9  ~ lib[b3d6]::Bar::{constant#1}), const_param_did: None }, substs: [Const { ty: usize, kind: Value(Leaf(0x0000000000000000)) }] }) }), []), depth=2),
    Obligation(predicate=Binder(ConstEvaluatable(Const { ty: usize, kind: Unevaluated(UnevaluatedConst { def: WithOptConstParam { did: DefId(0:10 ~ lib[b3d6]::Bar::{constant#2}), const_param_did: None }, substs: [Const { ty: usize, kind: Value(Leaf(0x0000000000000000)) }] }) }), []), depth=2),
    Obligation(predicate=Binder(ConstEvaluatable(Const { ty: usize, kind: Unevaluated(UnevaluatedConst { def: WithOptConstParam { did: DefId(0:11 ~ lib[b3d6]::Bar::{constant#3}), const_param_did: None }, substs: [Const { ty: usize, kind: Value(Leaf(0x0000000000000000)) }] }) }), []), depth=2),
    Obligation(predicate=Binder(ConstEvaluatable(Const { ty: usize, kind: Unevaluated(UnevaluatedConst { def: WithOptConstParam { did: DefId(0:12 ~ lib[b3d6]::Bar::{constant#4}), const_param_did: None }, substs: [Const { ty: usize, kind: Value(Leaf(0x0000000000000000)) }] }) }), []), depth=2),
    Obligation(predicate=Binder(ConstEvaluatable(Const { ty: usize, kind: Unevaluated(UnevaluatedConst { def: WithOptConstParam { did: DefId(0:13 ~ lib[b3d6]::Bar::{constant#5}), const_param_did: None }, substs: [Const { ty: usize, kind: Value(Leaf(0x0000000000000000)) }] }) }), []), depth=2),
    Obligation(predicate=Binder(ConstEvaluatable(Const { ty: usize, kind: Unevaluated(UnevaluatedConst { def: WithOptConstParam { did: DefId(0:14 ~ lib[b3d6]::Bar::{constant#6}), const_param_did: None }, substs: [Const { ty: usize, kind: Value(Leaf(0x0000000000000000)) }] }) }), []), depth=2),
    Obligation(predicate=Binder(ConstEvaluatable(Const { ty: usize, kind: Unevaluated(UnevaluatedConst { def: WithOptConstParam { did: DefId(0:15 ~ lib[b3d6]::Bar::{constant#7}), const_param_did: None }, substs: [Const { ty: usize, kind: Value(Leaf(0x0000000000000000)) }] }) }), []), depth=2),
    Obligation(predicate=Binder(ConstEvaluatable(Const { ty: usize, kind: Unevaluated(UnevaluatedConst { def: WithOptConstParam { did: DefId(0:16 ~ lib[b3d6]::Bar::{constant#8}), const_param_did: None }, substs: [Const { ty: usize, kind: Value(Leaf(0x0000000000000000)) }] }) }), []), depth=2)]

(expanded one line from the second block for readability)

The first block of lines represents the bounds [(); T + 0] while the second block represents the bounds T + 0. I noticed from these lines that (as far as I understand) the second block contains the substitution T -> 0 while the first block is missing the substitution T -> 0.

Together with the behavior observed before, this leads me to believe the following is happening:

Note 1

I believe it is Bar<0> and not Bar<T> as logs with a deeper recursion level still contain the substitutions of T -> 0 for the second block of obligations. I'm not sure however where the information T -> 0 comes from again, but I think it actually request all obligations of the parent of T which happens to be Bar<0>.

Note 2

One might assume the same issue would already appear for bounds with with form [(); T], but according to the logs these do actually get the substitution:

Binder(WellFormed([(); 0]), []),

vs

Binder(WellFormed([(); _]), []),

Current assumption

My current assumption is that the obligation generated for each [(); T + 0] bound is missing the substitution T -> 0. If this substitution is added this issue might be resolved. I will try to continue understanding the workings of the compiler and whip up a fix.

Edit

On second though: the substitution might actually work, since for [(); T] the result is [(); 0] which also does not contain substitutions but instead has the substitutions already applied. This could also be the case for [(); T + 0] which could already result in [(); 0 + 0]. The question that arises then is, why does this result in all obligations of Bar<0> to be evaluated again.

simon-auch commented 1 year ago

@lcnr, @eddyb maybe you could help me to understand something:

In this commit you implemented wf checking for constants which includes nominal_obligations when computing all obligations required for a ty::ConstKind::Unevaluated.

In this particular case the nominal_obligations look very redundant and I assume this to be the core of the problem. Can you provide an example why this is necessary? Should it not be sufficient to check for ConstEvaluatable(ct) at this point? I would have assumed that any obligations returned by nominal_obligations should already have been handled when wf checking the parent of ct?

In your PR you also had an open question about whether or not to already evaluate the constant instead of returning an obligation for it (https://github.com/rust-lang/rust/pull/70107/files#r430472199).

If you would provide the necessary guidance on this issue I would be happy to prepare a PR for it :)

Edit

Just checked if, and if yes, which tests break when removing the nominal_obligations:

For the first issue I would assume this change to be okay, since we still have the same error three times. For the second issue I'm not entirely sure, but since the same test also checks error: genericSelftypes are currently not permitted in anonymous constants this might be fine?

simon-auch commented 1 year ago

I could imagine that the nominal_obligations become relevant when the constant refers to some arbitrary other type, something like the following:

struct Foo<const A: usize> where 
    [(); Bar<A>::T]:,
 {}

But I would assume the necessary obligation to be generated by the walker in the wf::compute function while walking the ConstKind::Unevaluated?

workingjubilee commented 1 year ago

Unfortunately it is to-be-expected that the implementation of F-generic_const_exprs, an unfinished feature, is really bad with respect to performance. The first priority is correctness, and that priority is absolute, here.