rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.98k stars 12.68k forks source link

Slow compilation with extremely borrowed type #103195

Closed jruderman closed 2 years ago

jruderman commented 2 years ago

This fairly small Rust program takes 5–10 seconds to compile (playground):

fn f(
    _: &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&(),
) -> u32 {
    3
}

fn main() {
    println!("{}", f(&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&()));
}

Found with fuzz-rustc.

Timings

The passes taking the most time are:

Output with `-Z time-passes`

``` time: 0.012; rss: 21MB -> 22MB ( +1MB) parse_crate time: 0.000; rss: 22MB -> 22MB ( +0MB) attributes_injection time: 0.000; rss: 22MB -> 22MB ( +0MB) plugin_loading time: 0.003; rss: 23MB -> 23MB ( +0MB) crate_injection time: 0.097; rss: 23MB -> 27MB ( +4MB) expand_crate time: 0.097; rss: 23MB -> 27MB ( +4MB) macro_expand_crate time: 0.000; rss: 27MB -> 27MB ( +0MB) maybe_building_test_harness time: 0.002; rss: 27MB -> 27MB ( +0MB) AST_validation time: 0.000; rss: 27MB -> 27MB ( +0MB) maybe_create_a_macro_crate time: 0.000; rss: 27MB -> 27MB ( +0MB) finalize_imports time: 0.001; rss: 27MB -> 27MB ( +0MB) resolve_access_levels time: 0.024; rss: 27MB -> 27MB ( +1MB) finalize_macro_resolutions time: 0.038; rss: 27MB -> 29MB ( +2MB) late_resolve_crate time: 0.000; rss: 29MB -> 29MB ( +0MB) resolve_check_unused time: 0.000; rss: 29MB -> 29MB ( +0MB) resolve_report_errors time: 0.000; rss: 29MB -> 29MB ( +0MB) resolve_postprocess time: 0.064; rss: 27MB -> 29MB ( +2MB) resolve_crate time: 0.000; rss: 29MB -> 29MB ( +0MB) complete_gated_feature_checking time: 0.002; rss: 29MB -> 29MB ( +0MB) early_lint_checks time: 0.178; rss: 22MB -> 29MB ( +7MB) configure_and_expand time: 0.000; rss: 29MB -> 29MB ( +0MB) prepare_outputs time: 0.003; rss: 29MB -> 30MB ( +0MB) setup_global_ctxt time: 0.000; rss: 30MB -> 30MB ( +0MB) drop_ast time: 0.018; rss: 30MB -> 31MB ( +1MB) looking_for_entry_point time: 0.001; rss: 31MB -> 31MB ( +0MB) looking_for_derive_registrar time: 0.000; rss: 31MB -> 31MB ( +0MB) unused_lib_feature_checking time: 0.039; rss: 30MB -> 31MB ( +2MB) misc_checking_1 time: 0.018; rss: 31MB -> 32MB ( +1MB) type_collecting time: 0.001; rss: 32MB -> 32MB ( +0MB) impl_wf_inference time: 0.001; rss: 32MB -> 32MB ( +0MB) coherence_checking time: 2.541; rss: 32MB -> 43MB ( +11MB) wf_checking time: 0.000; rss: 43MB -> 43MB ( +0MB) item_types_checking time: 0.052; rss: 43MB -> 45MB ( +2MB) item_bodies_checking time: 2.615; rss: 31MB -> 45MB ( +14MB) type_check_crate time: 0.004; rss: 45MB -> 45MB ( +0MB) match_checking time: 0.002; rss: 45MB -> 45MB ( +0MB) liveness_checking time: 0.006; rss: 45MB -> 45MB ( +0MB) misc_checking_2 time: 7.194; rss: 45MB -> 68MB ( +23MB) MIR_borrow_checking time: 0.000; rss: 68MB -> 68MB ( +0MB) MIR_effect_checking time: 0.002; rss: 69MB -> 69MB ( +0MB) crate_lints time: 0.007; rss: 69MB -> 69MB ( +0MB) module_lints time: 0.009; rss: 69MB -> 69MB ( +0MB) lint_checking time: 0.030; rss: 69MB -> 69MB ( +0MB) privacy_checking_modules time: 0.002; rss: 69MB -> 69MB ( +0MB) check_lint_expectations time: 0.047; rss: 68MB -> 69MB ( +0MB) misc_checking_3 time: 0.011; rss: 69MB -> 69MB ( +0MB) monomorphization_collector_root_collections time: 0.130; rss: 69MB -> 69MB ( +0MB) monomorphization_collector_graph_walk time: 0.008; rss: 69MB -> 70MB ( +0MB) partition_and_assert_distinct_symbols time: 0.009; rss: 70MB -> 71MB ( +0MB) write_allocator_module time: 0.000; rss: 71MB -> 71MB ( +0MB) find_cgu_reuse time: 0.050; rss: 71MB -> 77MB ( +6MB) codegen_to_LLVM_IR time: 0.239; rss: 69MB -> 77MB ( +9MB) codegen_crate time: 0.037; rss: 73MB -> 77MB ( +5MB) LLVM_passes time: 0.000; rss: 77MB -> 77MB ( +0MB) serialize_dep_graph time: 0.005; rss: 77MB -> 62MB ( -14MB) free_global_ctxt time: 0.004; rss: 62MB -> 62MB ( +0MB) finish_ongoing_codegen time: 0.001; rss: 62MB -> 62MB ( +0MB) serialize_work_products time: 0.613; rss: 63MB -> 63MB ( +0MB) run_linker time: 0.623; rss: 62MB -> 63MB ( +1MB) link_binary time: 0.624; rss: 62MB -> 63MB ( +1MB) link_crate time: 0.631; rss: 62MB -> 63MB ( +1MB) link time: 11.069; rss: 13MB -> 44MB ( +31MB) total ```

Meta

rustc --version --verbose:

rustc 1.66.0-nightly (b8b5caee0 2022-10-16)
binary: rustc
commit-hash: b8b5caee04116c7383eb1c6470fcf15c437a60d4
commit-date: 2022-10-16
host: x86_64-apple-darwin
release: 1.66.0-nightly
LLVM version: 15.0.2
Noratrieb commented 2 years ago

perf indicates that most of the time is spent during borrowck in TransitiveRelationBuilder::add from rustc_data_structures. https://github.com/rust-lang/rust/blob/a24a020e6d926dffe6b472fc647978f92269504e/compiler/rustc_data_structures/src/transitive_relation.rs#L98-L105

Specifically, the contains on the Vec<T> seems to take up a significant amount of time:

 44.00 │40:   test %rcx,%rcx                                                                                                                               
       │    ↓ je   5d                                                                                                                                      
 16.00 │      mov  %rax,%rdx                                                                                                                               
 12.00 │      add  $0x10,%rax                                                                                                                              
  8.00 │      add  $0xfffffffffffffff0,%rcx                                                                                                                
 20.00 │      cmp  %rbx,(%rdx)
       │    ↑ jne  40

And indeed, the Vec<T> gets quite big, up to 49455 here, as a few debug prints show.

Noratrieb commented 2 years ago

Doubling the amount of nested references makes it go up to 196251, so this is indeed quadratic.

Noratrieb commented 2 years ago

@SparrowLii since you've recently touched that code

Noratrieb commented 2 years ago

This is now fixed, meaning the slowness is gone and it's reasonably fast now. I'm not sure whether we want to keep the issue though, as there seems to be some quadraticness in borrowck that caused this - using a better datastructure helped mitigate it, but the quadraticness is still there - whether it matters is another question.

compiler-errors commented 2 years ago

@Nilstrieb if this is fixed, then I'll probably close this. If we're tracking other quadraticness in the borrow checker, then we should both have examples that trigger such quadraticness and also open separate issues, since their fixes will likely be similar but distinct.