rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.98k stars 12.68k forks source link

Incremental causes significant regression in clean compile time for winapi #48247

Closed retep998 closed 9 months ago

retep998 commented 6 years ago

Doing cargo rustc --features everything -- -Ztime-passes with winapi 0.3.

tl;dr clean incremental is dramatically slower than clean normal, for minimal gains in rebuild times.

Times taken using rustc 1.28.0-nightly (b68432d56 2018-06-12)

Clean incremental: 34.68s Rebuild incremental: 19.59s Clean normal: 23.48s Rebuild normal: 22.73s

A clean incremental build

   Compiling winapi v0.3.5 (file:///C:/Users/Peter/Code/winapi-rs)
  time: 0.894; rss: 113MB       parsing
  time: 0.000; rss: 113MB       garbage collect incremental cache directory
  time: 0.000; rss: 113MB       recursion limit
  time: 0.000; rss: 113MB       crate injection
  time: 0.000; rss: 113MB       background load prev dep-graph
  time: 0.000; rss: 113MB       plugin loading
  time: 0.000; rss: 113MB       plugin registration
    time: 2.900; rss: 436MB     expand crate
    time: 0.000; rss: 436MB     check unused macros
  time: 2.901; rss: 436MB       expansion
  time: 0.000; rss: 436MB       maybe building test harness
  time: 0.026; rss: 436MB       maybe creating a macro crate
  time: 0.077; rss: 436MB       creating allocators
  time: 0.088; rss: 436MB       AST validation
  time: 0.367; rss: 475MB       name resolution
  time: 0.111; rss: 475MB       complete gated feature checking
  time: 0.000; rss: 475MB       blocked while dep-graph loading finishes
  time: 0.474; rss: 607MB       lowering ast -> hir
  time: 0.135; rss: 607MB       early lint checks
  time: 0.682; rss: 634MB       indexing hir
  time: 0.000; rss: 383MB       load query result cache
  time: 0.000; rss: 383MB       looking for entry point
  time: 0.003; rss: 383MB       looking for plugin registrar
  time: 0.044; rss: 383MB       loop checking
  time: 0.051; rss: 395MB       attribute checking
  time: 0.143; rss: 402MB       stability checking
  time: 1.316; rss: 580MB       type collecting
  time: 0.017; rss: 580MB       outlives testing
  time: 0.050; rss: 584MB       impl wf inference
  time: 0.780; rss: 626MB       coherence checking
  time: 0.016; rss: 626MB       variance testing
  time: 2.214; rss: 842MB       wf checking
  time: 1.902; rss: 958MB       item-types checking
  time: 2.327; rss: 1045MB      item-bodies checking
  time: 0.848; rss: 1105MB      rvalue promotion
  time: 1.125; rss: 1120MB      privacy checking
  time: 0.081; rss: 1120MB      intrinsic checking
  time: 0.269; rss: 1129MB      match checking
  time: 0.076; rss: 1129MB      liveness checking
  time: 4.956; rss: 1414MB      borrow checking
  time: 0.239; rss: 1434MB      MIR borrow checking
  time: 0.000; rss: 1434MB      dumping chalk-like clauses
  time: 0.037; rss: 1434MB      MIR effect checking
  time: 0.284; rss: 1436MB      death checking
  time: 0.000; rss: 1436MB      unused lib feature checking
  time: 3.860; rss: 1597MB      lint checking
  time: 0.000; rss: 1597MB      resolving dependency formats
      time: 0.030; rss: 1615MB  monomorphization collection
      time: 0.000; rss: 1615MB  codegen unit partitioning
    time: 3.419; rss: 1687MB    write metadata
    time: 0.000; rss: 1688MB    codegen to LLVM IR
    time: 0.000; rss: 1688MB    assert dep graph
    time: 0.000; rss: 1688MB    llvm function passes [1y16o1qfye96o7m0]
    time: 0.000; rss: 1688MB    llvm module passes [1y16o1qfye96o7m0]
    time: 0.001; rss: 1690MB    codegen passes [1y16o1qfye96o7m0]
  time: 0.006; rss: 1690MB      LLVM passes
            time: 0.017; rss: 1696MB    encode_query_results for rustc::ty::queries::type_of
            time: 0.030; rss: 1699MB    encode_query_results for rustc::ty::queries::generics_of
            time: 0.008; rss: 1701MB    encode_query_results for rustc::ty::queries::predicates_of
            time: 0.006; rss: 1702MB    encode_query_results for rustc::ty::queries::used_trait_imports
            time: 0.101; rss: 1707MB    encode_query_results for rustc::ty::queries::typeck_tables_of
            time: 0.000; rss: 1707MB    encode_query_results for rustc::ty::queries::codegen_fulfill_obligation
            time: 0.290; rss: 1722MB    encode_query_results for rustc::ty::queries::optimized_mir
            time: 0.003; rss: 1723MB    encode_query_results for rustc::ty::queries::unsafety_check_result
            time: 0.008; rss: 1724MB    encode_query_results for rustc::ty::queries::borrowck
            time: 0.003; rss: 1725MB    encode_query_results for rustc::ty::queries::mir_borrowck
            time: 0.009; rss: 1726MB    encode_query_results for rustc::ty::queries::mir_const_qualif
            time: 0.000; rss: 1726MB    encode_query_results for rustc::ty::queries::def_symbol_name
            time: 0.002; rss: 1726MB    encode_query_results for rustc::ty::queries::const_is_rvalue_promotable_to_static
            time: 0.000; rss: 1726MB    encode_query_results for rustc::ty::queries::symbol_name
            time: 0.002; rss: 1727MB    encode_query_results for rustc::ty::queries::check_match
            time: 0.001; rss: 1727MB    encode_query_results for rustc::ty::queries::codegen_fn_attrs
            time: 0.003; rss: 1728MB    encode_query_results for rustc::ty::queries::specialization_graph_of
          time: 0.494; rss: 1729MB      encode query results
        time: 0.690; rss: 1758MB        serialize query result cache
      time: 0.843; rss: 1692MB  persist query result cache
          time: 0.117; rss: 1866MB      getting serialized graph
          time: 0.448; rss: 2007MB      encoding serialized graph
        time: 0.587; rss: 1834MB        encode dep-graph
      time: 0.949; rss: 1692MB  persist dep-graph
    time: 1.792; rss: 1692MB    serialize dep graph
  time: 5.218; rss: 1692MB      codegen
  time: 0.001; rss: 659MB       serialize work products
  time: 0.064; rss: 659MB       linking
    Finished dev [unoptimized + debuginfo] target(s) in 34.68s

A clean non-incremental build (CARGO_INCREMENTAL set to 0)

   Compiling winapi v0.3.5 (file:///C:/Users/Peter/Code/winapi-rs)
  time: 0.548; rss: 113MB       parsing
  time: 0.000; rss: 113MB       recursion limit
  time: 0.000; rss: 113MB       crate injection
  time: 0.000; rss: 113MB       plugin loading
  time: 0.000; rss: 113MB       plugin registration
    time: 2.845; rss: 436MB     expand crate
    time: 0.000; rss: 436MB     check unused macros
  time: 2.846; rss: 436MB       expansion
  time: 0.000; rss: 436MB       maybe building test harness
  time: 0.028; rss: 436MB       maybe creating a macro crate
  time: 0.078; rss: 436MB       creating allocators
  time: 0.088; rss: 436MB       AST validation
  time: 0.369; rss: 474MB       name resolution
  time: 0.111; rss: 474MB       complete gated feature checking
  time: 0.463; rss: 607MB       lowering ast -> hir
  time: 0.135; rss: 607MB       early lint checks
  time: 0.485; rss: 609MB       indexing hir
  time: 0.000; rss: 357MB       load query result cache
  time: 0.000; rss: 357MB       looking for entry point
  time: 0.003; rss: 357MB       looking for plugin registrar
  time: 0.041; rss: 358MB       loop checking
  time: 0.050; rss: 360MB       attribute checking
  time: 0.124; rss: 363MB       stability checking
  time: 0.580; rss: 427MB       type collecting
  time: 0.016; rss: 427MB       outlives testing
  time: 0.026; rss: 429MB       impl wf inference
  time: 0.538; rss: 463MB       coherence checking
  time: 0.015; rss: 463MB       variance testing
  time: 1.321; rss: 603MB       wf checking
  time: 1.459; rss: 668MB       item-types checking
  time: 1.934; rss: 745MB       item-bodies checking
  time: 0.521; rss: 784MB       rvalue promotion
  time: 0.897; rss: 790MB       privacy checking
  time: 0.062; rss: 790MB       intrinsic checking
  time: 0.157; rss: 793MB       match checking
  time: 0.072; rss: 793MB       liveness checking
  time: 3.158; rss: 946MB       borrow checking
  time: 0.058; rss: 961MB       MIR borrow checking
  time: 0.000; rss: 961MB       dumping chalk-like clauses
  time: 0.025; rss: 961MB       MIR effect checking
  time: 0.247; rss: 962MB       death checking
  time: 0.000; rss: 962MB       unused lib feature checking
  time: 2.830; rss: 1011MB      lint checking
  time: 0.000; rss: 1011MB      resolving dependency formats
      time: 0.013; rss: 1024MB  monomorphization collection
      time: 0.000; rss: 1024MB  codegen unit partitioning
    time: 2.493; rss: 1068MB    write metadata
    time: 0.000; rss: 1068MB    codegen to LLVM IR
    time: 0.000; rss: 1069MB    assert dep graph
    time: 0.000; rss: 1069MB    serialize dep graph
    time: 0.000; rss: 1069MB    llvm function passes [winapi0]
  time: 2.499; rss: 1069MB      codegen
    time: 0.000; rss: 1069MB    llvm module passes [winapi0]
    time: 0.002; rss: 1056MB    codegen passes [winapi0]
  time: 0.006; rss: 1052MB      LLVM passes
  time: 0.000; rss: 190MB       serialize work products
  time: 0.066; rss: 190MB       linking
    Finished dev [unoptimized + debuginfo] target(s) in 23.48s
hanna-kruppe commented 6 years ago

cc @michaelwoerister

Mark-Simulacrum commented 6 years ago

I believe we by-default disable incremental compilation for any non-local (so, non-path, I believe) dependencies so this at least shouldn't affect downstream users of winapi. Still bad though.

michaelwoerister commented 6 years ago

Yes, Cargo builds non-local dependencies in non-incremental mode, so clients of winapi should not hit this case. This is still an interesting data point. Is the crate a single big module?

retep998 commented 6 years ago

winapi has nearly 300 modules, but of widely varying size. There are some small modules with only couple definitions, and there are some rather large modules weighing in as much as 336KiB.

michaelwoerister commented 6 years ago

Huh, why is it compiling to just one object file in incremental mode then? Do all except one module only contain #[inline] functions?

retep998 commented 6 years ago

Because all the functions are #[inline] functions (aside from extern functions which don't have associated bodies anyway), so there's literally nothing to codegen.

; ModuleID = 'winapi0-44883cb3741bc93bb95d7fd494a16a2c.rs'
source_filename = "winapi0-44883cb3741bc93bb95d7fd494a16a2c.rs"
target datalayout = "e-m:w-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc"

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3, !4}

!0 = distinct !DICompileUnit(language: DW_LANG_Rust, file: !1, producer: "clang LLVM (rustc version 1.25.0-nightly (3d292b793 2018-02-03))", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
!1 = !DIFile(filename: "src\5Clib.rs", directory: "C:\5CUsers\5CPeter\5CCode\5Cwinapi-rs")
!2 = !{}
!3 = !{i32 2, !"CodeView", i32 1}
!4 = !{i32 2, !"Debug Info Version", i32 3}
michaelwoerister commented 6 years ago

How are re-compilation times after making a small change?

michaelwoerister commented 6 years ago

And what is the exact version of rustc used for producing the timings above?

retep998 commented 6 years ago

https://gist.github.com/retep998/83dda0b6f720bce554a32e859d09f0bb

An incremental rebuild after a null edit is 17.99 while a non-incremental rebuild after a null edit is 18.47.

rustc -vV
rustc 1.25.0-nightly (3d292b793 2018-02-03)
binary: rustc
commit-hash: 3d292b793ade0c1c9098fb32586033d79f6e9969
commit-date: 2018-02-03
host: x86_64-pc-windows-msvc
release: 1.25.0-nightly
LLVM version: 4.0
michaelwoerister commented 6 years ago

An incremental rebuild after a null edit is 17.99

Wow! That's certainly not good. Is there a way to compile winapi on Linux so that we could add it as a benchmark to perf.rust-lang.org?

retep998 commented 6 years ago

Of course. Just acquire the relevant x86_64-pc-windows-msvc std component using rustup and then build winapi for that target. It's a trivial rlib so there's no cross C toolchain hassle to deal with. And don't forget to specify --features everything.

michaelwoerister commented 6 years ago

@Mark-Simulacrum, we should look into adding winapi as a benchmark. It seems to produce a workload that is quite different from most other crates.

Mark-Simulacrum commented 6 years ago

Yes, I've wanted to do it for a long time (as has @retep998) but have never quite found the time. Hopefully soon!

retep998 commented 4 years ago

Status update, this is still an issue. Using latest nightly.

Incremental Debug
   Compiling winapi v0.3.8 (D:\a\winapi-rs\winapi-rs)
    Finished dev [unoptimized + debuginfo] target(s) in 50.73s
Non-incremental Debug
    Compiling winapi v0.3.8 (D:\a\winapi-rs\winapi-rs)
    Finished dev [unoptimized + debuginfo] target(s) in 36.30s
Release
   Compiling winapi v0.3.8 (D:\a\winapi-rs\winapi-rs)
    Finished release [optimized] target(s) in 36.60s
Mark-Simulacrum commented 9 months ago

With rustc 1.76.0-beta.3 (3a3478ec4 2024-01-10):

From scratch builds:

Incremental re-compile after touch src/lib.rs takes: 7.2s

So seems like incremental still makes things slower (or isn't a win at all), but we're significantly faster and/or I'm using better hardware (Ryzen 3950x). After the first build it's only 200ms slower on incremental re-compiles, which to me indicates that the underlying problem here is fixed - we expect the incremental, but no cache, scenario to be slower and this is represented in ~all perf.r-l.o benchmarks.

Closing as fixed.