rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.62k stars 12.48k forks source link

Performance analysis on `aws-sdk-ec2` #127634

Open joshtriplett opened 1 month ago

joshtriplett commented 1 month ago

I'm doing some performance analysis on the (long) compilation of the crate aws-sdk-ec2, and using this issue to take notes about it.

rustc +nightly --version: rustc 1.81.0-nightly (0c81f94b9 2024-07-10)

cargo +nightly rustc -- -Ztime-passes:

time:   0.000; rss:   47MB ->   48MB (   +0MB)  parse_crate
time:   0.000; rss:   49MB ->   49MB (   +1MB)  setup_global_ctxt
time:   0.000; rss:   51MB ->   51MB (   +0MB)  crate_injection
time:   1.680; rss:   51MB -> 1103MB (+1052MB)  expand_crate
time:   1.680; rss:   51MB -> 1103MB (+1052MB)  macro_expand_crate
time:   0.000; rss: 1103MB -> 1104MB (   +0MB)  maybe_building_test_harness
time:   0.052; rss: 1104MB -> 1103MB (   +0MB)  AST_validation
time:   0.007; rss: 1103MB -> 1106MB (   +3MB)  finalize_imports
time:   0.033; rss: 1106MB -> 1106MB (   +0MB)  compute_effective_visibilities
time:   0.034; rss: 1106MB -> 1108MB (   +2MB)  finalize_macro_resolutions
time:   0.969; rss: 1108MB -> 1366MB ( +257MB)  late_resolve_crate
time:   0.048; rss: 1366MB -> 1397MB (  +32MB)  resolve_check_unused
time:   0.064; rss: 1397MB -> 1397MB (   +0MB)  resolve_postprocess
time:   1.157; rss: 1103MB -> 1397MB ( +294MB)  resolve_crate
time:   0.043; rss: 1398MB -> 1398MB (   +0MB)  write_dep_info
time:   0.044; rss: 1398MB -> 1399MB (   +1MB)  complete_gated_feature_checking
time:   0.103; rss: 2151MB -> 2145MB (   -6MB)  drop_ast
time:   3.029; rss: 1398MB -> 1991MB ( +593MB)  looking_for_derive_registrar
time:   3.577; rss: 1398MB -> 2034MB ( +636MB)  misc_checking_1
time:   9.258; rss: 2034MB -> 2694MB ( +660MB)  coherence_checking
time:  18.136; rss: 2034MB -> 3353MB (+1319MB)  type_check_crate
time:  14.660; rss: 3353MB -> 4533MB (+1181MB)  MIR_borrow_checking
time:   4.442; rss: 4533MB -> 4807MB ( +273MB)  MIR_effect_checking
time:   0.421; rss: 4843MB -> 4845MB (   +2MB)  module_lints
time:   0.421; rss: 4843MB -> 4845MB (   +2MB)  lint_checking
time:   0.453; rss: 4845MB -> 4845MB (   +0MB)  privacy_checking_modules
time:   0.135; rss: 4845MB -> 4849MB (   +4MB)  check_lint_expectations
time:   1.725; rss: 4811MB -> 4849MB (  +38MB)  misc_checking_3
time:   0.044; rss: 5120MB -> 5120MB (   +0MB)  monomorphization_collector_root_collections
time:   6.002; rss: 5120MB -> 5595MB ( +476MB)  monomorphization_collector_graph_walk
time:   2.506; rss: 5589MB -> 5702MB ( +113MB)  partition_and_assert_distinct_symbols
time:  10.513; rss: 4849MB -> 5546MB ( +697MB)  generate_crate_metadata
time:  28.231; rss: 5554MB -> 6196MB ( +642MB)  codegen_to_LLVM_IR
time:  28.440; rss: 5546MB -> 6196MB ( +649MB)  codegen_crate
time:  26.305; rss: 5947MB -> 6209MB ( +261MB)  LLVM_passes
time:   0.865; rss: 6196MB -> 6257MB (  +61MB)  encode_query_results
time:   0.967; rss: 6196MB -> 6167MB (  -28MB)  incr_comp_serialize_result_cache
time:   0.967; rss: 6196MB -> 6167MB (  -28MB)  incr_comp_persist_result_cache
time:   0.968; rss: 6196MB -> 6167MB (  -28MB)  serialize_dep_graph
time:   0.298; rss: 6167MB -> 4211MB (-1957MB)  free_global_ctxt
time:   0.520; rss: 3789MB -> 3893MB ( +104MB)  link_rlib
time:   0.527; rss: 3789MB -> 3893MB ( +104MB)  link_binary
time:   0.544; rss: 3789MB -> 3484MB ( -305MB)  link_crate
time:   0.546; rss: 4211MB -> 3484MB ( -727MB)  link
time:  86.838; rss:   32MB ->  245MB ( +213MB)  total
Optimized version, for reference `cargo +nightly rustc -- -C opt-level=3 -Ztime-passes`: ``` time: 0.000; rss: 47MB -> 47MB ( +0MB) parse_crate time: 0.000; rss: 48MB -> 49MB ( +1MB) setup_global_ctxt time: 0.000; rss: 51MB -> 51MB ( +0MB) crate_injection time: 1.788; rss: 51MB -> 1104MB (+1052MB) expand_crate time: 1.788; rss: 51MB -> 1104MB (+1053MB) macro_expand_crate time: 0.000; rss: 1104MB -> 1104MB ( +0MB) maybe_building_test_harness time: 0.048; rss: 1104MB -> 1104MB ( +0MB) AST_validation time: 0.006; rss: 1104MB -> 1106MB ( +2MB) finalize_imports time: 0.027; rss: 1106MB -> 1106MB ( +0MB) compute_effective_visibilities time: 0.034; rss: 1106MB -> 1108MB ( +2MB) finalize_macro_resolutions time: 0.975; rss: 1108MB -> 1366MB ( +258MB) late_resolve_crate time: 0.041; rss: 1366MB -> 1398MB ( +32MB) resolve_check_unused time: 0.059; rss: 1398MB -> 1398MB ( +0MB) resolve_postprocess time: 1.142; rss: 1104MB -> 1398MB ( +294MB) resolve_crate time: 0.032; rss: 1400MB -> 1399MB ( -1MB) write_dep_info time: 0.030; rss: 1399MB -> 1399MB ( +1MB) complete_gated_feature_checking time: 0.103; rss: 2153MB -> 2148MB ( -6MB) drop_ast time: 3.136; rss: 1399MB -> 1994MB ( +595MB) looking_for_derive_registrar time: 3.674; rss: 1399MB -> 2036MB ( +638MB) misc_checking_1 time: 9.398; rss: 2036MB -> 2696MB ( +660MB) coherence_checking time: 18.398; rss: 2036MB -> 3354MB (+1318MB) type_check_crate time: 14.451; rss: 3354MB -> 4534MB (+1180MB) MIR_borrow_checking time: 4.738; rss: 4534MB -> 4803MB ( +269MB) MIR_effect_checking time: 0.421; rss: 4835MB -> 4840MB ( +4MB) module_lints time: 0.421; rss: 4835MB -> 4840MB ( +4MB) lint_checking time: 0.454; rss: 4840MB -> 4840MB ( +0MB) privacy_checking_modules time: 0.136; rss: 4840MB -> 4844MB ( +4MB) check_lint_expectations time: 1.728; rss: 4807MB -> 4844MB ( +37MB) misc_checking_3 time: 2.511; rss: 4844MB -> 5017MB ( +173MB) generate_crate_metadata time: 0.049; rss: 5017MB -> 5017MB ( +0MB) monomorphization_collector_root_collections time: 8.685; rss: 5017MB -> 5424MB ( +406MB) monomorphization_collector_graph_walk time: 4.531; rss: 5417MB -> 5539MB ( +122MB) partition_and_assert_distinct_symbols time: 53.323; rss: 5467MB -> 7019MB (+1553MB) codegen_to_LLVM_IR time: 66.963; rss: 5017MB -> 7019MB (+2002MB) codegen_crate time: 1.174; rss: 7019MB -> 7138MB ( +119MB) encode_query_results time: 1.298; rss: 7019MB -> 7071MB ( +52MB) incr_comp_serialize_result_cache time: 1.298; rss: 7019MB -> 7071MB ( +52MB) incr_comp_persist_result_cache time: 1.298; rss: 7019MB -> 7071MB ( +52MB) serialize_dep_graph time: 0.339; rss: 7071MB -> 5014MB (-2057MB) free_global_ctxt time: 214.199; rss: 5916MB -> 4364MB (-1552MB) LLVM_passes time: 0.001; rss: 1415MB -> 1352MB ( -63MB) join_worker_thread time: 161.500; rss: 5014MB -> 1352MB (-3662MB) finish_ongoing_codegen time: 0.569; rss: 931MB -> 1054MB ( +123MB) link_rlib time: 0.576; rss: 931MB -> 1054MB ( +123MB) link_binary time: 0.580; rss: 931MB -> 935MB ( +4MB) link_crate time: 162.082; rss: 5014MB -> 935MB (-4079MB) link time: 279.744; rss: 32MB -> 365MB ( +333MB) total ```
joshtriplett commented 1 month ago

Experiment: changing all pub items to pub(crate), to see how much difference that makes.

First, just after running sed:

time:   0.000; rss:   47MB ->   47MB (   +0MB)  parse_crate
time:   0.000; rss:   48MB ->   49MB (   +1MB)  setup_global_ctxt
time:   0.000; rss:   51MB ->   51MB (   +0MB)  crate_injection
time:   1.705; rss:   51MB -> 1134MB (+1083MB)  expand_crate
time:   1.705; rss:   51MB -> 1134MB (+1083MB)  macro_expand_crate
time:   0.000; rss: 1134MB -> 1134MB (   +0MB)  maybe_building_test_harness
time:   0.052; rss: 1134MB -> 1134MB (   +0MB)  AST_validation
time:   0.007; rss: 1134MB -> 1137MB (   +3MB)  finalize_imports
time:   0.036; rss: 1137MB -> 1137MB (   +0MB)  compute_effective_visibilities
time:   0.031; rss: 1137MB -> 1139MB (   +2MB)  finalize_macro_resolutions
time:   0.851; rss: 1139MB -> 1376MB ( +236MB)  late_resolve_crate
time:   0.047; rss: 1376MB -> 1407MB (  +32MB)  resolve_check_unused
time:   0.064; rss: 1407MB -> 1407MB (   +0MB)  resolve_postprocess
time:   1.038; rss: 1134MB -> 1407MB ( +274MB)  resolve_crate
time:   0.035; rss: 1410MB -> 1408MB (   -1MB)  write_dep_info
time:   0.035; rss: 1408MB -> 1409MB (   +1MB)  complete_gated_feature_checking
time:   0.120; rss: 2173MB -> 2168MB (   -4MB)  drop_ast
time:   3.109; rss: 1408MB -> 2015MB ( +606MB)  looking_for_derive_registrar
time:   3.744; rss: 1408MB -> 2055MB ( +647MB)  misc_checking_1
time:   9.457; rss: 2055MB -> 2705MB ( +650MB)  coherence_checking
time:  18.441; rss: 2055MB -> 3355MB (+1300MB)  type_check_crate
time:  14.744; rss: 3355MB -> 4531MB (+1176MB)  MIR_borrow_checking
time:   4.361; rss: 4531MB -> 4801MB ( +270MB)  MIR_effect_checking
time:   0.400; rss: 4860MB -> 4864MB (   +4MB)  module_lints
time:   0.400; rss: 4860MB -> 4864MB (   +4MB)  lint_checking
time:   0.455; rss: 4864MB -> 4864MB (   +0MB)  privacy_checking_modules
time:   0.136; rss: 4864MB -> 4864MB (   +0MB)  check_lint_expectations
time:   3.509; rss: 4806MB -> 4864MB (  +58MB)  misc_checking_3
time:   0.027; rss: 5107MB -> 5107MB (   +0MB)  monomorphization_collector_root_collections
time:   4.856; rss: 5107MB -> 5512MB ( +406MB)  monomorphization_collector_graph_walk
time:   1.792; rss: 5510MB -> 5615MB ( +105MB)  partition_and_assert_distinct_symbols
time:   8.275; rss: 4864MB -> 5445MB ( +580MB)  generate_crate_metadata
time:  19.792; rss: 5442MB -> 5939MB ( +497MB)  codegen_to_LLVM_IR
time:  19.941; rss: 5445MB -> 5939MB ( +495MB)  codegen_crate
time:   0.000; rss: 5939MB -> 5939MB (   +0MB)  incr_comp_persist_dep_graph
time:  19.035; rss: 5595MB -> 5884MB ( +289MB)  LLVM_passes
time:   0.803; rss: 5939MB -> 5934MB (   -5MB)  encode_query_results
time:   0.916; rss: 5939MB -> 5858MB (  -81MB)  incr_comp_serialize_result_cache
time:   0.916; rss: 5939MB -> 5858MB (  -81MB)  incr_comp_persist_result_cache
time:   0.916; rss: 5939MB -> 5858MB (  -81MB)  serialize_dep_graph
time:   0.292; rss: 5858MB -> 3857MB (-2001MB)  free_global_ctxt
time:   0.399; rss: 3435MB -> 3509MB (  +74MB)  link_rlib
time:   0.405; rss: 3435MB -> 3509MB (  +74MB)  link_binary
time:   0.422; rss: 3435MB -> 3098MB ( -337MB)  link_crate
time:   0.424; rss: 3857MB -> 3098MB ( -759MB)  link
time:  78.069; rss:   32MB ->  237MB ( +205MB)  total

And second, with -A dead_code -A unused_imports added:

time:   0.000; rss:   47MB ->   48MB (   +1MB)  parse_crate
time:   0.000; rss:   48MB ->   49MB (   +1MB)  setup_global_ctxt
time:   0.000; rss:   51MB ->   51MB (   +0MB)  crate_injection
time:   2.082; rss:   51MB -> 1137MB (+1086MB)  expand_crate
time:   2.082; rss:   51MB -> 1137MB (+1086MB)  macro_expand_crate
time:   0.000; rss: 1137MB -> 1137MB (   +0MB)  maybe_building_test_harness
time:   0.051; rss: 1137MB -> 1137MB (   +0MB)  AST_validation
time:   0.007; rss: 1137MB -> 1140MB (   +2MB)  finalize_imports
time:   0.034; rss: 1140MB -> 1140MB (   +0MB)  compute_effective_visibilities
time:   0.032; rss: 1140MB -> 1142MB (   +2MB)  finalize_macro_resolutions
time:   0.875; rss: 1142MB -> 1378MB ( +236MB)  late_resolve_crate
time:   0.051; rss: 1378MB -> 1410MB (  +32MB)  resolve_check_unused
time:   0.067; rss: 1410MB -> 1410MB (   +0MB)  resolve_postprocess
time:   1.066; rss: 1137MB -> 1410MB ( +273MB)  resolve_crate
time:   0.037; rss: 1412MB -> 1412MB (   +0MB)  write_dep_info
time:   0.035; rss: 1413MB -> 1413MB (   +1MB)  complete_gated_feature_checking
time:   0.107; rss: 2168MB -> 2162MB (   -6MB)  drop_ast
time:   3.023; rss: 1412MB -> 2008MB ( +596MB)  looking_for_derive_registrar
time:   3.576; rss: 1412MB -> 2048MB ( +635MB)  misc_checking_1
time:   9.612; rss: 2048MB -> 2696MB ( +649MB)  coherence_checking
time:  19.367; rss: 2048MB -> 3349MB (+1302MB)  type_check_crate
time:  15.138; rss: 3349MB -> 4527MB (+1177MB)  MIR_borrow_checking
time:   4.571; rss: 4527MB -> 4798MB ( +271MB)  MIR_effect_checking
time:   0.400; rss: 4838MB -> 4840MB (   +2MB)  module_lints
time:   0.400; rss: 4838MB -> 4840MB (   +2MB)  lint_checking
time:   0.659; rss: 4840MB -> 4840MB (   +0MB)  privacy_checking_modules
time:   0.178; rss: 4840MB -> 4840MB (   +0MB)  check_lint_expectations
time:   1.961; rss: 4802MB -> 4840MB (  +38MB)  misc_checking_3
time:   0.026; rss: 5072MB -> 5072MB (   +0MB)  monomorphization_collector_root_collections
time:   5.131; rss: 5072MB -> 5479MB ( +406MB)  monomorphization_collector_graph_walk
time:   1.794; rss: 5477MB -> 5580MB ( +104MB)  partition_and_assert_distinct_symbols
time:   8.548; rss: 4840MB -> 5412MB ( +571MB)  generate_crate_metadata
time:  21.467; rss: 5411MB -> 5944MB ( +532MB)  codegen_to_LLVM_IR
time:  21.615; rss: 5412MB -> 5944MB ( +532MB)  codegen_crate
time:  20.785; rss: 5558MB -> 5882MB ( +324MB)  LLVM_passes
time:   0.962; rss: 5944MB -> 5929MB (  -15MB)  encode_query_results
time:   1.073; rss: 5944MB -> 5854MB (  -90MB)  incr_comp_serialize_result_cache
time:   1.073; rss: 5944MB -> 5854MB (  -90MB)  incr_comp_persist_result_cache
time:   1.073; rss: 5944MB -> 5854MB (  -90MB)  serialize_dep_graph
time:   0.334; rss: 5854MB -> 3834MB (-2020MB)  free_global_ctxt
time:   0.001; rss: 3834MB -> 3834MB (   +0MB)  finish_ongoing_codegen
time:   0.434; rss: 3412MB -> 3457MB (  +45MB)  link_rlib
time:   0.440; rss: 3412MB -> 3457MB (  +45MB)  link_binary
time:   0.456; rss: 3412MB -> 3051MB ( -361MB)  link_crate
time:   0.458; rss: 3834MB -> 3051MB ( -782MB)  link
time:  80.477; rss:   32MB ->  239MB ( +207MB)  total
joshtriplett commented 1 month ago

Experiment: Used sed to mark every pub fn/pub(crate) fn/pub(super) fn and impl block as #[cfg(false)], except for impl ::std::fmt::Debug (since that would break types using derive(::std::fmt::Debug)). Had to do a bit of minor cleanup marking two other items as #[cfg(FALSE)] (a single use and a single static) to make the result compile.

Compiled with cargo +nightly rustc -- --check-cfg='cfg(FALSE)' -A unused_imports -A unused_macros -A dead_code -Ztime-passes.

time:   0.000; rss:   47MB ->   48MB (   +0MB)  parse_crate
time:   0.000; rss:   48MB ->   49MB (   +1MB)  setup_global_ctxt
time:   0.000; rss:   51MB ->   51MB (   +0MB)  crate_injection
time:   1.406; rss:   51MB ->  686MB ( +634MB)  expand_crate
time:   1.406; rss:   51MB ->  686MB ( +635MB)  macro_expand_crate
time:   0.000; rss:  686MB ->  686MB (   +0MB)  maybe_building_test_harness
time:   0.024; rss:  686MB ->  686MB (   +0MB)  AST_validation
time:   0.004; rss:  686MB ->  688MB (   +2MB)  finalize_imports
time:   0.021; rss:  688MB ->  688MB (   +0MB)  compute_effective_visibilities
time:   0.024; rss:  688MB ->  689MB (   +0MB)  finalize_macro_resolutions
time:   0.245; rss:  689MB ->  761MB (  +73MB)  late_resolve_crate
time:   0.016; rss:  761MB ->  761MB (   +0MB)  resolve_check_unused
time:   0.026; rss:  761MB ->  761MB (   +0MB)  resolve_postprocess
time:   0.337; rss:  686MB ->  761MB (  +75MB)  resolve_crate
time:   0.015; rss:  764MB ->  764MB (   +0MB)  write_dep_info
time:   0.013; rss:  764MB ->  765MB (   +0MB)  complete_gated_feature_checking
time:   0.056; rss:  933MB ->  933MB (   +0MB)  drop_ast
time:   0.763; rss:  764MB ->  885MB ( +121MB)  looking_for_derive_registrar
time:   0.944; rss:  764MB ->  916MB ( +152MB)  misc_checking_1
time:   1.230; rss:  916MB -> 1203MB ( +287MB)  coherence_checking
time:   2.498; rss:  916MB -> 1280MB ( +364MB)  type_check_crate
time:   2.172; rss: 1280MB -> 1375MB (  +95MB)  MIR_borrow_checking
time:   0.388; rss: 1375MB -> 1419MB (  +44MB)  MIR_effect_checking
time:   0.150; rss: 1441MB -> 1441MB (   +0MB)  module_lints
time:   0.150; rss: 1441MB -> 1441MB (   +0MB)  lint_checking
time:   0.095; rss: 1441MB -> 1441MB (   +0MB)  privacy_checking_modules
time:   0.034; rss: 1441MB -> 1441MB (   +0MB)  check_lint_expectations
time:   0.531; rss: 1419MB -> 1441MB (  +22MB)  misc_checking_3
time:   0.040; rss: 1543MB -> 1564MB (  +21MB)  monomorphization_collector_graph_walk
time:   0.006; rss: 1564MB -> 1564MB (   +0MB)  partition_and_assert_distinct_symbols
time:   0.732; rss: 1441MB -> 1524MB (  +83MB)  generate_crate_metadata
time:   0.208; rss: 1520MB -> 1581MB (  +61MB)  codegen_to_LLVM_IR
time:   0.227; rss: 1524MB -> 1581MB (  +57MB)  codegen_crate
time:   0.199; rss: 1531MB -> 1579MB (  +48MB)  LLVM_passes
time:   0.149; rss: 1580MB -> 1603MB (  +23MB)  encode_query_results
time:   0.183; rss: 1581MB -> 1578MB (   -2MB)  incr_comp_serialize_result_cache
time:   0.183; rss: 1581MB -> 1578MB (   -3MB)  incr_comp_persist_result_cache
time:   0.183; rss: 1581MB -> 1578MB (   -3MB)  serialize_dep_graph
time:   0.054; rss: 1578MB -> 1097MB ( -481MB)  free_global_ctxt
time:   0.072; rss:  990MB -> 1012MB (  +22MB)  link_rlib
time:   0.076; rss:  990MB -> 1012MB (  +22MB)  link_binary
time:   0.078; rss:  990MB ->  965MB (  -26MB)  link_crate
time:   0.079; rss: 1097MB ->  965MB ( -133MB)  link
time:   9.621; rss:   32MB ->  151MB ( +119MB)  total

This shows the difference for how much work we still do on otherwise dead code.

workingjubilee commented 1 month ago

given that expand_crate/macro_expand_crate stayed within a few points of a second for almost all versions, it seems they're basically "almost definitionally unavoidable" time?

saethlin commented 1 month ago

Or that they are O(tokens) with no regard for semantics.

workingjubilee commented 1 month ago

right, basically.

joshtriplett commented 1 month ago

One result of this analysis: this serves as potential input for any work we might want to do to attempt to compile only the functions in a crate's interface that get used by the crate(s) pulling in that dependency: it's not sufficient or beneficial to just treat the remaining items as non-pub, they actually have to be eliminated in order to benefit. (That also means not issuing any errors/warnings about them, but that's a reasonable tradeoff for a published dependency, much like cap-lints is.)