Open joshtriplett opened 1 month ago
Experiment: changing all pub
items to pub(crate)
, to see how much difference that makes.
First, just after running sed
:
time: 0.000; rss: 47MB -> 47MB ( +0MB) parse_crate
time: 0.000; rss: 48MB -> 49MB ( +1MB) setup_global_ctxt
time: 0.000; rss: 51MB -> 51MB ( +0MB) crate_injection
time: 1.705; rss: 51MB -> 1134MB (+1083MB) expand_crate
time: 1.705; rss: 51MB -> 1134MB (+1083MB) macro_expand_crate
time: 0.000; rss: 1134MB -> 1134MB ( +0MB) maybe_building_test_harness
time: 0.052; rss: 1134MB -> 1134MB ( +0MB) AST_validation
time: 0.007; rss: 1134MB -> 1137MB ( +3MB) finalize_imports
time: 0.036; rss: 1137MB -> 1137MB ( +0MB) compute_effective_visibilities
time: 0.031; rss: 1137MB -> 1139MB ( +2MB) finalize_macro_resolutions
time: 0.851; rss: 1139MB -> 1376MB ( +236MB) late_resolve_crate
time: 0.047; rss: 1376MB -> 1407MB ( +32MB) resolve_check_unused
time: 0.064; rss: 1407MB -> 1407MB ( +0MB) resolve_postprocess
time: 1.038; rss: 1134MB -> 1407MB ( +274MB) resolve_crate
time: 0.035; rss: 1410MB -> 1408MB ( -1MB) write_dep_info
time: 0.035; rss: 1408MB -> 1409MB ( +1MB) complete_gated_feature_checking
time: 0.120; rss: 2173MB -> 2168MB ( -4MB) drop_ast
time: 3.109; rss: 1408MB -> 2015MB ( +606MB) looking_for_derive_registrar
time: 3.744; rss: 1408MB -> 2055MB ( +647MB) misc_checking_1
time: 9.457; rss: 2055MB -> 2705MB ( +650MB) coherence_checking
time: 18.441; rss: 2055MB -> 3355MB (+1300MB) type_check_crate
time: 14.744; rss: 3355MB -> 4531MB (+1176MB) MIR_borrow_checking
time: 4.361; rss: 4531MB -> 4801MB ( +270MB) MIR_effect_checking
time: 0.400; rss: 4860MB -> 4864MB ( +4MB) module_lints
time: 0.400; rss: 4860MB -> 4864MB ( +4MB) lint_checking
time: 0.455; rss: 4864MB -> 4864MB ( +0MB) privacy_checking_modules
time: 0.136; rss: 4864MB -> 4864MB ( +0MB) check_lint_expectations
time: 3.509; rss: 4806MB -> 4864MB ( +58MB) misc_checking_3
time: 0.027; rss: 5107MB -> 5107MB ( +0MB) monomorphization_collector_root_collections
time: 4.856; rss: 5107MB -> 5512MB ( +406MB) monomorphization_collector_graph_walk
time: 1.792; rss: 5510MB -> 5615MB ( +105MB) partition_and_assert_distinct_symbols
time: 8.275; rss: 4864MB -> 5445MB ( +580MB) generate_crate_metadata
time: 19.792; rss: 5442MB -> 5939MB ( +497MB) codegen_to_LLVM_IR
time: 19.941; rss: 5445MB -> 5939MB ( +495MB) codegen_crate
time: 0.000; rss: 5939MB -> 5939MB ( +0MB) incr_comp_persist_dep_graph
time: 19.035; rss: 5595MB -> 5884MB ( +289MB) LLVM_passes
time: 0.803; rss: 5939MB -> 5934MB ( -5MB) encode_query_results
time: 0.916; rss: 5939MB -> 5858MB ( -81MB) incr_comp_serialize_result_cache
time: 0.916; rss: 5939MB -> 5858MB ( -81MB) incr_comp_persist_result_cache
time: 0.916; rss: 5939MB -> 5858MB ( -81MB) serialize_dep_graph
time: 0.292; rss: 5858MB -> 3857MB (-2001MB) free_global_ctxt
time: 0.399; rss: 3435MB -> 3509MB ( +74MB) link_rlib
time: 0.405; rss: 3435MB -> 3509MB ( +74MB) link_binary
time: 0.422; rss: 3435MB -> 3098MB ( -337MB) link_crate
time: 0.424; rss: 3857MB -> 3098MB ( -759MB) link
time: 78.069; rss: 32MB -> 237MB ( +205MB) total
And second, with -A dead_code -A unused_imports
added:
time: 0.000; rss: 47MB -> 48MB ( +1MB) parse_crate
time: 0.000; rss: 48MB -> 49MB ( +1MB) setup_global_ctxt
time: 0.000; rss: 51MB -> 51MB ( +0MB) crate_injection
time: 2.082; rss: 51MB -> 1137MB (+1086MB) expand_crate
time: 2.082; rss: 51MB -> 1137MB (+1086MB) macro_expand_crate
time: 0.000; rss: 1137MB -> 1137MB ( +0MB) maybe_building_test_harness
time: 0.051; rss: 1137MB -> 1137MB ( +0MB) AST_validation
time: 0.007; rss: 1137MB -> 1140MB ( +2MB) finalize_imports
time: 0.034; rss: 1140MB -> 1140MB ( +0MB) compute_effective_visibilities
time: 0.032; rss: 1140MB -> 1142MB ( +2MB) finalize_macro_resolutions
time: 0.875; rss: 1142MB -> 1378MB ( +236MB) late_resolve_crate
time: 0.051; rss: 1378MB -> 1410MB ( +32MB) resolve_check_unused
time: 0.067; rss: 1410MB -> 1410MB ( +0MB) resolve_postprocess
time: 1.066; rss: 1137MB -> 1410MB ( +273MB) resolve_crate
time: 0.037; rss: 1412MB -> 1412MB ( +0MB) write_dep_info
time: 0.035; rss: 1413MB -> 1413MB ( +1MB) complete_gated_feature_checking
time: 0.107; rss: 2168MB -> 2162MB ( -6MB) drop_ast
time: 3.023; rss: 1412MB -> 2008MB ( +596MB) looking_for_derive_registrar
time: 3.576; rss: 1412MB -> 2048MB ( +635MB) misc_checking_1
time: 9.612; rss: 2048MB -> 2696MB ( +649MB) coherence_checking
time: 19.367; rss: 2048MB -> 3349MB (+1302MB) type_check_crate
time: 15.138; rss: 3349MB -> 4527MB (+1177MB) MIR_borrow_checking
time: 4.571; rss: 4527MB -> 4798MB ( +271MB) MIR_effect_checking
time: 0.400; rss: 4838MB -> 4840MB ( +2MB) module_lints
time: 0.400; rss: 4838MB -> 4840MB ( +2MB) lint_checking
time: 0.659; rss: 4840MB -> 4840MB ( +0MB) privacy_checking_modules
time: 0.178; rss: 4840MB -> 4840MB ( +0MB) check_lint_expectations
time: 1.961; rss: 4802MB -> 4840MB ( +38MB) misc_checking_3
time: 0.026; rss: 5072MB -> 5072MB ( +0MB) monomorphization_collector_root_collections
time: 5.131; rss: 5072MB -> 5479MB ( +406MB) monomorphization_collector_graph_walk
time: 1.794; rss: 5477MB -> 5580MB ( +104MB) partition_and_assert_distinct_symbols
time: 8.548; rss: 4840MB -> 5412MB ( +571MB) generate_crate_metadata
time: 21.467; rss: 5411MB -> 5944MB ( +532MB) codegen_to_LLVM_IR
time: 21.615; rss: 5412MB -> 5944MB ( +532MB) codegen_crate
time: 20.785; rss: 5558MB -> 5882MB ( +324MB) LLVM_passes
time: 0.962; rss: 5944MB -> 5929MB ( -15MB) encode_query_results
time: 1.073; rss: 5944MB -> 5854MB ( -90MB) incr_comp_serialize_result_cache
time: 1.073; rss: 5944MB -> 5854MB ( -90MB) incr_comp_persist_result_cache
time: 1.073; rss: 5944MB -> 5854MB ( -90MB) serialize_dep_graph
time: 0.334; rss: 5854MB -> 3834MB (-2020MB) free_global_ctxt
time: 0.001; rss: 3834MB -> 3834MB ( +0MB) finish_ongoing_codegen
time: 0.434; rss: 3412MB -> 3457MB ( +45MB) link_rlib
time: 0.440; rss: 3412MB -> 3457MB ( +45MB) link_binary
time: 0.456; rss: 3412MB -> 3051MB ( -361MB) link_crate
time: 0.458; rss: 3834MB -> 3051MB ( -782MB) link
time: 80.477; rss: 32MB -> 239MB ( +207MB) total
Experiment: Used sed
to mark every pub fn
/pub(crate) fn
/pub(super) fn
and impl
block as #[cfg(false)]
, except for impl ::std::fmt::Debug
(since that would break types using derive(::std::fmt::Debug)
). Had to do a bit of minor cleanup marking two other items as #[cfg(FALSE)]
(a single use
and a single static
) to make the result compile.
Compiled with cargo +nightly rustc -- --check-cfg='cfg(FALSE)' -A unused_imports -A unused_macros -A dead_code -Ztime-passes
.
time: 0.000; rss: 47MB -> 48MB ( +0MB) parse_crate
time: 0.000; rss: 48MB -> 49MB ( +1MB) setup_global_ctxt
time: 0.000; rss: 51MB -> 51MB ( +0MB) crate_injection
time: 1.406; rss: 51MB -> 686MB ( +634MB) expand_crate
time: 1.406; rss: 51MB -> 686MB ( +635MB) macro_expand_crate
time: 0.000; rss: 686MB -> 686MB ( +0MB) maybe_building_test_harness
time: 0.024; rss: 686MB -> 686MB ( +0MB) AST_validation
time: 0.004; rss: 686MB -> 688MB ( +2MB) finalize_imports
time: 0.021; rss: 688MB -> 688MB ( +0MB) compute_effective_visibilities
time: 0.024; rss: 688MB -> 689MB ( +0MB) finalize_macro_resolutions
time: 0.245; rss: 689MB -> 761MB ( +73MB) late_resolve_crate
time: 0.016; rss: 761MB -> 761MB ( +0MB) resolve_check_unused
time: 0.026; rss: 761MB -> 761MB ( +0MB) resolve_postprocess
time: 0.337; rss: 686MB -> 761MB ( +75MB) resolve_crate
time: 0.015; rss: 764MB -> 764MB ( +0MB) write_dep_info
time: 0.013; rss: 764MB -> 765MB ( +0MB) complete_gated_feature_checking
time: 0.056; rss: 933MB -> 933MB ( +0MB) drop_ast
time: 0.763; rss: 764MB -> 885MB ( +121MB) looking_for_derive_registrar
time: 0.944; rss: 764MB -> 916MB ( +152MB) misc_checking_1
time: 1.230; rss: 916MB -> 1203MB ( +287MB) coherence_checking
time: 2.498; rss: 916MB -> 1280MB ( +364MB) type_check_crate
time: 2.172; rss: 1280MB -> 1375MB ( +95MB) MIR_borrow_checking
time: 0.388; rss: 1375MB -> 1419MB ( +44MB) MIR_effect_checking
time: 0.150; rss: 1441MB -> 1441MB ( +0MB) module_lints
time: 0.150; rss: 1441MB -> 1441MB ( +0MB) lint_checking
time: 0.095; rss: 1441MB -> 1441MB ( +0MB) privacy_checking_modules
time: 0.034; rss: 1441MB -> 1441MB ( +0MB) check_lint_expectations
time: 0.531; rss: 1419MB -> 1441MB ( +22MB) misc_checking_3
time: 0.040; rss: 1543MB -> 1564MB ( +21MB) monomorphization_collector_graph_walk
time: 0.006; rss: 1564MB -> 1564MB ( +0MB) partition_and_assert_distinct_symbols
time: 0.732; rss: 1441MB -> 1524MB ( +83MB) generate_crate_metadata
time: 0.208; rss: 1520MB -> 1581MB ( +61MB) codegen_to_LLVM_IR
time: 0.227; rss: 1524MB -> 1581MB ( +57MB) codegen_crate
time: 0.199; rss: 1531MB -> 1579MB ( +48MB) LLVM_passes
time: 0.149; rss: 1580MB -> 1603MB ( +23MB) encode_query_results
time: 0.183; rss: 1581MB -> 1578MB ( -2MB) incr_comp_serialize_result_cache
time: 0.183; rss: 1581MB -> 1578MB ( -3MB) incr_comp_persist_result_cache
time: 0.183; rss: 1581MB -> 1578MB ( -3MB) serialize_dep_graph
time: 0.054; rss: 1578MB -> 1097MB ( -481MB) free_global_ctxt
time: 0.072; rss: 990MB -> 1012MB ( +22MB) link_rlib
time: 0.076; rss: 990MB -> 1012MB ( +22MB) link_binary
time: 0.078; rss: 990MB -> 965MB ( -26MB) link_crate
time: 0.079; rss: 1097MB -> 965MB ( -133MB) link
time: 9.621; rss: 32MB -> 151MB ( +119MB) total
This shows the difference for how much work we still do on otherwise dead code.
given that expand_crate
/macro_expand_crate
stayed within a few points of a second for almost all versions, it seems they're basically "almost definitionally unavoidable" time?
Or that they are O(tokens) with no regard for semantics.
right, basically.
One result of this analysis: this serves as potential input for any work we might want to do to attempt to compile only the functions in a crate's interface that get used by the crate(s) pulling in that dependency: it's not sufficient or beneficial to just treat the remaining items as non-pub
, they actually have to be eliminated in order to benefit. (That also means not issuing any errors/warnings about them, but that's a reasonable tradeoff for a published dependency, much like cap-lints is.)
I'm doing some performance analysis on the (long) compilation of the crate
aws-sdk-ec2
, and using this issue to take notes about it.rustc +nightly --version
:rustc 1.81.0-nightly (0c81f94b9 2024-07-10)
cargo +nightly rustc -- -Ztime-passes
:Optimized version, for reference
`cargo +nightly rustc -- -C opt-level=3 -Ztime-passes`: ``` time: 0.000; rss: 47MB -> 47MB ( +0MB) parse_crate time: 0.000; rss: 48MB -> 49MB ( +1MB) setup_global_ctxt time: 0.000; rss: 51MB -> 51MB ( +0MB) crate_injection time: 1.788; rss: 51MB -> 1104MB (+1052MB) expand_crate time: 1.788; rss: 51MB -> 1104MB (+1053MB) macro_expand_crate time: 0.000; rss: 1104MB -> 1104MB ( +0MB) maybe_building_test_harness time: 0.048; rss: 1104MB -> 1104MB ( +0MB) AST_validation time: 0.006; rss: 1104MB -> 1106MB ( +2MB) finalize_imports time: 0.027; rss: 1106MB -> 1106MB ( +0MB) compute_effective_visibilities time: 0.034; rss: 1106MB -> 1108MB ( +2MB) finalize_macro_resolutions time: 0.975; rss: 1108MB -> 1366MB ( +258MB) late_resolve_crate time: 0.041; rss: 1366MB -> 1398MB ( +32MB) resolve_check_unused time: 0.059; rss: 1398MB -> 1398MB ( +0MB) resolve_postprocess time: 1.142; rss: 1104MB -> 1398MB ( +294MB) resolve_crate time: 0.032; rss: 1400MB -> 1399MB ( -1MB) write_dep_info time: 0.030; rss: 1399MB -> 1399MB ( +1MB) complete_gated_feature_checking time: 0.103; rss: 2153MB -> 2148MB ( -6MB) drop_ast time: 3.136; rss: 1399MB -> 1994MB ( +595MB) looking_for_derive_registrar time: 3.674; rss: 1399MB -> 2036MB ( +638MB) misc_checking_1 time: 9.398; rss: 2036MB -> 2696MB ( +660MB) coherence_checking time: 18.398; rss: 2036MB -> 3354MB (+1318MB) type_check_crate time: 14.451; rss: 3354MB -> 4534MB (+1180MB) MIR_borrow_checking time: 4.738; rss: 4534MB -> 4803MB ( +269MB) MIR_effect_checking time: 0.421; rss: 4835MB -> 4840MB ( +4MB) module_lints time: 0.421; rss: 4835MB -> 4840MB ( +4MB) lint_checking time: 0.454; rss: 4840MB -> 4840MB ( +0MB) privacy_checking_modules time: 0.136; rss: 4840MB -> 4844MB ( +4MB) check_lint_expectations time: 1.728; rss: 4807MB -> 4844MB ( +37MB) misc_checking_3 time: 2.511; rss: 4844MB -> 5017MB ( +173MB) generate_crate_metadata time: 0.049; rss: 5017MB -> 5017MB ( +0MB) monomorphization_collector_root_collections time: 8.685; rss: 5017MB -> 5424MB ( +406MB) monomorphization_collector_graph_walk time: 4.531; rss: 5417MB -> 5539MB ( +122MB) partition_and_assert_distinct_symbols time: 53.323; rss: 5467MB -> 7019MB (+1553MB) codegen_to_LLVM_IR time: 66.963; rss: 5017MB -> 7019MB (+2002MB) codegen_crate time: 1.174; rss: 7019MB -> 7138MB ( +119MB) encode_query_results time: 1.298; rss: 7019MB -> 7071MB ( +52MB) incr_comp_serialize_result_cache time: 1.298; rss: 7019MB -> 7071MB ( +52MB) incr_comp_persist_result_cache time: 1.298; rss: 7019MB -> 7071MB ( +52MB) serialize_dep_graph time: 0.339; rss: 7071MB -> 5014MB (-2057MB) free_global_ctxt time: 214.199; rss: 5916MB -> 4364MB (-1552MB) LLVM_passes time: 0.001; rss: 1415MB -> 1352MB ( -63MB) join_worker_thread time: 161.500; rss: 5014MB -> 1352MB (-3662MB) finish_ongoing_codegen time: 0.569; rss: 931MB -> 1054MB ( +123MB) link_rlib time: 0.576; rss: 931MB -> 1054MB ( +123MB) link_binary time: 0.580; rss: 931MB -> 935MB ( +4MB) link_crate time: 162.082; rss: 5014MB -> 935MB (-4079MB) link time: 279.744; rss: 32MB -> 365MB ( +333MB) total ```