Open Zannick opened 1 year ago
The latest version of my code is at 975 SpotIds, and now uses more RAM than my WSL has available (so it is not possible to build with overflow-checks on anymore). I was able to get a heap trace using heaptrack:
The most interesting callee/caller entry with the fewest allocations is _RNCNvMs3_NtNtCs5MN4J8h9fJF_18rustc_mir_dataflow9framework6engineINtB7_6EngineNtNtNtBb_5impls16storage_liveness16MaybeStorageLiveE3new0Csj7pHd1IXyO8_16rustc_const_eval in librustc_driver-5a1863ba703f721e.so
with 1703375 allocations for 56.7 GiB, which is an average of about 35 KiB.
Post-expansion, this crate has a few functions which look like this:
Self::Irikar__Sight_Room
} else if value
== ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((0usize
+ 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1)
+ 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1)
+ 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1)
+ 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1)
+ 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1)
+ 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1)
+ 1) + 1) + 1) + 1) + 1) + 1)
{
Self::Irikar_Breach__Save_Room
} else if value
== (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((0usize
+ 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1)
+ 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1)
+ 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1)
+ 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1)
+ 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1)
+ 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1)
+ 1) + 1) + 1) + 1) + 1) + 1) + 1)
{
Self::Menu__Upgrade_Menu
} else {
::enum_map::out_of_bounds()
}
There are N different arms before this, and a handful of functions like this that end up with a few tens of thousands of MIR locals. It looks like this code was written assuming constant propagation exists, but currently in rustc, there is no constant propagation in debug builds. Every one of these +
has a runtime check for overflow inserted, and they are not optimized out.
This explodes memory usage because MIR validation has memory usage which is O(blocks * locals). For this program, every one of these +
is lowered to a few new locals plus a new block so the complexity is more like O(n^2) but since the number of +
is quadratic with the size of these enums like SpotId
, the memory usage probably looks cubic or even worse, if the number of variants in SpotId
is quadratic with map size or something like that.
This flavor of problem is not unique to rustc, basically every compiler has quadratic or worse resource usage but with a small enough constant factor that normal programs don't notice it. In this case, the memory in those huge spikes is used for bitsets; for most functions all the memory for the bitsets is not very significant compared to the memory used for the MIR that the bitsets are being used to analyze. But if a function has thousands of +
in it, things go a bit haywire.
On my machine with overflow-checks enabled, adding RUSTFLAGS=-Zmir-enable-passes=+ConstProp
drops clean debug build time from 4m32s to 1m55s and peak memory from 34 GB to 10 GB (enabling GVN has the same effect). It's still 10 GB because ConstProp on its own can't fully clean up all the overflow checks, and if you enable too many MIR opts, build time re-regresses. Didn't look into why.
@rust-lang/wg-compiler-performance What do you think about adding something like this to the benchmark suite? Or should this already be covered by one of the stress-tests?
That's very interesting, thank you for the analysis. It looks like that comes from the enum-map-derive crate, which explains the correlation with my enum sizes. Someone's already opened a bug about that code generation, so I can follow up there to recommend improving that.
I have some ability to control constant propagation in my own generated code but I don't think anything's going to reach that level.
enum-map 2.7.2 was released which generates somewhat nicer code.
I tried this code:
https://github.com/Zannick/logic-graph/tree/24989d291cd65aa2ba0c22d717ca179a39aeeced/games/AxiomVerge2
Apologies, this is generated code implementing traits for a library in the same repository, and since I don't know what the issue is, I cannot quickly provide a smaller test case. I suspect something relating to traits similar to #117257, #117275, and #116914 but this is complicated by my observation that adding variants to
enum SpotId
(an associated type used in the trait impls) ingraph.rs
increases rustc's peak memory usage. Since I'm expecting to roughly quintuple the size of that enum before the project is finished, I'm concerned about being able to track down what is going on (although compilation is quite smooth with overflow-checks disabled).I expected to see this happen: compilation of the package itself is fast and does not use more memory than running the program will...
Instead, this happened: compilation slows down from 40s to 3m40s and spikes to 33 GiB. With
-Z time passes
, this section appears after the spikes:Watching usage by refreshing top: As-is, 836 SpotIds: Spikes (GiB) 32.6, 32.4, 17.7, 17.7 With 820 SpotIds: Spikes (GiB): 30.7, 30.2, 17.7, 17.6
With
-Z time passes
, this section appears after the spikes:The earlier stage "MIR borrow checking" uses around 9-10GiB and also takes longer when overflow-checks is enabled.
I've also made a thread on the users forum where advice on debugging or reducing this overhead are appreciated: https://users.rust-lang.org/t/how-to-debug-rustc-memory-usage-spikes/100750
Meta
rustc --version --verbose
:There are some recent rust features used in dependencies so I can't compile to check back more than a handful of stable versions.
overflow-checks = true
``` ~/logic-graph/games/AxiomVerge2$ /usr/bin/time -v cargo +nightly rustc -p axiom_verge2 --lib -- -Z time-passes Compiling axiom_verge2 v0.1.0 (/home/bswolf/logic-graph/games/AxiomVerge2) time: 0.000; rss: 40MB -> 44MB ( +4MB) incr_comp_prepare_session_directory time: 0.854; rss: 46MB -> 497MB ( +451MB) expand_crate time: 0.854; rss: 46MB -> 497MB ( +451MB) macro_expand_crate time: 0.034; rss: 497MB -> 497MB ( +0MB) AST_validation time: 0.002; rss: 497MB -> 498MB ( +1MB) finalize_macro_resolutions time: 0.147; rss: 498MB -> 527MB ( +29MB) late_resolve_crate time: 0.034; rss: 527MB -> 527MB ( +0MB) resolve_check_unused time: 0.066; rss: 527MB -> 527MB ( +0MB) resolve_postprocess time: 0.250; rss: 497MB -> 527MB ( +30MB) resolve_crate time: 0.033; rss: 527MB -> 527MB ( +0MB) prepare_outputs time: 0.036; rss: 527MB -> 527MB ( +0MB) complete_gated_feature_checking time: 0.096; rss: 871MB -> 691MB ( -180MB) drop_ast time: 1.039; rss: 527MB -> 672MB ( +145MB) looking_for_derive_registrar time: 1.251; rss: 527MB -> 672MB ( +145MB) misc_checking_1 time: 0.073; rss: 672MB -> 680MB ( +7MB) type_collecting time: 0.038; rss: 680MB -> 687MB ( +7MB) coherence_checking time: 0.061; rss: 687MB -> 690MB ( +4MB) wf_checking time: 0.011; rss: 690MB -> 695MB ( +4MB) item_types_checking time: 16.638; rss: 672MB -> 1462MB ( +790MB) type_check_crate time: 47.011; rss: 1462MB -> 3880MB (+2418MB) MIR_borrow_checking time: 2.974; rss: 3880MB -> 3492MB ( -388MB) MIR_effect_checking time: 0.353; rss: 3492MB -> 3492MB ( +0MB) module_lints time: 0.353; rss: 3492MB -> 3492MB ( +0MB) lint_checking time: 0.262; rss: 3492MB -> 3492MB ( +0MB) privacy_checking_modules time: 0.668; rss: 3492MB -> 3492MB ( +0MB) misc_checking_3 time: 0.351; rss: 2858MB -> 2878MB ( +20MB) monomorphization_collector_graph_walk time: 0.021; rss: 2878MB -> 2878MB ( +0MB) partition_and_assert_distinct_symbols time: 78.799; rss: 3492MB -> 2878MB ( -614MB) generate_crate_metadata time: 5.878; rss: 2885MB -> 4563MB (+1678MB) codegen_to_LLVM_IR time: 5.891; rss: 2878MB -> 4563MB (+1685MB) codegen_crate time: 0.000; rss: 4563MB -> 4563MB ( +0MB) assert_dep_graph time: 0.000; rss: 4563MB -> 4563MB ( +0MB) incr_comp_persist_dep_graph time: 0.561; rss: 4563MB -> 4524MB ( -39MB) encode_query_results time: 0.569; rss: 4563MB -> 4524MB ( -39MB) incr_comp_serialize_result_cache time: 0.569; rss: 4563MB -> 4524MB ( -39MB) incr_comp_persist_result_cache time: 0.569; rss: 4563MB -> 4524MB ( -39MB) serialize_dep_graph time: 0.204; rss: 4524MB -> 3201MB (-1324MB) free_global_ctxt time: 68.533; rss: 4451MB -> 3199MB (-1252MB) LLVM_passes time: 0.000; rss: 3199MB -> 3196MB ( -3MB) join_worker_thread time: 67.330; rss: 3201MB -> 3196MB ( -4MB) finish_ongoing_codegen time: 0.363; rss: 3183MB -> 946MB (-2238MB) link_rlib time: 0.379; rss: 3183MB -> 946MB (-2238MB) link_binary time: 0.385; rss: 3183MB -> 781MB (-2402MB) link_crate time: 67.715; rss: 3201MB -> 781MB (-2420MB) link time: 222.921; rss: 29MB -> 367MB ( +338MB) total Finished dev [unoptimized + debuginfo] target(s) in 3m 42s warning: the following packages contain code that will be rejected by a future version of Rust: svg v0.7.2 note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 1` Command being timed: "cargo +nightly rustc -p axiom_verge2 --lib -- -Z time-passes" User time (seconds): 155.32 System time (seconds): 69.97 Percent of CPU this job got: 101% Elapsed (wall clock) time (h:mm:ss or m:ss): 3:43.00 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 34584552 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 43728345 Voluntary context switches: 1154 Involuntary context switches: 277 Swaps: 0 File system inputs: 0 File system outputs: 2448208 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 ```
overflow-checks = false
``` ~/logic-graph/games/AxiomVerge2$ /usr/bin/time -v cargo +nightly rustc -p axiom_verge2 --lib -- -Z time-passes Compiling axiom_verge2 v0.1.0 (/home/bswolf/logic-graph/games/AxiomVerge2) time: 0.846; rss: 46MB -> 502MB ( +456MB) expand_crate time: 0.846; rss: 46MB -> 502MB ( +456MB) macro_expand_crate time: 0.050; rss: 502MB -> 502MB ( +0MB) AST_validation time: 0.145; rss: 502MB -> 532MB ( +30MB) late_resolve_crate time: 0.034; rss: 532MB -> 532MB ( +0MB) resolve_check_unused time: 0.066; rss: 532MB -> 532MB ( +0MB) resolve_postprocess time: 0.248; rss: 502MB -> 532MB ( +30MB) resolve_crate time: 0.033; rss: 532MB -> 532MB ( +0MB) prepare_outputs time: 0.036; rss: 532MB -> 532MB ( +0MB) complete_gated_feature_checking time: 0.097; rss: 874MB -> 694MB ( -180MB) drop_ast time: 1.013; rss: 532MB -> 675MB ( +142MB) looking_for_derive_registrar time: 1.213; rss: 532MB -> 675MB ( +142MB) misc_checking_1 time: 0.071; rss: 675MB -> 682MB ( +8MB) type_collecting time: 0.039; rss: 682MB -> 690MB ( +7MB) coherence_checking time: 0.063; rss: 690MB -> 697MB ( +7MB) wf_checking time: 0.011; rss: 697MB -> 697MB ( +0MB) item_types_checking time: 16.490; rss: 675MB -> 1468MB ( +794MB) type_check_crate time: 14.551; rss: 1468MB -> 2096MB ( +627MB) MIR_borrow_checking time: 1.111; rss: 2096MB -> 2119MB ( +24MB) MIR_effect_checking time: 0.360; rss: 2119MB -> 2112MB ( -8MB) module_lints time: 0.360; rss: 2119MB -> 2112MB ( -8MB) lint_checking time: 0.262; rss: 2112MB -> 2112MB ( +0MB) privacy_checking_modules time: 0.676; rss: 2119MB -> 2112MB ( -8MB) misc_checking_3 time: 0.242; rss: 2128MB -> 2144MB ( +16MB) monomorphization_collector_graph_walk time: 0.013; rss: 2144MB -> 2144MB ( +0MB) partition_and_assert_distinct_symbols time: 1.301; rss: 2112MB -> 2144MB ( +32MB) generate_crate_metadata time: 0.834; rss: 2148MB -> 2284MB ( +136MB) codegen_to_LLVM_IR time: 0.848; rss: 2144MB -> 2284MB ( +140MB) codegen_crate time: 0.000; rss: 2284MB -> 2284MB ( +0MB) incr_comp_persist_dep_graph time: 0.265; rss: 2284MB -> 2273MB ( -12MB) encode_query_results time: 0.272; rss: 2284MB -> 2273MB ( -12MB) incr_comp_serialize_result_cache time: 0.272; rss: 2284MB -> 2273MB ( -12MB) incr_comp_persist_result_cache time: 0.272; rss: 2284MB -> 2273MB ( -11MB) serialize_dep_graph time: 0.091; rss: 2273MB -> 1228MB (-1045MB) free_global_ctxt time: 0.763; rss: 2168MB -> 1190MB ( -977MB) LLVM_passes time: 0.001; rss: 1158MB -> 1147MB ( -11MB) join_worker_thread time: 0.001; rss: 1147MB -> 1103MB ( -45MB) copy_all_cgu_workproducts_to_incr_comp_cache_dir time: 0.045; rss: 1228MB -> 1103MB ( -125MB) finish_ongoing_codegen time: 0.000; rss: 1079MB -> 1079MB ( +0MB) incr_comp_finalize_session_directory time: 0.000; rss: 1078MB -> 1073MB ( -4MB) link_binary_check_files_are_writeable time: 0.107; rss: 1062MB -> 1101MB ( +39MB) link_rlib time: 0.114; rss: 1078MB -> 1101MB ( +24MB) link_binary time: 0.117; rss: 1078MB -> 982MB ( -95MB) link_crate time: 0.163; rss: 1228MB -> 982MB ( -245MB) link time: 37.925; rss: 29MB -> 146MB ( +116MB) total Finished dev [unoptimized + debuginfo] target(s) in 37.98s warning: the following packages contain code that will be rejected by a future version of Rust: svg v0.7.2 note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 1` Command being timed: "cargo +nightly rustc -p axiom_verge2 --lib -- -Z time-passes" User time (seconds): 38.38 System time (seconds): 1.33 Percent of CPU this job got: 104% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:37.98 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 2245232 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 375156 Voluntary context switches: 908 Involuntary context switches: 56 Swaps: 0 File system inputs: 0 File system outputs: 818960 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 ```