shepmaster opened 1 year ago
I profiled this a bit and a lot of the time is spent in MIR dataflow analysis: `MaybeUninitializedPlaces`, `EverInitializedPlaces`, `MaybeInitializedPlaces`, `iterate_to_fixpoint`, etc. Plus lots of `ChunkedBitSet` manipulations within these.
But the real problem is a quadratic amount of unwinding code generated. I looked at the MIR generated with 100 lines in the vector. One key part is this:
```rust
let mut _9: std::string::String;
let _10: &str;
let mut _11: std::string::String;
let _12: &str;
...
let mut _205: std::string::String;
let _206: &str;
let mut _207: std::string::String;
let _208: &str;
```
which makes sense. There are also vast numbers of blocks like this:
```rust
bb104 (cleanup): {
    drop(_205) -> [return: bb105, unwind terminate(cleanup)];
}
bb105 (cleanup): {
    drop(_203) -> [return: bb106, unwind terminate(cleanup)];
}
bb106 (cleanup): {
    drop(_201) -> [return: bb107, unwind terminate(cleanup)];
}
```
In fact, a quadratic number of blocks: 99 for `_9`, 98 for `_11`, ..., 1 for `_205`, 0 for `_207`.
Because they are in chains:

- `_205`, `_203`, ..., `_11`, `_9`
- `_203`, ..., `_11`, `_9`
- ...
- `_11`, `_9`
- `_9`

It seems possible for these separate chains to be merged into a single chain that drops `_205`, `_203`, ..., `_11`, `_9`, and then the other chains are handled by just jumping into different blocks as necessary. How hard that is to implement, I have no idea.
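If the chains were merged as suggested, the cleanup blocks might instead look something like the following. This is a hand-written sketch in MIR-like pseudocode, not actual compiler output, and the block names are invented:

```
bb_drop_205 (cleanup): { drop(_205) -> [return: bb_drop_203, unwind terminate(cleanup)]; }
bb_drop_203 (cleanup): { drop(_203) -> [return: bb_drop_201, unwind terminate(cleanup)]; }
...
bb_drop_11 (cleanup): { drop(_11) -> [return: bb_drop_9, unwind terminate(cleanup)]; }
bb_drop_9 (cleanup): { drop(_9) -> [return: bb_resume, unwind terminate(cleanup)]; }
```

Each failure point would then jump into this single chain at the appropriate depth (e.g. a panic while initializing `_207` would unwind to `bb_drop_205`), giving a linear rather than quadratic number of cleanup blocks.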
> Because they are in chains

Interesting!
It looks like if you extract a function[^1] and call that function in the `vec!` macro, then there's only one chain created with the same properties you want. The single drop chain also appears if you make it into a non-async function.

This makes it seem like there's already some piece of code that does the drop chain optimization, but perhaps it's not advanced enough to handle whatever the async transform does to the code.

[^1]: `fn make() -> String { "monster".to_string() }`
I just grepped for `drop.*chain` in the source code and found this, which seems awfully relevant: `scope.rs`
The 1.69 to 1.70 30% bump bisects to #109458.
(Also, possibly interesting for its future stabilization path, `-Zdrop-tracking-mir` makes things much worse. This is admittedly a stress test, but at N=1250 it's like 22% on a debug build and 85% on a check build.)
> I profiled this a bit and a lot of the time is spent in MIR dataflow analysis: `MaybeUninitializedPlaces`, `EverInitializedPlaces`, `MaybeInitializedPlaces`, `iterate_to_fixpoint`, etc. Plus lots of `ChunkedBitSet` manipulations within these.

This part sounds specific to async; the generator capture analysis runs these.
> But the real problem is a quadratic amount of unwinding code generated.

This should not be specific to async and points to general improvements we can make in MIR building.
> This should not be specific to async and points to general improvements we can make in MIR building.

But @shepmaster said that removing the `async` keyword greatly reduced compile time.
Build with `cargo build`. I'm using Rust 1.72 on aarch64-apple-darwin.

In addition to the slow compilation time, I also see large memory usage. With 2500 lines, my OS reported `rustc` as consuming 20 GB.

## Exploration
### Different versions

There was no obvious correlation.
### Different numbers of lines

This appears to show that time is proportional to (number of lines)^2.
### String slices

Switching to returning a `&'static str` instead of `String` resulted in fast compilation (<250 ms).

### Using a function

Creating a helper function and calling it inside of `vec![]` resulted in fast compilation (<250 ms).

### Using another method

Switching to returning a `usize` instead of a `String` reduced compilation time, but it is still surprisingly slow (2.5 s).

### Avoiding `async fn`

Removing `async` from `make_menagerie` reduced compilation time (<750 ms).

### Test case generation
For trying different things, I used two files.

`main.rs`:
And to generate 2500 lines: