hniksic opened this issue 2 years ago
In fairness to rustc, that array is 14.9 GB. If you only have 8 GB of memory, you'll never be able to run this program even if you do manage to compile it.
And if you do have enough memory to get through whatever's going on here, it looks to me like the incremental compilation system isn't prepared to handle this:
```
thread 'rustc' panicked at 'assertion failed: pos <= u32::MAX as usize', compiler/rustc_query_impl/src/on_disk_cache.rs:119:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.60.0-dev running on x86_64-unknown-linux-gnu

note: compiler flags: -C embed-bitcode=no -C debuginfo=2 -C incremental --crate-type bin

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
end of query stack
```
(this is a local build because I want debug symbols)
All that said, I feel like this shouldn't be so memory-intensive. The memory consumption appears to come from the MIR interpreter's `InitMask`... I won't claim that with confidence because I'm truly lost in this area of the code, but I'm looking.
> It looks like the memory consumption appears to be from the MIR interpreter's `InitMask`...

So I believe what you have here is a very large `static` that's mostly uninitialized (except for, presumably, the `Once` in it?). I forget the current implementation of `InitMask`, but if it's one bit per byte, the 14.9 GiB `static` will require a 1.86 GiB `InitMask`.
We could maybe reduce the cost of some of that tracking, but it's still not great overall.
Your executable would likely also need to be 14.9 GiB in size, because the `static` isn't fully uninitialized, so AFAIK it can't go into `.bss` and instead has to be fully present in `.data` (unless LLVM gets clever?).
What you may want to do instead is `Box` your very large 2D array. That results in successful compilation and this runtime message (on the playground):

```
memory allocation of 16000000000 bytes failed
```
Ah, the allocations are all happening along a code path that calls `copy_op`. So I suspect there's somehow a copy happening in the MIR of this array? And... a lot of copies??? (I clearly don't understand much of MIR.) The `Size` being passed to `InitMask::grow` when the masks are created makes sense.
Peak memory of the build is 44.8 GB, btw.
Yea, there are a few extra copies of values happening that we could probably avoid, but we always need at least two at some point during CTFE.
It may help to implement a "deinit source on move" scheme. Basically, CTFE treats `Operand::Move` and `Operand::Copy` the same.
> So I suspect there's somehow a copy happening in the MIR of this array?
It's likely something like `(Once, MaybeUninit<Mutex<[[T; N]; M]>>)` - so the big array is in the same allocation, but it's not the thing being allocated and copied around, merely an uninitialized field of it.
The `InitMask` doesn't have to be that big if the uninitialized data is a suffix of the allocation, but managing an `InitMask` in the presence of clever encodings isn't easy. Supporting only an uninitialized suffix might be possible?
OP's example is insufficiently minimized. A build of this program peaks at 33.6 GB:

```rust
const ROWS: usize = 100000;
const COLS: usize = 10000;

static TWODARRAY: [[u128; COLS]; ROWS] = [[0; COLS]; ROWS];

fn main() {}
```
Perhaps that makes sense, if we're doing a copy at all of this huge array, because that's twice the size of the array. Right?
@saethlin

> In fairness to rustc, that array is 14.9 GB. If you only have 8 GB of memory, you'll never be able to run this program even if you do manage to compile it.
I do have more than 8 GiB of memory, but rustc was using 100+ GiB, which far exceeds the size of the array. And even if it didn't, I would hope to be able to declare large arrays without the compiler having to reproduce the allocation. I often compile programs locally and then deploy them to servers with much more RAM, more CPU cores, etc.
@eddyb

> What you may want to do instead is Box your very large 2D array.

I neglected to note in the issue that I am aware of the possibility of heap-allocating the array. In this case, however, I was explicitly experimenting with allocating the large array statically, akin to a C declaration of `static int twodarray[ROWS][COLS]`. It surprised me that attempting to do so broke the compiler, so I wanted to report it so it can hopefully get fixed, or at least looked at.
It seems like replacing the `InitMask` (which is basically a `BitSet`; not sure why we have a distinct impl of it for MIR allocations?) with something like the recently-added `IntervalSet` may make sense -- presumably it's pretty rare for a truly random pattern of initialization to occur.
@Mark-Simulacrum I believe something like `IntervalSet` used to exist and was considered too cumbersome. Maybe the presence of a reusable abstraction might be enough to let miri use that?
> In this case, however, I was explicitly experimenting with allocating the large array statically, akin to a C declaration of `static int twodarray[ROWS][COLS]`.

To what end? Accessing it won't be more efficient than a pointer to a heap area: passing a `&mut [[u128; COLS]; ROWS]` around as a function argument (i.e. no globals), pointing to the heap, is probably the best you can do. Putting it in a `static` just makes the file on disk unnecessarily large, and changes whose responsibility it is (kernel vs. your program) to allocate it. And mutation requires a `Mutex` that you might be able to avoid otherwise.
Out of curiosity, I tried something similar (though probably not sound for non-trivially-copyable types, heh) in C++ (using `constexpr` to match Rust's behavior), and it looks like GCC is worse than Clang at this.
> OP's example is insufficiently minimized. A build of this program peaks at 33.6 GB:
>
> ```rust
> const ROWS: usize = 100000;
> const COLS: usize = 10000;
> static TWODARRAY: [[u128; COLS]; ROWS] = [[0; COLS]; ROWS];
> fn main() {}
> ```
>
> Perhaps that makes sense, if we're doing a copy at all of this huge array, because that's twice the size of the array. Right?
That's worse than the OP example (in terms of initialization state); I'd say this is closer:

```rust
pub static BIG: Option<[u128; 1 << 30]> = None;
```

For `1 << 25` instead of `1 << 30`, even godbolt can compile it (and it even works if it has to evaluate a `const fn`).

Surprisingly, this is the best I can get to compile on godbolt even fully-uninitialized:

```rust
pub static BIG_UNINIT: MaybeUninit<[u128; 1 << 27]> = MaybeUninit::uninit();
```

(though you'll want to replace `u128` with some atomic type, or use some other way to enable interior mutability, to get it to go into `.bss` instead of `.rodata`)
@Mark-Simulacrum I think I was mistaken to point to `InitMask` as the problem. It's surely where most of the compile time is coming from, but I think the memory consumption comes from MIR just doing a copy (in the minimal case) or two (in the case of `lazy_static` with a `Mutex`) of the intended array.
@eddyb

> I believe something like `IntervalSet` used to exist and was considered too cumbersome.

It looks to me like there are currently `InitMask` and `InitMaskCompressed`, and in this program the interpreter creates a full `InitMask` then attempts to do run-length encoding on it after the fact. I feel like this is just `IntervalSet` with extra steps?
> Surprisingly, this is the best I can get to compile on godbolt even fully-uninitialized:

On godbolt I see "Compiler returned: 137", which from Googling means something sent the compiler a SIGKILL. The last build that works peaks at 2.1 GB, and of course the next power of 2 peaks at 4.1 GB. I think this is supposed to be unsurprising, because const eval has to construct the const. I suspect there's a (quite generous) 4 GB memory limit on godbolt. So perhaps this is "as expected, but unfortunate"?
@hniksic

> I do have more than 8 GiB of memory, but rustc was spending 100+ GiB, which exceeds the array size by far.

Wowee, that's significantly worse. Can you share the `lazy_static` that was responsible for that amount of memory usage?
> Can you share the `lazy_static` that was responsible for that amount of memory usage?

That should actually be the original code in the report. The line from `top` shows the compiler having allocated 100 GiB of virtual memory (and presumably not getting killed due to overcommit). Maybe the difference is that I was compiling in release mode?
> Supporting only an uninitialized suffix might be possible?

Yea, we could lazily grow it and treat anything beyond its size as uninitialized.
> It looks to me like there is currently InitMask and InitMaskCompressed, and in this program, the interpreter creates a full InitMask then attempts to do run-length encoding on it after-the-fact. I feel like this is just IntervalSet with extra steps?

Basically, we have `InitMask` to store the mask of an allocation, and `InitMaskCompressed` is used when copying the mask from one allocation to another (which might overlap, and we might be asked to repeat this N times, so we have to keep a copy of the original mask). If it is possible, without perf loss for typical CTFE, to use the same data structure for both of these cases, that would be nice. :D But I think the problem is that testing whether bit `i` is set in an `InitMaskCompressed` actually costs linear time, since we have to scan the list of intervals.
> But I think the problem is that testing if bit `i` is set in an `InitMaskCompressed` actually costs linear time since we have to scan the list of intervals.

If interval lengths were replaced with prefix sums of their lengths, testing whether the i-th bit is set could be done in logarithmic time with a binary search.
I tried this code:

I expected it to compile successfully. Instead, the compiler spent all available RAM and swap, and I had to kill it. From `top`:

Meta

`rustc --version --verbose`: