Open IntegratedQuantum opened 7 months ago
This replicates the issue directly on std.heap.page_allocator:
const std = @import("std");
const page_allocator = std.heap.page_allocator;

var pages: [128 * 1024]*[4096]u8 = undefined;

pub fn main() !void {
    for (&pages) |*page| {
        page.* = try page_allocator.create([4096]u8);
    }
    for (pages, 0..) |page, i| {
        if (i & 1 == 0) {
            continue;
        }
        page_allocator.destroy(page);
    }
}
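(The exact number of pages needed may have to be tweaked for your system configuration. With 128 * 1024 pages, freeing every other page would leave about 65,536 separate mappings, just over Linux's default vm.max_map_count of 65530.)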
The problem is with Zig's posix implementation of munmap():
/// Deletes the mappings for the specified address range, causing
/// further references to addresses within the range to generate invalid memory references.
/// Note that while POSIX allows unmapping a region in the middle of an existing mapping,
/// Zig's munmap function does not, for two reasons:
/// * It violates the Zig principle that resource deallocation must succeed.
/// * The Windows function, VirtualFree, has this restriction.
pub fn munmap(memory: []align(mem.page_size) const u8) void {
    switch (errno(system.munmap(memory.ptr, memory.len))) {
        .SUCCESS => return,
        .INVAL => unreachable, // Invalid parameters.
        .NOMEM => unreachable, // Attempted to unmap a region in the middle of an existing mapping.
        else => unreachable,
    }
}
The documentation clearly suggests a model where each call to mmap() creates a new mapping, so that as long as you call munmap() with the same bounds as each mmap(), it cannot fail. Unfortunately, this is incorrect, at least on Linux: when allocating anonymous memory with mmap(), the kernel tries to allocate a region of address space adjacent to an existing mapping, and will opportunistically merge with that mapping wherever possible. The result is that most calls to mmap() only extend an existing mapping rather than create a new one, and thus most paired calls to munmap() are in fact unmapping part of a mapping, which splits it in two and can fail with ENOMEM once the process reaches its mapping limit.
Unfortunately this means that in general memory deallocation on Linux can fail and this needs to be worked around by userspace. I believe the usual approach is to keep track of regions that you've failed to unmap so you can coalesce them with new unmap requests until you either reach a big enough region that it can be unmapped without splitting a mapping, or else unrelated unmap requests get the system away from the vm.max_map_count limit. The problem, of course, is that coalescing is only robust as a solution if regions are being coalesced in a single place for the whole process, and not separate places for Zig and for libc's allocator and whatever other libraries are in play that might be directly performing munmap() calls (and probably just permanently leaking their regions if an error occurs).
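To make the coalescing idea concrete, here is a minimal sketch of the bookkeeping only, assuming a fixed-capacity table so that recording a failed unmap never needs to allocate. `Region` and `DeferredUnmaps` are hypothetical names, not anything in Zig's standard library, and the sketch does not call munmap() itself; the caller would retry the merged region returned by `add`.

```zig
const std = @import("std");

/// Hypothetical type: a page-aligned address range, [start, end).
const Region = struct {
    start: usize,
    end: usize,
};

/// Hypothetical table of regions whose munmap() failed.
/// Invariant: stored regions never touch or overlap each other.
const DeferredUnmaps = struct {
    regions: [64]Region = undefined,
    len: usize = 0,

    /// Records `new`, absorbing every stored region that touches or overlaps it.
    /// Returns the merged region, which the caller may retry unmapping,
    /// or null if the table is full and nothing could be merged.
    fn add(self: *DeferredUnmaps, new: Region) ?Region {
        var merged = new;
        var kept: usize = 0;
        var i: usize = 0;
        while (i < self.len) : (i += 1) {
            const r = self.regions[i];
            if (r.start <= merged.end and merged.start <= r.end) {
                // Adjacent or overlapping: grow the merged region.
                merged.start = @min(merged.start, r.start);
                merged.end = @max(merged.end, r.end);
            } else {
                // Unrelated: keep it, compacting the table in place.
                self.regions[kept] = r;
                kept += 1;
            }
        }
        // Table full and nothing absorbed: state is unchanged.
        if (kept == self.regions.len) return null;
        self.regions[kept] = merged;
        self.len = kept + 1;
        return merged;
    }

    /// Forgets a region after a retry of munmap() on it has succeeded.
    fn remove(self: *DeferredUnmaps, region: Region) void {
        var i: usize = 0;
        while (i < self.len) : (i += 1) {
            const r = self.regions[i];
            if (r.start == region.start and r.end == region.end) {
                self.len -= 1;
                self.regions[i] = self.regions[self.len];
                return;
            }
        }
    }
};

pub fn main() void {
    const page = 4096;
    var deferred = DeferredUnmaps{};
    // Two failed unmaps with a gap between them stay separate...
    _ = deferred.add(.{ .start = 1 * page, .end = 2 * page });
    _ = deferred.add(.{ .start = 3 * page, .end = 4 * page });
    // ...until a third failed unmap fills the gap and all three merge.
    if (deferred.add(.{ .start = 2 * page, .end = 3 * page })) |m| {
        std.debug.print("retry munmap on 0x{x}..0x{x}\n", .{ m.start, m.end });
    }
}
```

As noted above, a table like this only helps if every munmap() caller in the process goes through it; a per-library copy would not see regions leaked by the others.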
@klkblake thank you for this breakdown and analysis. This is maddening, and I will need to go through the 5 stages of grief before suggesting a course of action.
Zig Version
0.12.0-dev.2150+63de8a598 (linux)
Steps to Reproduce and Observed Behavior
I observed that my application would sometimes crash with an OutOfMemory error. This didn't make any sense to me: in total my system was reporting only around 40% memory usage, and the error would happen during a phase where a lot of memory was being freed. There were also no large allocations happening, and the allocation where it crashed was a mere 24 bytes.
I could trace the error back to the mmap call in the page allocator. From the Linux documentation, one of the sources for out of memory is this:
This gave me some clues to make a simple reproduction:
Output:
Expected Behavior
From a general purpose allocator I expect it to be able to fully use the memory the system can provide (minus internal fragmentation, of course). The c_allocator doesn't have this problem, I think because it doesn't unmap pages as aggressively as the GPA.