ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Reduce address space waste of std.heap.page_allocator on Windows #17413

Open Mikastiv opened 11 months ago

Mikastiv commented 11 months ago

Currently, the PageAllocator makes allocations aligned to std.mem.page_size (4KB on Windows) using VirtualAlloc. Windows requires its allocations to be aligned on 64KB boundaries, so every time this allocator is used, up to 60KB of address space is wasted.

I made a small program to illustrate. All the addresses are 64KB apart (except the first one):

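const std = @import("std");

pub fn main() !void {
    const alloc = std.heap.page_allocator;

    // Request one page at a time and print where each allocation lands.
    for (0..16) |_| {
        const block = try alloc.alloc(u8, 4096);
        std.debug.print("{*}\n", .{block});
    }
}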

Outputs:

u8@2babf240000
u8@2babf360000
u8@2babf370000
u8@2babf380000
u8@2babf390000
u8@2babf4a0000
u8@2babf4b0000
u8@2babf4c0000
u8@2babf4d0000
u8@2babf4e0000
u8@2babf4f0000
u8@2babf500000
u8@2babf510000
u8@2babf520000
u8@2babf530000
u8@2babf540000

A simple solution would be to use Windows' allocation granularity instead of the page size, although this would commit at least 64KB for every allocation instead of 4KB.
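To make the arithmetic above concrete, here is a minimal sketch of the per-allocation waste. The helper name `wastedAddressSpace` is hypothetical and not part of std; the sketch assumes Zig 0.11 or later for the `std.mem.alignForward(T, addr, alignment)` signature, and it hard-codes the 4KB page size and 64KB granularity mentioned in this thread rather than querying them from the OS.

```zig
const std = @import("std");

/// Hypothetical helper (not part of std): bytes of address space that become
/// unusable after a reservation of `len` bytes, assuming every reservation
/// must start on a `granularity` boundary.
fn wastedAddressSpace(len: usize, granularity: usize) usize {
    // Round the request up to the next granularity boundary.
    const consumed = std.mem.alignForward(usize, len, granularity);
    return consumed - len;
}

pub fn main() void {
    const page_size: usize = 4 * 1024; // assumed: 4KB pages
    const granularity: usize = 64 * 1024; // assumed: 64KB allocation granularity

    // A single-page request leaves 61440 bytes (60KB) of address space unused.
    std.debug.print("{d}\n", .{wastedAddressSpace(page_size, granularity)});
}
```

A request that is already a multiple of 64KB wastes nothing under this model, which is why the problem mostly affects small allocations.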

matu3ba commented 11 months ago

For context, according to https://stackoverflow.com/questions/20023446/is-virtualalloc-alignment-consistent-with-size-of-allocation, "64KB is the value of SYSTEM_INFO.dwAllocationGranularity". 4 KB is the page size, and allocations are only aligned to full page sizes.

It looks like Windows uses memory overcommit only for stack sizes and not for heap allocations, see https://superuser.com/questions/1194263/will-microsoft-windows-10-overcommit-memory.

> this would commit at least 64KB for every allocation instead of 4KB.

Memory is only committed if the page is accessed, not before. See also https://stackoverflow.com/questions/20023446/is-virtualalloc-alignment-consistent-with-size-of-allocation.

> I made a small program to illustrate. All the addresses are 64KB apart (except the first one):

This does not specify how this is a problem for your use case. Allocators are expected to do batching for performance. Can you justify why you need control over the default amount of reserved pages?

expikr commented 11 months ago

See also https://devblogs.microsoft.com/oldnewthing/20210510-00/?p=105200

notcancername commented 11 months ago

> This does not specify how this is a problem for your use case. Allocators are expected to do batching for performance. Can you justify why you need control over the default amount of reserved pages?

It is generally assumed that std.heap.page_allocator allocates at std.mem.page_size granularity (obviously). Personally, I view this as the problem. std.heap.PageAllocator ought to have a preferred_length comptime field that specifies the ideal length of an allocation in pages.

Mikastiv commented 11 months ago

> Can you justify why you need control over the default amount of reserved pages?

No, I was playing around with std.heap.page_allocator and noticed that the allocations were always aligned to 64KB.

> std.heap.PageAllocator ought to have a preferred_length comptime field that specifies the ideal length of an allocation in pages.

I like this idea.

squeek502 commented 11 months ago

@Mikastiv why was this closed?

Mikastiv commented 11 months ago

From the answers I got, I didn't get the feeling that it needed to be addressed, and there haven't been any posts since last week.

squeek502 commented 11 months ago

I think it's worth keeping open if you don't mind. When combined with https://github.com/ziglang/zig/issues/17377, it is behavior that will end up mattering since the GeneralPurposeAllocator intentionally tries to avoid re-using virtual addresses and this caveat will make Windows exhaust the virtual address space more quickly.
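As a rough illustration of why this matters when addresses are never reused, the sketch below counts how many single-page reservations fit before the address space runs out. The helper `maxReservations` is hypothetical, and the 128 TiB user-mode address space is an assumption (the figure commonly cited for 64-bit Windows processes); the real limit depends on the Windows version and process configuration.

```zig
const std = @import("std");

/// Hypothetical helper (not part of std): how many reservations fit in
/// `address_space` bytes if each one consumes `stride` bytes of address
/// space and no address is ever reused.
fn maxReservations(address_space: u64, stride: u64) u64 {
    return address_space / stride;
}

pub fn main() void {
    const user_space: u64 = 128 << 40; // assumed: 128 TiB of user-mode address space
    const granularity: u64 = 64 * 1024; // 64KB allocation granularity
    const page_size: u64 = 4 * 1024; // 4KB page size

    const at_granularity = maxReservations(user_space, granularity);
    const at_page_size = maxReservations(user_space, page_size);

    // Each 4KB reservation consuming 64KB leaves room for 16x fewer of them.
    std.debug.print("{d} vs {d} ({d}x fewer)\n", .{
        at_granularity,
        at_page_size,
        at_page_size / at_granularity,
    });
}
```

Under these assumptions the ceiling drops from about 34 billion page-sized reservations to about 2 billion. This is only an upper bound on direct page_allocator reservations, not a measurement of GeneralPurposeAllocator behavior.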

jnordwick commented 11 months ago

> GeneralPurposeAllocator intentionally tries to avoid re-using virtual addresses

Isn't this going to blow out the TLB? It is a very serious issue for some processes with a large working set, especially when huge pages aren't in use.

squeek502 commented 11 months ago

> Isn't this going to blow out the TLB? It is a very serious issue for some processes with a large working set, especially when huge pages aren't in use.

AFAIK it's primarily a strategy intended for catching double frees. When https://github.com/ziglang/zig/issues/12484 is addressed, it almost certainly won't be part of the release-mode GeneralPurposeAllocator.

matu3ba commented 8 months ago

I think I am running into this when testing dynamic library loads, although at least partially due to the Windows SHENANIGANS of mapping libraries that are not initialized into memory even though an error occurs on usage of process mitigation. See https://github.com/matu3ba/win32k-mitigation/issues/1.

I'll have a look tomorrow with tooling. UPDATE: WPA shows me this behavior, although I'm very surprised that <4MB of virtual memory is the virtual memory limit. I used these nice tutorials (https://learn.microsoft.com/en-us/cpp/build-insights/tutorials/vcperf-and-wpa?view=msvc-170 and https://learn.microsoft.com/en-us/windows-hardware/test/wpt/memory-footprint-optimization-exercise-2) and

reg add "HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\child_ntdll_only.exe" /v TracingFlags /t REG_DWORD /d 1 /f
reg delete "HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\child_ntdll_only.exe"

to get ~4MB virtual memory usage

[Image: 20231230LoadLibrary_virtualalloc_issue]

I'll investigate more.

This is the trace of the C program (<0.5MB virtual memory usage):

[Image: 20231230LoadLibrary_calloc_noissue]

Most likely the process or job limit https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-jobobject_extended_limit_information or https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-jobobject_basic_limit_information is too low.

UPDATE: My problem is unrelated to this issue; it is caused by incorrect usage, an ABI problem, or another problem on my side. The Windows behavior is, however, still very interesting and makes me question the robustness of the implementation.

nevakrien commented 5 months ago

Related issue: I am getting a weird print statement from this allocator on a caught error, which I believe is about the page space.

C:\Users\Owner\Desktop>oom.exe
Allocating memory until a crash...
Total memory allocated: 26280 megabytes error.Unexpected: GetLastError(1455): The paging file is too small for this operation to complete.

C:\Users\Owner\AppData\Local\Microsoft\WinGet\Packages\zig.zig_Microsoft.Winget.Source_8wekyb3d8bbwe\zig-windows-x86_64-0.11.0\lib\std\os\windows.zig:1560:49: 0x7ff7807a1a50 in VirtualAlloc (oom.exe.obj)
            else => |err| return unexpectedError(err),
                                                ^
C:\Users\Owner\AppData\Local\Microsoft\WinGet\Packages\zig.zig_Microsoft.Winget.Source_8wekyb3d8bbwe\zig-windows-x86_64-0.11.0\lib\std\heap\PageAllocator.zig:24:36: 0x7ff7807a187a in alloc (oom.exe.obj)
        const addr = w.VirtualAlloc(
                                   ^
C:\Users\Owner\AppData\Local\Microsoft\WinGet\Packages\zig.zig_Microsoft.Winget.Source_8wekyb3d8bbwe\zig-windows-x86_64-0.11.0\lib\std\mem\Allocator.zig:225:53: 0x7ff7807a1fad in allocBytesWithAlignment__anon_3940 (oom.exe.obj)
    const byte_ptr = self.rawAlloc(byte_count, log2a(alignment), return_address) orelse return Error.OutOfMemory;
                                                    ^
C:\Users\Owner\AppData\Local\Microsoft\WinGet\Packages\zig.zig_Microsoft.Winget.Source_8wekyb3d8bbwe\zig-windows-x86_64-0.11.0\lib\std\mem\Allocator.zig:105:62: 0x7ff7807a12f6 in create__anon_3182 (oom.exe.obj)
    const ptr: *T = @ptrCast(try self.allocBytesWithAlignment(@alignOf(T), @sizeOf(T), @returnAddress()));
                                                             ^
C:\Users\Owner\Desktop\oom.zig:21:41: 0x7ff7807a105a in main (oom.exe.obj)
        const newNode = allocator.create(Node) catch { //|err| {
                                        ^
C:\Users\Owner\AppData\Local\Microsoft\WinGet\Packages\zig.zig_Microsoft.Winget.Source_8wekyb3d8bbwe\zig-windows-x86_64-0.11.0\lib\std\start.zig:339:65: 0x7ff7807a1741 in WinStartup (oom.exe.obj)
    std.os.windows.kernel32.ExitProcess(initEventLoopAndCallMain());
                                                                ^
???:?:?: 0x7fff5b397343 in ??? (KERNEL32.DLL)
???:?:?: 0x7fff5b6a26b0 in ??? (ntdll.dll)
error.Unexpected: GetLastError(1455): The paging file is too small for this operation to complete.

C:\Users\Owner\AppData\Local\Microsoft\WinGet\Packages\zig.zig_Microsoft.Winget.Source_8wekyb3d8bbwe\zig-windows-x86_64-0.11.0\lib\std\os\windows.zig:1560:49: 0x7ff7807a1a50 in VirtualAlloc (oom.exe.obj)
            else => |err| return unexpectedError(err),
                                                ^
C:\Users\Owner\AppData\Local\Microsoft\WinGet\Packages\zig.zig_Microsoft.Winget.Source_8wekyb3d8bbwe\zig-windows-x86_64-0.11.0\lib\std\heap\PageAllocator.zig:24:36: 0x7ff7807a187a in alloc (oom.exe.obj)
        const addr = w.VirtualAlloc(
                                   ^
C:\Users\Owner\AppData\Local\Microsoft\WinGet\Packages\zig.zig_Microsoft.Winget.Source_8wekyb3d8bbwe\zig-windows-x86_64-0.11.0\lib\std\mem\Allocator.zig:225:53: 0x7ff7807a22dd in allocBytesWithAlignment__anon_3941 (oom.exe.obj)
    const byte_ptr = self.rawAlloc(byte_count, log2a(alignment), return_address) orelse return Error.OutOfMemory;
                                                    ^
C:\Users\Owner\AppData\Local\Microsoft\WinGet\Packages\zig.zig_Microsoft.Winget.Source_8wekyb3d8bbwe\zig-windows-x86_64-0.11.0\lib\std\mem\Allocator.zig:105:62: 0x7ff7807a13b6 in create__anon_3183 (oom.exe.obj)
    const ptr: *T = @ptrCast(try self.allocBytesWithAlignment(@alignOf(T), @sizeOf(T), @returnAddress()));
                                                             ^
C:\Users\Owner\Desktop\oom.zig:24:42: 0x7ff7807a10bf in main (oom.exe.obj)
                leaker = allocator.create(u8) catch {
                                         ^
C:\Users\Owner\AppData\Local\Microsoft\WinGet\Packages\zig.zig_Microsoft.Winget.Source_8wekyb3d8bbwe\zig-windows-x86_64-0.11.0\lib\std\start.zig:339:65: 0x7ff7807a1741 in WinStartup (oom.exe.obj)
    std.os.windows.kernel32.ExitProcess(initEventLoopAndCallMain());
                                                                ^
???:?:?: 0x7fff5b397343 in ??? (KERNEL32.DLL)
???:?:?: 0x7fff5b6a26b0 in ??? (ntdll.dll)
Memory has been released.

C:\Users\Owner\Desktop>

The code is:

const std = @import("std");

const Node = struct {
    next: ?*Node,
    data: [1048576 - @sizeOf(?*Node)]u8,
};

comptime {
    std.debug.assert(@sizeOf(Node) == 1048576);
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var head: ?*Node = null;
    var current: ?*Node = null;
    var nodeCount: usize = 0;
    var leaker: ?*u8 = null; //we have to leak since pointers are more than a byte

    std.debug.print("Allocating memory until a crash...\n", .{});
    while (true) {
        const newNode = allocator.create(Node) catch { //|err| {
            //std.debug.print("Failed to create node, system possibly OOM. Error: {}\n", .{ err });
            while (true) {
                leaker = allocator.create(u8) catch {
                    break;
                };
            }
            break;
        };

        newNode.next = null; // Set next pointer to null after allocation

        if (head == null) {
            head = newNode;
        } else {
            current.?.next = newNode;
        }
        current = newNode;
        nodeCount += 1;

        std.debug.print("\rTotal memory allocated: {d} megabytes ", .{nodeCount});
    }

    // Cleanup, freeing all nodes
    while (head) |node| {
        const next = node.next;
        allocator.destroy(node);
        head = next;
    }
    std.debug.print("Memory has been released.\n", .{});
}