Allocator interface: allow implementations to refuse to resize in place, even when shrinking

andrewrk commented 1 year ago

Over in #13513 I implemented a fast allocator for WebAssembly. It's nice because it's simple and easy to understand - at only 176 lines - and it takes advantage of the fact that in Zig, the free implementation has access to the allocated size. However, the requirement that shrinking an allocation (and not all the way to zero) must succeed is problematic. It requires either adding more metadata to every allocation, fragmenting memory, or doing a much more complicated memory management strategy.

In contrast, if the memory allocator were allowed to return failure for shrinking in place, it would require the user of an Allocator to keep track of the capacity when shrinking. In practice, it would mean this handy pattern does not work anymore:

pub fn utf16leToUtf8Alloc(allocator: mem.Allocator, utf16le: []const u16) ![]u8 {
    var result = try std.ArrayList(u8).init(allocator);
    defer result.deinit(); // <--- unconditionally defer the cleanup
    var it = Utf16LeIterator.init(utf16le);
    while (try it.nextCodepoint()) |codepoint| {
        const utf8_len = utf8CodepointSequenceLength(codepoint) catch unreachable;
        const new_slice = try result.addManyAsArray(utf8_len); // <--- append to the array list
        assert((utf8Encode(codepoint, new_slice) catch unreachable) == new_slice.len);
    }
    return result.toOwnedSlice(); // <--- convert to a slice that can be passed to free()
}

On the other hand, this pattern already has the possibility of error.OutOfMemory due to the append. So making toOwnedSlice() fallible would be acceptable for this use case.

The benefit of this different interface is that Allocator implementations could rely on size+alignment parameters to correctly categorize allocations - without storing any metadata.

I would argue that the single most important API user of the Allocator interface is std.ArrayList, followed by std.HashMap. Consider that in both of these cases, capacity is already tracked by the data structure, and additionally tracking it in the allocator implementation would be redundant. Furthermore, in both of these cases, they can handle failure to shrink in place better than the allocator implementation can! The allocator implementation would memcpy more bytes than necessary, since a portion of the allocation is not yet used. The data structures have this information; the allocator implementation does not.

Since this proposal is breaking, I would also propose to do the following additional breaking changes at the same time:

7952
4298
remove the "overallocation" part of the interface. I think it is too complicated for the hypothesized benefits, and to this day I haven't seen any hard evidence that it actually helps any performance metrics whatsoever.

So - the 3 functions needed to implement the interface would look like this:

fn alloc(ctx: *anyopaque, len: usize, align_exp: u8, return_address: usize) ?[*]u8
/// Resize in place request. Refusing to resize by returning `.no` is always allowed.
fn resize(ctx: *anyopaque, ptr: []u8, align_exp: u8, new_len: usize, return_address: usize) enum {ok, no}
fn free(ctx: *anyopaque, ptr: []u8, align_exp: u8, return_address: usize) void

I'm also tossing in a couple of other breaking API changes while we're at it:

In alloc, use ?[*]u8 to make the return type a single register. The Allocator interface can transform the null into error.OutOfMemory.
alignment passed as a power of two instead of u29. To get the alignment, (1 << align_exp). This makes invalid alignments unrepresentable, and avoids the question of what data type to use for alignment, given that u29 will be changing soon.

I am fast-tracking this proposal by pre-accepting it right now. However, it is still up for discussion should anyone want to raise any concerns.

Related: #12484

mlugg commented 1 year ago

Based on the discussion in #7952, would it make sense as a small extension of the alignment change for align_exp to have type std.math.Log2IntCeil(std.math.IntFittingRange(1, std.mem.page_size))? (Or whatever would be correct, my brain's on 2% capacity rn and can't do maths) That would mean that for instance on a platform with 4k pages, we'd have alignments of type u4. Obviously this doesn't rule out all invalid alignment inputs (in this case values 13-15 are still representable but invalid), but it further limits them. (If #3806 is accepted, that would allow us to truly enforce the limits at the type level)

EDIT: I think the type I was going for was std.math.IntFittingRange(0, std.math.log2(std.mem.page_size))

leecannon commented 1 year ago

I was toying around with something like this in #10256 but the problem of toOwnedSlice stopped me from ever making it non-draft,

daurnimator commented 1 year ago

the requirement that shrinking an allocation (and not all the way to zero) must succeed is problematic. It requires either adding more metadata to every allocation, fragmenting memory, or doing a much more complicated memory management strategy.

I don't necessarily agree with this statement.

The benefit of this different interface is that Allocator implementations could rely on size+alignment parameters to correctly categorize allocations - without storing any metadata.

Depending on the allocator you're writing, length and alignment may be either too much or too little metadata to work with an allocation.

There are allocators that are going to need to store additional data e.g. thread-locality info; which arena they come from; any special attributes. I say that on one level, we strip allocators back to their most basic necessary attribute: the pointer itself.

For both the simplest (i.e. a bump allocator) and complex (e.g. where you have thread-safety and other safety checks), this is desirable. This gives implementors a choice of where to store their metadata including the classic "before the allocation" or the modern+safer process-wide lookup tree/hash with thread-local caches.

The one thing you lose by not having the size is that a pure "after the allocation" metadata approach isn't possible without some tricks (like using pointer bits to indicate length.... but that doesn't really work with arbitrary alignment selection). However "after the allocation" metadata has been looked down upon from a security perspective for decades due to the high frequency of buffer overflows: a buffer overflow that reaches allocation metadata is usually the key to making it exploitable.

silversquirl commented 1 year ago

The length of an allocation is something we almost always know, especially in Zig. There's no reason not to let the allocator use it.

At worst, this saves 8 bytes per allocation in the allocator metadata. At best, it can eliminate metadata for some allocator designs entirely.

tecanec commented 1 year ago

I'm a big fan of that toOwnedSlice pattern, and I often use it for serialization, so I would very much like to keep it. I'm okay with toOwnedSlice being able to error, but that shouldn't be because of what allocator I'm using. So if toOwnedSlice can't shrink the slice, it should at least try making a copy instead.

As for the new signature of the alloc vtable function, I notice that it's sacrificing the ability to return an oversized slice. I know that ArrayLists benefit from that feature as an optimization, but I don't know how much. Other than that, I think it's cool but ugly, so I think it should keep returning error{OutOfMemory}![]u8 in the API and maybe get changed into the more efficient signature by the wrappers inside Allocator.init and let compiler optimizations take care of the conversion.

ziglang / zig

Allocator interface: allow implementations to refuse to resize in place, even when shrinking #13535

7952

4298