ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.85k stars 2.55k forks source link

Stack allocation often harder than heap allocation #13640

Open enderger opened 1 year ago

enderger commented 1 year ago

Zig Version

0.11.0-dev.184+c4f7663c9

Steps to Reproduce and Observed Behavior

  1. Create a structure which contains a pointer to itself
  2. Try to instantiate said structure without either allocating on the heap or creating an imperative list of substructures

Expected Behavior

The language heavily pushes for stack allocation to be easier than heap allocation, but currently doesn't provide any way to just allocate memory on the stack inline. This either means that you have to write out a load of disjointed variables to express recursive structures (as pointers cannot derive their mutability from their parent) or allocate large amounts of unmanaged memory on the heap. This is problematic, as you either need to write structures that take up a large heap memory and mechanistically free all memory at once (forcing far more data onto the heap than actually needed), have horribly disjointed and hard to follow initialization, or leak memory. To fix this, there probably should be a builtin which creates a pointer to a hidden variable initialized to a constant in the current scope (in other words, expanding the current stack frame and getting a pointer) so that stack based constructs are able to take advantage of struct initializer syntax without either calculating the size of the buffer needed for an allocator or having a massive number of variables.

nektro commented 1 year ago
const S = struct {
    field: i32,
    me: *S = undefined,
};

pub fn main() void {
    var the = S{ .field = 42 };
    the.me = &the;
}
enderger commented 1 year ago

More meant a case like this:

const SExpr = union(enum) {
  Atom: []const u8,
  Pair: struct {
     car: *SExpr,
     cdr: *SExpr,
  }
}

The problem arises when you try to initialize deeply nested pairs, since you now need a variable to hold every SExpr without allocating it on the heap.

enderger commented 1 year ago

Sorry for the oddly specific example, this problem has made quite a lot of trouble when trying to implement a parser for S expressions since every part of the tree has to be assigned to a named variable whose sole purpose is to hold one part without putting it on the heap.

dee0xeed commented 1 year ago

While doing some exercises I permanently bumped into similar (or not very similar, not sure) problems. With pure stack allocation we do not know actual address of an entity (and therefore addresses of all its' sub-entities) until return from .init(). This may significantly affect data structures design and also initialization sequence - they can be more simple in case of heap allocation.

SpexGuy commented 1 year ago

If it helps, you can use heap code to allocate out of a stack buffer with code like this:

const needed_sexprs = 40;
var mem: [@sizeOf(SExpr) * needed_sexprs]u8 align(@alignOf(SExpr)) = undefined;
var fba = std.heap.FixedBufferAllocator.init(&mem);
const tree = buildTree(fba.allocator()) catch @panic("stack mem not big enough");
mlugg commented 1 year ago

While doing some exercises I permanently bumped into similar (or not very similar, not sure) problems. With pure stack allocation we do not know actual address of an entity (and therefore addresses of all its' sub-entities) until return from .init(). This may significantly affect data structures design and also initialization sequence - they can be more simple in case of heap allocation.

There are proposed language features to deal with this issue. Improved result location semantics (#2765) will allow you to access the return address of a function, meaning you effectively get a pointer to it for free, and pinned structs (#7769) will allow you to enforce that the struct is never directly copied (to make sure the result location stuff is being used properly and the value is never accidentally copied, invalidating the pointers). Until those are implemented, of course, you can just pass out pointers into init functions if needed.

dee0xeed commented 1 year ago

Improved result location semantics (https://github.com/ziglang/zig/issues/2765) will allow you to access the return address of a function, meaning you effectively get a pointer to it for free

I guess this is organized in a way similar to something like this:

Hence addresses are known from the very beginning and are valid until the caller returns. Did I get the idea right?

mlugg commented 1 year ago

\<snip> Did I get the idea right?

Kind of, yeah. What happens is that the caller passes as an implicit argument a pointer to the memory the return value should occupy - this actually probably already happens internally in basically any standard imperative language, it's also how most (all?) C calling conventions handle large return values.

Passing a pointer rather than just implicitly writing to memory below the stack frame allows result locations to be chained through multiple calls. For instance, if you have this code:

fn f() MyStruct {
    return .{
        // ...
    };
}

fn g() MyStruct {
    return f();
}

pub fn main() void {
    const foo = g();
    // ...
}

Result location semantics guarantee zero intermediate copies, since g and then f will be internally passed a pointer to foo's stack memory. The improved semantics would allow f to directly access that pointer, likely through a builtin, so foo could be self-referential.