Proposal: Pinned Structs

SpexGuy commented 3 years ago

This proposal supersedes #3803 and #3804 as the plan going forward for pinned structs, as discussed in the design meetings.

Motivation

A common pattern in Zig is to embed a struct in a larger struct, pass pointers to the inner struct around, and then use @fieldParentPtr to retrieve the outer struct. This is central to our current favored approach to runtime polymorphism. However, it's easy to accidentally make a value copy of inner structs like this, causing hard to debug memory corruption problems or segfaults.

A similar problem exists with types that may contain pointers to data within themselves. As an example, consider an array with an inline stack buffer. Copying this value would yield a broken object, with a pointer pointing into the previous instance.

Finally, at the extreme end of this is async frames. Async frames may contain pointers into themselves, and so cannot safely be copied.

Where possible, the compiler should give an error if you try to copy any of these types. This is also important for the current plan for Result Location Semantics (see #2765), which is very implicit. In the case of these types, it's extremely important to know if RLS is working (and to fail the compile if not).

Proposal

Introduce the pinned keyword. This keyword can appear as a modifier to an anonymous type declaration, similar to packed and extern. pinned may coexist with packed or extern, and is compatible with struct and union types. It is not allowed (or ever needed) on opaque, because opaque values cannot be copied. It is also not allowed on enum or error, because those types cannot contain self pointers.

If a type is pinned, it means that the address of the type is part of its value, and the compiler is not allowed to copy an instance to a new location. But braced initialization is still allowed, and the address may be forwarded through result location semantics.

Any composite type which contains a field which is pinned is also considered pinned, and the same restrictions apply. This includes error unions and optionals, so #2761 will also be important for working with these types.

Frame types will also be considered pinned.

It's important to note that, despite some superficial similarities, pinned is not related to RAII. Pinned will not enforce that your deinitialization code is run, nor will it prevent you from overwriting fields that needs to be cleaned up, or even whole structs with braced initializers. It only prevents you from accidentally relocating an existing object.

Examples

const CAllocator = pinned struct {
    allocFn: fn(*CAllocator, usize) *c_void,
    freeFn: fn(*CAllocator, *c_void) void,
};

const CountingAllocator = struct {
    // CountingAllocator is pinned because of this field
    allocator: CAllocator = .{
        .allocFn = allocFn,
        .freeFn = freeFn,
    },
    count: usize = 0,
    inner: *CAllocator,

    // we can return "by value" here because of RLS,
    // this is not actually a copy.
    pub fn init(inner_allocator: *CAllocator) CountingAllocator {
        // this is braced initialization of the result location,
        // which is allowed.
        return .{ .inner = inner_allocator };
    }

    pub fn allocFn(super: *CAllocator, size: usize) *c_void {
        const self = @fieldParentPtr(CountingAllocator, super, "allocator");
        self.count += 1;
        return self.inner.allocFn(self.inner, size);
    }
    pub fn freeFn(super: *CAllocator, ptr: *c_void) void {
        const self = @fieldParentPtr(CountingAllocator, super, "allocator");
        self.count -= 1;
        self.inner.freeFn(self.inner, ptr);
    }
};

fn foo() void {
    // this is fine, it's not a copy.  The result location is forwarded to init.
    const alloc = CountingAllocator.init();

    // compile error: copies pinned type from temporary
    const alloc2 = CountingAllocator.init().allocator;

    // this is ok (for now), the stack temporary is passed
    // as the result location and then its address is taken,
    // no copy occurs
    const alloc3 = &CountingAllocator.init().allocator;

    // braced initialization is fine
    const alloc4 = CountingAllocator{ .inner = alloc };

    // but value copies are not, this is a compile error
    const alloc5 = alloc4;

    // this is fine, it just forwards the result location to bar.
    var frame: @Frame(baz) = bar();
}

fn bar() @Frame(baz) {
    // this is fine, it forwards the result location.
    return async baz();
}
fn bar2() @Frame(baz) {
    // this is fine with #2765, frame is the result location.
    var frame = async baz();
    return frame;
}
fn bar3() @Frame(baz) {
    // this is not ok, NRLS does not apply to conditionals
    // like this so each of these returns is generating a value
    // copy of the frame, which fails because it is copying pinned.
    var frame1 = async baz();
    var frame2 = async baz();
    if (something) return frame1;
    return frame2;
}

Other Notes

Though we have chosen here to put pinned on type declarations, for some uses it is actually a property of values. For example, an allocator with no attached state would not need to be pinned. However, introducing pinned as a new CV-qualifier seems like a much more invasive change, and it's not clear that it's worth it.

LemonBoy commented 3 years ago

From your description I get the pinned attribute propagates "upwards" trough the container types, I guess the same is valid for optional types (even though those are not containers).

One more use-case to take into account, given a pinned value y: ?T doing if (y) |x| becomes a compile error due to the implicit copy into x.

tecanec commented 3 years ago

I like this addition a lot. Knowing that some structs will stay where I put them is going to be great in a lot of places.

However, I must note that this addition contributes to the vaugeness of the = operator. There is now an even wider distinction between computing a value before assigning it versus assigning it before computing it, and knowledge thereof is becomming more and more neccesary to properly understand what's going on. I'm not sure if this is enough to warrant a new syntax, but I think that we should at the very least make a clear and logical distinction between the two in the docs that applies whenever this is revelant.

ityonemo commented 3 years ago

Should "pinned" be an attribute on the type or an attribute on the field?

tau-dev commented 3 years ago

Would we really a new CV-qualifier to enable per-value pinning? It seems that can be easily covered if pinned was not just a modifier on anonymous types, but an operator on any applicable type:

const AlwaysPinned = pinned struct {
    a: usize
};
const MaybePinned = struct {
    a: usize
};

var x: AlwaysPinned;
var y: MaybePinned;
var z: pinned MaybePinned; // pinned just like x

SpexGuy commented 3 years ago

attribute on the field

This would require CV qualifiers. Here's the case that is protected on types but not on fields:

fn foo(a: Allocator) void;
foo(std.heap.page_allocator.*);

type operator

What is the relationship between the value types T and pinned T? Coercion requires a copy, so you could add pinned but not remove it. Similarly with pointers, we would allow coercions that add pinned but not those that remove it. But now, under this system, if you ever use the type *Allocator, you have introduced a bug. Your library is now incompatible with most allocators, because they tend to be pinned. Being pinned is part of the Allocator interface, even though some implementations may not need it.

tau-dev commented 3 years ago

Your library is now incompatible with most allocators, because they tend to be pinned.

For convenience, the standard library should of course make definitions such as pub const Allocator = pinned MovableAllocator. Still no reason for qualifiers, as far as I understand.

given a pinned value y: ?T doing if (y) |x| becomes a compile error due to the implicit copy into x.

Couldn't x just alias y (if T is a pointer) or y.value (otherwise)?

batiati commented 3 years ago

If I may suggest, my two cents ...

I think that the pinned keyword fits well from the "memory" perspective; From the "programmer" perspective, something like ref or byref would be more self-explanatory.

Example:

//introduced pinned modifier on struct 
const MyStruct = pinned struct {
    //...
};

I think that this special kind of struct should behave just like a regular one, except it cannot be copied; It works much like opaque types that are allowed only as a reference, except that for pinned struct it's ok to have its value stored somewhere during the initialization.

Example:


// Wrong, declaring by value must result in compile error
fn doSomething(arg: MyStruct) void {}

// Wrong, returning by value must result in compile error
fn returnSomething() MyStruct {}

// Corret, allocating on the heap, returning by reference
fn init(allocator: *Allocator) *MyStruct {

    var ret = allocator.create(MyStruct);
    ret.* = .{
        // initialization ...
    };
    return ret;
}

const MyContainer = struct {

    // It is ok to hold a pinned struct's value as a field
    child: MyStruct,

};

fn main() void {

    // parent holds a pinned field
    var parent = MyContainer { ... };

    // Correct, must be by ref
    var child = &parent.child;

    // Wrong, trying to copy, compile error
    var bad = parent.child;

    // Corret, storing on the stack, but referring as a pointer
   // this pointer will not live after the function's end.
    var myValue = &MyStruct { 
        // initialization ...
   };
}

Question:

Should a parent struct holding a pinned struct be pinned too? If yes, should it be implicit pinned (like in Rust), or have a compile error if it is not marked as pinned?

Example:


// This struct must be pinned too?
const MyContainer = struct {

    // Holds a pinned struct
    child: MyStruct,

};

// It's ok for this struct not be pinned
const AnotherContainer = struct {

    // Holds just a reference
    child: *MyStruct,

};

And finally, the built-in function @fieldParentPtr must accept only pinned structs as an argument, avoiding invalid references.

SpexGuy commented 3 years ago

pub const Allocator = pinned MovableAllocator

Sure, that makes sense, but what is the use case for MovableAllocator? Why would anyone ever need to use it? Why does it need to coerce to Allocator? Introducing a modifier on all types (including pointers and others) is a pretty invasive change, arguably more complicated than adding a new CV qualifier. It needs solid use cases to back it up.

If I may suggest, my two cents ...

This is pretty much the behavior as described above. Pinned is specifically about memory, not the programmer perspective, so IMO that name fits better. There is one example that's different though:

// Wrong, returning by value must result in compile error
fn returnSomething() MyStruct {}

Zig uses result location semantics (#287, #2765, #2761) for struct return values, so this does not generate a copy and is allowed. Similarly, the semantic model for value parameters allows the compiler to pass them as pointers. We may consider in the future an extension that forces the compiler to pass pinned structs as constrained pointers, which would allow that example to compile as well.

Should a parent struct holding a pinned struct be pinned too?

Yes, as noted in the proposal, structs containing pinned fields are implicitly also pinned.

the built-in function @fieldParentPtr must accept only pinned structs as an argument

I'm not convinced of this, I think there are valid uses for @fieldParentPtr that do not require pinned types.

tau-dev commented 3 years ago

what is the use case for MovableAllocator?

as you said,

some implementations may not need [to be pinned]

Otherwise, it would not need to be public.

Introducing a modifier on all types [is] arguably more complicated than adding a new CV qualifier

I don't know very much about the compiler internals, but there seem to be around 70 base types and three qualifiers, which led me to the assumption that the former set would be easier to extend.

A disadvantage of making pinned a type would be that nested pointers could not be coerced; I do not know whether this matters in practical use cases.

kyle-github commented 3 years ago

Just reread this proposal. One thing kind of jumps out from the motivation section:

A common pattern in Zig is to embed a struct in a larger struct, pass pointers to the inner struct around, and then use @fieldParentPtr to retrieve the outer struct. This is central to our current favored approach to runtime polymorphism. However, it's easy to accidentally make a value copy of inner structs like this, causing hard to debug memory corruption problems or segfaults.

A similar problem exists with types that may contain pointers to data within themselves. As an example, consider an array with an inline stack buffer. Copying this value would yield a broken object, with a pointer pointing into the previous instance.

Finally, at the extreme end of this is async frames. Async frames may contain pointers into themselves, and so cannot safely be copied.

I highlighted the parts that I, finally, noticed. From the programmer perspective, this is about copying values. Shouldn't it be called no_copy or something similar? This seems a lot simpler to understand. Then all the rest seems to fall out naturally. If the compiler can prove RLS will work, then it is not a copy.

The capture case that @LemonBoy brought up is a good one.

It is a minor nit, but it seems to make it more clear what you are trying to do: stop anyone/anything from making a copy. What am I missing here? Is there some other aspect to pinned that will be used later?

SpexGuy commented 3 years ago

While I agree that nocopy is a better description of how the attribute behaves, I think it conveys more poorly where this attribute should be used. For example, looking at the name alone, you may be tempted to put the nocopy attribute on a large struct, because copying it may be expensive. Or you might want to put it on a data structure which in other languages would be move-only. However this would be a misuse of the feature. Personally I think pinned better conveys the actual intent of the feature - it should only be used when the address of an instance is a meaningful part of its value.

However I do agree that this is not a clear-cut decision, I think @andrewrk will need to weigh in to decide this.

kyle-github commented 3 years ago

pinned -> do not move something which is meaningful in Rust. But does not really say anything about copying in the common usage of the term.

no copy -> clearly means do not/can not copy. But does not really say anything about moving (which would also need to be forbidden).

These seem orthogonal to me.

But this is devolving into bikeshedding so I will stop on that topic.

The proposal as a whole seems fairly good. I think there are a few things that are not clear to me or seem to make this a bit trickier than apparent:

Anything containing a pinned type is itself pinned. To me this means that while thus obvious that pinned or not pinned becomes a critical part of the "API" of a type, it seems like it could lead to some remote footgun behavior. Change one of your types to pinned and someone, potentially far away (chained imports), starts getting compile errors. But due to usage, perhaps no code in between does.
I think that being explicit and having pinned be required for any type that includes another pinned type would be useful for later code reading. That helps with the above problem.
Captures, as noted by @LemonBoy, become less usable. I think this also extends to try and catch. If that is the case then a lot of common code patterns are not usable with pinned types.
Are there other ways to solve each of the main problems (well stated, BTW) from the motivation?

One of the things I really like about Zig's philosophy is that the intent is to guide the programmer into doing the right thing without forcing guard rails. In this case, it makes me wonder if the Allocator interface is actually exposing a different problem: maybe it is not the right way to do things? For example, there is a proposal about partial structs, #7969, that would also allow similar ease of coding and use IIUC. It also solves a number of other problems as noted in the #7969 proposal. I am not saying that the other proposal should be done, but it does solve at least one of the motivating problems of this proposal.

daurnimator commented 3 years ago

While I agree that nocopy is a better description of how the attribute behaves, I think it conveys more poorly where this attribute should be used.

I agree with @SpexGuy: pinned is less likely to be used by accident or with the wrong intentions.

batiati commented 3 years ago

Let me ask something:

Yes, as noted in the proposal, structs containing pinned fields are implicitly also pinned.

Why do we need to enforce pinned across all hierarchy? I mean, for example:


    var gpa1 = std.heap.GeneralPurposeAllocator(.{}) {};
    var allocator1 = &gpa1.allocator;
    // OK here
    var mem1 = allocator1.alloc(u8, 32);

    // Copy the parent struct,
    var gpa2 = gpa1;
    var allocator2 = &gpa2.allocator;

    // But it's OK, allocator2 still valid
    var mem2 = allocator2.alloc(u8, 32);

    // Wrong, copy the pinned type, segfault here!
    var allocator3 = gpa1.allocator;
    var mem3 = allocator3.alloc(u8, 32);

Why not leave to the programmer set pinned just where needed? I see that Rust has a very strict requirement for lifetimes checks, but I think that is not Zig's case.

jumpnbrownweasel commented 3 years ago

Here's a quick reaction after reading through this: Since the polymorphism/@fieldParentPtr use case is the immediate problem, perhaps a pinned modifier for struct fields only, would be sufficient as a starting point. That way the struct with the fn that uses @fieldParentPtr is also responsible for pinning the corresponding Allocator field, and different uses of Allocator would not be impacted. If I've missed something crucial about why this won't work, then please just ignore this and no need for a detailed reply.

ikskuh commented 3 years ago

@jumpnbrownweasel pinned memory area are also important when you store "up-pointers" from children to its parents or similar.

jumpnbrownweasel commented 3 years ago

@jumpnbrownweasel pinned memory area are also important when you store "up-pointers" from children to its parents or similar.

Yes, I understand. I got the impression reading this that there are some unresolved and complex design issues for the general case, and I was just suggesting that a pinned modifier for fields is all that is necessary for the @fieldParentPtr case, if a partial solution might make sense initially.

EDIT, added: For a struct type that is pinned because it contains self-referencing pointers, it makes sense that containing structs would also be pinned. But this restriction isn't necessary when pinning is to prevent the problem with @fieldParentPtr. These seem to be two different problems, which is also a reason I thought that @fieldParentPtr could be solved with a field modifier instead.

Hadron67 commented 3 years ago

What about types that are "conditionally" pinned? For example, ?T is also pinned if T is pinned, but it's always OK to copy a null. Same applies to tagged union, where only some tag payloads are pinned. It may even be legal to copy a pinned type if the value is undefined. So maybe we need something like @unpin() that casts away the pin qualifier as an escape hatch?

marler8997 commented 3 years ago

So maybe we need something like @unpin() that casts away the pin qualifier as an escape hatch?

Interesting. I'd extend the question to a union type where some of the cases are pinned and some aren't. Here's the behavior I would expect:

const Variant = union(enum) {
    allocator: Allocator, // this is pinned
    num: usize, // this is not pinned
    // ...
};

var variant = Variant { ... };

const v2 = variant; // compile error, variant as a whole is pinned because allocator is pinned
const n = variant.num; // ok
const a = variant.allocator; // compile error, cannot copy a pinned Allocator

switch (variant) {
    .allocator => ...,  // adding a capture |allocator| would be a compile error because it's a copy, but |*allocator| is ok
    .num => |x| ...,  // this is ok
}

I don't really see any problematic cases here though. I also couldn't come up with a problematic case for ?T. @Hadron67 are you able to come up with any?

Hadron67 commented 3 years ago

@marler8997 A potential problematic use case I can come up with is when working with container that does reallocation, like ArrayList or HashMap. Since reallocation may involves copy so will invalidate previous elements, calling insertion function will give a compile error if the value is pinned. But it should be ok if you append null, undefined, or union tag with non-pin payload to the container until you initialized or modified any of the elements to pinned value. Although this can be solved with initCapacity for simple containers such as ArrayList or HashMap, it's not the case for more complicated containers, like StringHashMap(StringHashMap(V)).

However I actually can't think of a satisfying workaround for this case other than not marking the value type as pinned (Edit: can be partially solved using [@sizeOf(V)] align(@alignOf(V)) u8), and I don't know how something like @unpin could help. Given that C++ has similiar problems with e.g. std::vector<std::atomic<int>>, I think this may be a tradeoff of struct pinning.

iacore commented 3 years ago

I feel like there is no good way to implement this, since one may want to copy the parent struct. It's probably better for interface types to contain a pointer to the concrete type.

const Named = struct {
    inner: *const c_void,
    getNameFn: fn (*const c_void) []const u8,

    pub fn getName(self: @This()) []const u8 {
        return self.getNameFn(self.inner);
    }
};

const Dog = struct {
    name: []const u8,

    fn asNamed(self: *const Dog) Named {
        return .{ .inner = @ptrCast(*const c_void, self), .getNameFn = Dog_Named_getName };
    }
};

fn Dog_Named_getName(self: *const c_void) []const u8 {
    return @ptrCast(*const Dog, @alignCast(@alignOf(Dog), self)).name;
}

pub fn main() !void {
    const std = @import("std");
    var obj = Dog{ .name = "woof" };
    const named = obj.asNamed();
    std.log.info("{s}", .{named.getName()});
}

ghost commented 3 years ago

It's probably better for interface types to contain a pointer to the concrete type.

I think so too, especially in light of the discovery (https://github.com/ziglang/zig/issues/10037#issuecomment-952205513) that this method seems to be friendlier to optimization (devirtualization). Not to mention the other advantage of this approach: that the implementation type doesn't need to embed or "know about" the interface.

Is it time to stop calling fieldParentPtr the "favored approach to runtime polymorphism"?

kcbanner commented 1 year ago

This would be useful when dealing with platform specific contexts (ie. ucontext_t). For example, on macos:

pub const ucontext_t = extern struct {
    onstack: c_int,
    sigmask: sigset_t,
    stack: stack_t,
    link: ?*ucontext_t,
    mcsize: u64,
    mcontext: *mcontext_t,
    __mcontext_data: mcontext_t,
};

It's a footgun since you can trivially copy the linux version of this structure, but on macos the mcontext pointer has to point to wherever __mcontext_data is. I discovered this when I accidentally relocated one of these structures, and was trampling over my stack elsewhere.

ziglang / zig