ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
32.85k stars 2.4k forks source link

ability to annotate functions which allocate resources, with a way to deallocate the returned resources #782

Closed andrewrk closed 1 year ago

andrewrk commented 6 years ago

This pattern is extremely common in zig code:

    const err_pipe = try makePipe();
    errdefer destroyPipe(err_pipe);

    var in_file = try os.File.openRead(allocator, source_path);
    defer in_file.close();

    var atomic_file = try AtomicFile.init(allocator, dest_path, mode);
    defer atomic_file.deinit();

    var direct_allocator = std.heap.DirectAllocator.init();
    defer direct_allocator.deinit();

    var arena = std.heap.ArenaAllocator.init(&direct_allocator.allocator);
    defer arena.deinit();

Generally:

    const resource = allocateResource();
    defer deallocateResource(resource); // or errdefer

This proposal is to

Strategy:

The above code example becomes:

    const err_pipe = errclean try makePipe();
    var in_file = clean try os.File.openRead(allocator, source_path);
    var atomic_file = clean try AtomicFile.init(allocator, dest_path, mode);
    var direct_allocator = clean std.heap.DirectAllocator.init();
    var arena = clean std.heap.ArenaAllocator.init(&direct_allocator.allocator);

How to annotate cleanup functions:

// std.mem.Allocator
fn create(self: &Allocator, comptime T: type) !&T
    cleanup self.destroy(_)
{
    const slice = try self.alloc(T, 1);
    return &slice[0];
}

// function pointer field of struct
allocFn: fn (self: &Allocator, byte_count: usize, alignment: u29) Error![]u8
            cleanup self.freeFn(self, _),

// std.os.File
pub fn openRead(allocator: &mem.Allocator, path: []const u8) OpenError!File
    cleanup _.close()

Having functions which allocate a resource mention their cleanup functions will make generated documentation more consistent and helpful.

nektro commented 2 years ago

because your second point is incorrect

amfogor commented 2 years ago

because your second point is incorrect

It will always come down to "it works only for certain kind of cases" or "too complicated"

nektro commented 2 years ago

*hasn't* been solved in a nice manner and *can't* be solved in a nice manner are two very different things. this issue is still very important and want to be solved, that's why its still open. but there's other more pressing issues atm so for now it sits and waits

dantecatalfamo commented 2 years ago

A feature like this would be huge, forgetting to free/deinit memory is easily the number one problem I have when writing zig

matu3ba commented 2 years ago

Sorry for wall of text. Just my 2 cents on lifetime semantics.

There are two ways to use lifetime annotations:

  1. If the ownership semantics are only interpreted conditionally on the usage side of functions, then the footgun remains as the user may have forgot to use the correct keyword (opt-in).
  2. If the ownership semantics are unconditonally applied (they can only work partially on missing abstraction annotations), this would require additional memorization/complex rules for the programmers to track or static analysis to check for potential holes (in Rust its the unsafe keyword). (opt-out)

Since borrow-checking is slow and requires to annotate the lifetimes as tree derivation, an optimal solution would use the official lsp and the lsp would have a background task or something for that.

The more problematic thing is that showing lifetime information in an intuitive way is still open research: https://github.com/rustviz/rustviz https://www.pm.inf.ethz.ch/education/student-projects/rust-lifetime-visualisation-bsc.html https://blog.adamant-lang.org/2019/rust-lifetime-visualization-ideas/ and none of those contain pointer/reference lifetime information. Intuitively speaking, one needs to visualize lifetime tree information along function traces per variable.

Lifetimes could of course also be graph-like, but I can speak from experience that graph annotation is cumbersome at best and can not be automatically deduced (unless structures trivial or small). The best that I can think of wrt graph annotation/deduction is to have accurate source level tracking to track and infer how in the editor program control flow changes have been changed (3 level of indirection and there is no editor with accurate source level tracking of changes, lol). However, this neither provides a user interface for fixing stuff, nor does it capture programmers intent about the lifetimes from the variables representing certain graph lifetimes (ie order of certain operation how stuff is cleaned up).

phillvancejr commented 2 years ago

As a reference Odin also has this feature using function attributes. The deferred function is called at the end of the scope

@(deferred_out=destroy_window)
create_window :: proc() -> ^Window {
}

destroy_window :: proc(w: ^Window) {
    free(w)
}

There are 4 defer attributes

maybe zig can do something similar

@deferInOut(destroy_window)
fn create_window(allocator: *std.mem.Allocator) *Window {
}

fn destroy_window(allocator: *std.mem.Allocator, w: *Window) {
    allocator.free(w)
}

maybe have multiple attributes

here anytype I assume would be a comptime function pointer. There could also be an errdefer version. Or maybe a single builtin that takes the function pointer and an enum for the mode.

fn destroy_window(allocator: std.mem.Allocator, w: Window) { allocator.free(w) }


It might work well with structs
```zig
const Resource = struct {
    const Self = @This();
    allocator: *std.mem.Allocator,
    resource: []u8,

    @clean(Self.deinit, .none)
    fn init(allocator: *std.mem.Allocator) !Resource {
        var bytes = try allocator.alloc(u8, 1000);
        return Resource{
            .allocator = allocator,
            .resource = bytes,
        };
    }
    fn deinit(self: Self) void {
        self.allocator.free(self.resource)
    }
}
zzyxyzz commented 2 years ago

The big problem here is that a cleanup obligation represents a context, and attaching this context to a piece of data (the return value) is a dangerous idea. Bad things will happen when this piece of data is passed around or duplicated:

var file = clean try openFile("name"); // ok when written like this

// But you could also do this:
const x: !#File = openFile("name");
const y: #File = try x; // strip error
var file2: File = clean y; // strip obligation / defer cleanup
// more code
// ...
var file3: File = clean y; // double free

And trying to add safeguards against things like this leads straight down the rabbit hole of RAII / unique pointers / move semantics / borrow checking / heuristic attempts to detect forgotten cleanups / etc.

The second problem is that many functions need to make their cleanup actions conditional. Anything that returns an error union or option will usually only require cleanup in the case of success. But this makes it very tempting to attach the obligation to the return value after all (in fact, the original proposal has several instances of the clean try foo() pattern).

Finding a syntax that triggers an obligation per call rather than per value, and gracefully deals with optional cleanups, and does not introduce hidden control flow, and makes it reasonably clear what happens in what order, is an over-constrained problem, IMHO.

pjz commented 2 years ago

Better/more @Frame introspection/definition might get us a context-alike. Consider:

fn create_window(allocator: *std.mem.Allocator) void { 
    var w: *Window  = ...;
    suspend {}
    allocator.free(w)
}

// allocate the window
var window_context = async create_window();
// get a nicer alias to the window
var w: *Window = window_context.w;
// do whatever with w
...
// dealloc the window
_ = await window_context;

This treats the @Frame like a simple struct for illustrative purposes, though it could of course be more complicated. And this is just to point out that we almost have the ability to implement an equivalent to python contexts currently. I think this rabbit hole is worth exploring as the solution-space seems to have a lot of overlap with defer/RAII.

zzyxyzz commented 2 years ago

@pjz I don't understand what is gained by putting state into an async @Frame in your example. In the end you still have to call await exactly once per suspend, and the compiler has no way to verify this in general (unless it gained some new superpowers recently?).

BTW, if #6965 is accepted, then one of the main applications would be proper context-wrapping macros, so we could implement a python-like

with_window(allocator, "name", |w| {
   // do something with w
});
// closing the window is handled by the macro

without bringing async into this.

pjz commented 2 years ago

@zzyxyzz A few points: h

I hadn't read #6965 before (thanks for that, BTW), but I find that while I applaud the goal, the means end up feeling bit too abstract. What if, instead, we could just pass around stack frames or pointers to stack frames directly? I didn't notice that the above code would become:

// allocate the window
var window_context = async create_window();
// dealloc the window
defer { _ = await window_context; };
// get a nicer alias to the window
var w: *Window = window_context.w;
// do whatever with w
...

And while, sure, I'd love some syntactic sugar to make the async/defer await bits nicer, the zen of zig is not in favor of that. Maybe in the interest of brevity (ala try) it could be shortened to:

var window_context = async create_window() deferred;

with deferred meaning there's an implicit defer await of the results.

SmolPatches commented 1 year ago

I'm new to Zig and I really like some of the things that people mentioned about simplicity within this language. While the explicit mem management that Zig offers is great, I for one would welcome this feature.

Heres my take, Rust like borrow checking is awesome but it comes with its own overhead as people here have mentioned. So making it a language standard would probably upset many, especially those using zig on constrained devices. So what if within Zig there was an abstract that could be attached to blocks where they would be attached dispatched and executed within the frames of this borrow checker? The abstraction would oversee and 'own' all the data defined within its scope. When data is freed, the owning frame would prohibit re-usage whether it's by throwing error following attempted accesses or some other way of handling the mistake. This way anyone could opt in to its use and it'd be compatible with traditional Zig execution. Since the runtime is only added to those blocks if a value is returned to regular execution it will lose the data protections that came with the extra runtime within the block.

nektro commented 1 year ago

since inline fns are inlined at compile-time, perhaps this could be solved with an inlinedefer mechanism (name pending)

pub inline fn allocAndClean(self: Allocator, comptime T: type, n: usize) Error![]T {
    defer foo(); // this would run at the end of 'allocAndClean'
    inlinedefer bar(); // this would run at the end of 'allocAndClean' caller's scope

    // eg:
    var buf = self.alloc(T, n);
    inlinedefer self.alloc.free(buf);
    return buf;
}

edit: this wouldnt solve the general case though when there's another function that calls .alloc() etc in the same way that clean try would, nvm

jamsilva commented 1 year ago

Here's a possibility for implementing this with just 2 new keywords.

To start with, I think that the declaration syntax described in the first comment works fine:

fn create(self: &Allocator, comptime T: type) !&T
    cleanup self.destroy(_)
{...}

When used in a defer / errdefer context, cleanup becomes the function defined as the cleanup function during the declaration so that:

defer obj.cleanup;
errdefer obj.cleanup;

are exactly the same as:

defer the_allocator.destroy(obj);
errdefer the_allocator.destroy(obj);

If the compiler has the logic to desugar this and then does the check for the defer statement on the desugared version, developers are able to choose if they want to use this or continue manually calling the respective cleanup functions (since some people seemed opposed to making the exact function implicit). More importantly existing code that does this would continue to compile correctly.

Of course, to check this, there should be an escape hatch for cases where you don't want to clean up and that's why the second keyword would be needed. Something like nocleanup perhaps (used by itself in the next statement, as opposed to noclean in the same statement as in the original proposal).

PeyTy commented 1 year ago

Hi! Want to share my idea of stack pointers https://gist.github.com/PeyTy/7983dd85026cc061980a6a966bc1afc2 Somewhat related to the issue (I think there was an old issue on stack pointers detection but cannot find it). And yes, I know, that's why not opening new issue

ghost commented 1 year ago

As a new zig learner having some construct that forces you to handle resource cleanup would be really great from both a memory safety and developer experience perspective.

tauoverpi commented 1 year ago

I'm wondering if attaching "uses" to variables/parameters would solve this somewhat. It would involve having something along the lines of enum { infinite, linear, dead } in addition to the is_comptime flag where infinite is that we have today with any number of copies, linear allows for just one of the value which can be moved (unlike @pin) but that would destroy the original, and dead meaning the value has no uses left thus cannot be used anymore. Function would then say how they affect the resource (comsume, preserve, produce).

For example, alloc would be fn(Allocator, type, usize) Error!linear []T where linear means exactly one use; free would be fn(Allocator, consume anytype) void where consume requires that the value is linear and that it's not been consumed before (consume 1->0). For files you'd have the same but define write as fn(preserve File, bytes: []const u8) !usize where preserve means that the resource isn't consumed by the operation. linear is viral just like @pin.

There should be no implicit cast from linear to infinite, instead a builtin @linearCast provides the ability to go back to infinite from linear when desired (e.g in the case backend code can't be proven linear but use of handles given to the user should have linear lifetimes).

defer need not be special nor does errdefer as a linear value cannot be returned via error return and linear values must be either be returned (or break) or consumed by the end of scope.

References to linear things pretty much require borrowing (e.g &T and &const T) thus there needs to be a sense of ownership where stack locals are owned by the current block { ... } until returned to transfer ownership.

TL;DR - introduce new modifiers for parameters / types along with a new type linear such that:

PeyTy commented 1 year ago

@tauoverpi what the key differences from Rust's borrow checker? To be honest, I don't see any notable differences, except the defaulting to non-linear. And preserve feels like static lifetime from Rust too.

What you're proposing here would essentially make Zig a Rust 2.0 (😉)

matu3ba commented 1 year ago

References to linear things pretty much require borrowing (e.g &T and &const T) thus there needs to be a sense of ownership where stack locals are owned by the current block { ... } until returned to transfer ownership.

Declaring ownership is not the main problem. Enforcing via solving the semantics however is, because otherwise annotations only describe intended semantics.

The main problem of the Rust borrow checker is summarized here https://github.com/ziglang/zig/issues/2301#issuecomment-815062413, so it is a committed local optimum.

More general formal systems would boil down to more expressive annotations taking into account how the formal system is solved, which are not possible to bake into the syntax as it would mean to enclose all possible formal systems to prove all possible properties.

Something like https://github.com/ziglang/zig/issues/14656 would be necessary to cover these use cases.

InKryption commented 1 year ago

Bringing up an idea inspired by a discord conversation from a while ago:

fn make(ally: Allocator, n: usize) Allocator.Error!unmake#[]f16 {
    const result = try ally.alloc(f16, n);
    @memset(result, 0.5);
    return result; // coerces implicitly to the obligation
}
fn unmake(
    ret: []const f16,
    allocator: std.mem.Allocator,
) void {
    allocator.free(ret);
}

test {
    const ally = std.testing.allocator;
    // `defer(...)` is required to unwrap the result, and simultaneously does the
    // same thing as `defer unmake(slice1, ally);` all the required state except
    // for the result value itself must be passed as arguments - no closures or
    // contexts or whatever else
    // `errdefer(...)` may also be used for the behavior you'd expect
    const slice1 = try make(ally) defer(ally);

    // escape hatch to request to manage the resource manually, eliding automatic cleanup.
    // syntax doesn't matter, could be anything like `foo() noclean`.
    const slice2 = try make(ally) @noclean();
    unmake(slice2, ally);

    // this should be a compile error. "clean up unions" or whatever you want to call them
    // should never be stored. They should be treated like half opaque types,
    // in that using one as the type of a field or variable should be a compile error,
    // but should be allowed as the type of an expression (e.g. result of a function call,
    // in-place coercion with `@as`, etc). This helps to avoid double frees.
    const slice3 = try make(ally);
}

The pieces that comprise this are all very similar to other concepts brought up in this issue, but I feel this addresses a good bunch of the counter-points of those previous ideas.

And to just quickly address the "no hidden control flow" argument that may be brought up against this, I would argue that there isn't any control flow that's truly hidden here, as the caller is explicitly forced to either opt-in or opt-out of the cleanup behavior, which effectively makes it as transparent as just deciding whether or not to call a function - it just happens that said function is being "recommended" to clean it up.

Edit: it occurs to me that, with the syntax presented here, pairings like ArrayList's init and deinit will work fine with the first argument rule, but wouldn't work well with Allocator's alloc and free, since the first argument to free is the Allocator. Not 100% sure how to rectify this. One idea might be to mark in the resource-freeing function which parameter receives the resource, something like

fn alloc(self: Allocator, comptime T: type, n: usize) Error!free#[]T {
    // -- snip --
}
fn free(self: Allocator, #ptr: anytype) void {
    // -- snip --
}

and when it comes to how the arguments end up passed, it should behave as if the resource argument is put between the relevant arguments to defer(...), e.g.

fn make(ally: Allocator, n: usize, align_log2: Allocator.Log2Align) !unmake#[]align(1) i16 {
    // -- snip --
}
fn unmake(ally: Allocator, #slice: []const i16, align_log2: Allocator.Log2Align) void {
    // -- snip --
}

test {
    const ally = std.testing.allocator;
    const slice1 = try make(ally, 15, 0) defer(ally, 0); // the value is passed as if in between `ally` and `0`.
    const slice2 = try make(ally, 31, 4) errdefer(ally, 4); // ditto
    const slice3 = try make(ally, 63, 2) @noclean();
    defer unmake(ally, slice3, 2); // can still call as normal function
}

Edit 2: another idea would be to make the syntax look like this:

const slice = try make(ally, 127, 0) defer(ally, _, 0);

Which may look a bit odd, but keeps things simple, and is tangentially related to inference, with underscore being the same symbol that's used to infer array literal length.

InKryption commented 1 year ago

OK, a slightly revised version of the proposed ideas after some amount of deliberation:

const std = @import("std");

// #unmake is an annotation of the whole function, not part of a type (syntax up for debate,
// maybe use `@cleanup(unmake)`?)
fn make(ally: Allocator, size: usize, align_log2: Allocator.Log2Align) E![]T #unmake {
    // -- snip --
}
fn unmake(ally: Allocator, slice: []T, align_log2: Allocator.Log2Align) void {
    // -- snip --
}

test {
    const ally = std.testing.allocator;

    // compiles fine, is the same as writing
    // ```
    // const slice1 = try make(ally, 31, 0);
    // defer unmake(ally, slice1, 0);
    // ```
    const slice1 = try make(ally, 31, 0) defer(ally, _, 0);

    // compiles fine, is the same as writing
    // ```
    // const slice2 = try make(ally, 127, 8);
    // errdefer unmake(ally, slice2, 8);
    // ```
    // caller takes responsibility for freeing the resource in
    // the absence of error
    const slice2 = try make(ally, 127, 8) errdefer(ally, _, 8);

    // this is the fine, it is the same as choosing to omit the
    // `defer unmake(ally, slice4, 4);` code - caller takes responsibility
    // for freeing the resource
    //
    // Syntax up for debate, maybe use a builtin like `@noclean()`, or maybe
    // an operator using the # symbol, or whatever else. Could also be prefix
    // instead of postfix to distinguish from normal defer auto-cleaning.
    const slice3 = try make(ally, 63, 4) noclean;
    defer unmake(ally, slice3, 4);

    // // this would be a compile error, it's the same as writing
    // // ```
    // // const slice4 = make(ally, 31, 0);
    // // defer unmake(ally, slice4, 0);
    // // ```
    // // basically, `E![]T` can't coerce to `[]T`
    // const slice4 = make(ally, 15, 2) defer(ally, _, 2);

    // this would be a compile error, because the expression resulting from
    // a function annotated with a clean up function is not annotated with
    // either a `defer`, `errdefer` or `noclean`.
    const slice5 = try make(ally, 7, 1);

    unmake(ally, slice2, 8); // don't forget to clean up after no errors occurred
}

To put some of this into more concrete terms:

An annotation on a function of the form #name indicates that the result location of the expression of a call to the function is a resource that must be cleaned up. This does not type check against anything yet.

When this function is called, the caller may transform the resulting expression in whatever arbitrary way is desired; we can say that the expression itself is flagged by the compiler as being the result of a call to a function that's annotated with a clean up procedure. If the expression is assigned to a variable, parameter, etc, without getting rid of this "flag" or "metadata" or however you'd like to call it, the compiler issues an error; the only way to get rid of this flagging is by annotating the expression with either defer(...), errdefer(...), or noclean.

In essence, this is purely a formalization of the extremely common-place pattern of paired init() and deinit(), turning code like

var list = try std.ArrayListUnmanaged(u8).initCapacity(allocator, 32);
defer list.deinit(allocator);
// -- snip --
return try list.toOwnedSlice();

into

var list = try std.ArrayListUnmanaged(u8).initCapacity(allocator, 32)
    defer(_, allocator);
// -- snip --
return try list.toOwnedSlice() noclean;
PeyTy commented 1 year ago

@InKryption consider this, what comes at ????

fn make_at_upper_scope(ally: Allocator, size: usize, align_log2: Allocator.Log2Align) E![]T #??? {
    return try make(ally, 31, 0) ???;
}
fn make(ally: Allocator, size: usize, align_log2: Allocator.Log2Align) E![]T #unmake {
    // -- snip --
}
fn unmake(ally: Allocator, slice: []T, align_log2: Allocator.Log2Align) void {
    // -- snip --
}

test {
    const ally = std.testing.allocator;
    const slice1 = try make_at_upper_scope(ally, 31, 0) defer(ally, _, 0) /* or what? */;
}
zzyxyzz commented 1 year ago

@PeyTy You can always try to take manual control in such cases:

fn make_at_upper_scope(ally: Allocator, size: usize, align_log2: Allocator.Log2Align) E![]T #unmake {
    return try make(ally, 31, 0) noclean;
}

... which is of course unsatisfactory, because you have to dig out the literal function doing the cleanup, ~which might just as well be private to another module~. It would be nicer if we could somehow refer to the destructor anonymously, but I don't see how this would be possible syntactically, since the destructor is attached to a call within the function body, but has to be referred to outside of it, in the signature. So this probably calls for a separate feature to forward an obligation:

fn make_at_upper_scope(ally: Allocator, size: usize, align_log2: Allocator.Log2Align) E![]T # {
    return try make(ally, 31, 0) moveclean;
}

Just thinking out loud. Maybe there's a simpler solution.

InKryption commented 1 year ago

Obligations definitely should not be implicit, as that essentially would start to fall under hidden control flow. Perhaps it could be part of type info, and be retrieved by a helper function as #obligation(make).

PeyTy commented 1 year ago

Speaking of hidden control flow. Most ideas around Zig memory management pretend that memory lifetime is simply linear and stack-like. I don't believe defer and automatic substitution of defer(ally...) solves everything, but I agree that Zig follows explicit rules.

Thus, what about, instead of trying to automate defer cleanup() calls, compiler would require manual action?

A quick made up example:

fn make(allocator ..., ...) #reminder {
  ...
}

test {
  var gpa = ...;
  var demo = make(gpa, ...);

  // at the end of scope compiler would say something like "demo leaves the scope, action required"
  gpa.free(@actionTaken(demo)); // @actionTaken simply removes the flag #reminder from this variable

  {
    // would work for ref counting too
    // Doc comment saying that "you must call rc_decrement at scope leaving"
    fn alloc_rc(allocator, ...) T #reminder_at_every_scope_leaving {
       return ...
    }

    fn rc_decrement(allocator, object #action_taken, ...) {
      if (object.rc-- == 0) allocator.free(object);
    }

    var refcounted = alloc_rc(gpa, ...);

    // you must do something according to the documentation of the `alloc_rc` function
    // in Zig it is assumed that you follow the docs anyway
    rc_decrement(refcounted); // #action_taken implicitly means @actionTaken(refcounted)
  }
}

You may read more about similar idea in Vale: https://verdagon.dev/blog/higher-raii-7drl

That part:

With Higher RAII, we can ensure that we eventually call any function...

This way you don't have any automatic substitution, but compiler helps you to not forget to do something. It makes the code literally the same as it is right now in any Zig codebase, with all those GPA calls etc, but with compiler #reminders. Seems like a nice compromise?

whatisaphone commented 1 year ago

It might be too limiting to tie the return value directly to one particular cleanup function, because some resources can be cleaned up in more than one way. For example:

InKryption commented 1 year ago

Well, in the thread case, those are cases wherein it wouldn't make sense to call either of those functions to "release" the resource. If we look at something like C++ or Rust, which have destructors for "resources", they also do not join nor detach in the destructors of their standard thread types. These annotations would be focused on the more common use case of resources which are tied to mostly linear scopes.

whatisaphone commented 1 year ago

(It's a real minor point, but Rust does detach when you drop a JoinHandle)

If more complex resources are out of scope, fair enough, just thought it was worth mentioning

andrewrk commented 1 year ago

I think having this issue open is confusing/misleading people. I'm not seriously considering this proposal and it's very likely that nothing will come of it. The only reason it was open was due to being possibly related to async cancellation. That is still an unsolved use case, but even so I'm confident the solution is not annotating resource-allocating functions with what is essentially destructors.