Closed andrewrk closed 1 year ago
Er... wasn't that something I proposed a few months ago in the discussion on resources and the use of "#" etc? I cannot seem to find the issue :-(
@kyle-github I think it was #494
@Ilariel, Ah, right. Thanks! I looked back, but not that far.
I would like to see something like this proposal combined with some of the ideas in #494. I think (not carefully thought through!) that it might be possible to come close to Rust's ownership/borrow checker in power. Perhaps it is too easy to allow escapes for it to be workable, but even 90% coverage would catch a huge number of cases. Determining lifetime is not that simple, however.
@andrewrk I was imagining something like this though with different keywords, but these are good keywords, too. Perhaps also allow a default standard (like your common self.deinit) so all you have to do is say clean(up) on the function header if you conform?
@kyle-github I also have some ideas about the "90% coverage" for borrow checking kind of thing, too, but I'd rather just see 1.0 first.
I know this is bike shedding, but it reads kind of weird:
clean try get_something();
Sounds like "cleanly attempt to get_something()".
Maybe something more like auto_close reads more natural.
What exactly does errclean do? It sounds like "if no function call here annotated with try returns an error, then don't auto close this resource at the end of this function", which sounds like you're implicitly taking ownership of the resource without explicitly saying so.
Maybe the default thing should be: if the function "throws" and the resource has not been assigned to any object that lives outside this scope, it's automatically closed, and there's no need to add any annotation for that.
If you want the resource to not autoclose (even on errors), that sounds like it needs a special syntax, for example: own or take.
Another idea is to mention the cleaning strategy after:
try get_something() auto_clean;
And if desired maybe it can be customized:
try get_something() auto_clean(cleanup_function)
And if no cleaning is needed:
try get_something() without_clean;
But at this point it feels like the language is getting too complicated.
@hasenj Interesting. My keyword plan for autoclean was own, which you intuitively feel would mean the opposite. And my noclean was disown. So many different ways to take implications (and I guess that's why bikeshedding).
Personally, I still think the keywords clean, noclean, and so on from the proposal are clear. And I'm much happier with the prefix syntax, too. The object of the keyword is clearer to me. I read clean try as "autoclean" (or even "be clean with") "the thing I tried and succeeded on" and errclean as "autoclean this on error" and I'm quite happy with the "errdefer" symmetry.
On the other hand, with this proposal in place, you could possibly drop the ad hoc defer and friends entirely.
This proposal is to
make it harder to forget to clean up a resource
make it easier to clean up resources
Strategy:
Functions which allocate resources are annotated with the corresponding cleanup function.
that is pretty much RAII so why reinvent the wheel?
and if you basically add RAII to the language you probably need copy vs move semantics as well
So the main critique of RAII is that it is type based. That means you need to define a type for each lock/unlock, open/close, allocate/free, etc. Then again, idk a better alternative or how this is any different.
IMO you want ctor/ dtor and with that RAII semantics sometimes and also want defer keyword other times.
Maybe just do it like rust does RAII, which is easier than cpp.
@monouser7dig
RAII requires constructors, destructors, move semantics, overridable copy semantics and wrapping everything in wrapper types (unique_ptr).
This solution does not require any of these features, because you either:
clean on scope exit. Aka, you own this resource and are not going to pass it up.
errclean on scope exit. Aka, you own this resource when an error occurs.
noclean, and are expected to pass the ownership.
Also, Rust RAII is easy, because Rust keeps track of ownership for you. Zig does not, so it would have to be as involved as C++.
Well that is just a stripped down version of RAII
so the first two cases would be covered by the traditional RAII approach and the proposed syntax is just another syntax for doing it as far as I can see.
I don't see why you would not just call it what it is.
@monouser7dig Well, if this is just about the name, then sure, we can call it RAII. One should just be careful that people don't confuse it with ctor/dtor, move, copy, implicit dtor calls, wrapper types and all that.
I argue that what andrew is proposing already is ctor, dtor and wrapper type, and will soon also need copy and move. That is just how it is/what you need. All those functions return values and those are the wrapper types. The „make**“ function is the ctor of that type and the deferred/clean function is the dtor.
Now as soon as you copy such a type that was returned from „make**“ you need copy and move semantics as well, otherwise this example won’t hold for anything but trivial code.
....or rename it to noclean which may cover part of the use cases but it’s still reinventing the wheel as far as I can tell.
Concerning rust: What you say is true but does not mean zig could not do the same or a variation of it. Zig does not control your memory safety either so it could just not control your moved-from values and be fine, just a different safety level than rust.
I very much agree with @monouser7dig. As long as Zig aims to be an applicable alternative for C, I feel that this feature is too high level to be in good taste. It just feels like unnecessary sugaring to me. The way Zig does resource acquisition/destruction now is nice and elegant, and trying to imitate Rust and C++ here feels like a stab in the back to the C-style simplicity of Zig.
make it harder to forget to clean up a resource
Is this actually a problem for anyone? This is a valid concern, but I feel that this problem should only be addressed if it is a real-life problem, not just a hypothetical.
make it easier to clean up resources
...therefore locking programmers into a single form of deallocation. There are many ways to have a "constructor", and depending on the problem, there may be many ways to have a "destructor" too. It is not the place of Zig (or any sane language) to force one form of resource destruction on the programmer.
Zig as a language tries very hard to not hide allocations behind a programmer's back. It must also not hide deallocations either.
Too complicated
Not sure that is the correct final answer to the problem
Language design might be complicated if it makes the programmers life less complicated in the end.
But maybe it’s best to think about it more and start a new proposal in the future.
I think It would be especially worth to investigate https://github.com/ziglang/zig/issues/782#issuecomment-404502930 this issue further because https://github.com/ziglang/zig/issues/782#issuecomment-404502081
Here's some real actual C code that wants to document ownership semantics for an array of strings returned by a user-supplied function: https://github.com/thejoshwolfe/consoline/blob/2c5e773442f89860f9ee82e13978b5ef3972ca99/consoline.h#L29
if this api were rewritten in zig, would it be possible to encode the desired ownership semantics with this proposal?
@thejoshwolfe presumably it would by providing a default cleanup wherever caller deallocation is necessary. I guess the assumption is that no caller should free anything provided by a function unless it has a specified clean function.
So turns out Jai got ctors and now wants to rip them out because they're not happy with it, I've not looked into the details, just found it interesting enough to add it in here.
Re-opening in light of #2377. Functions which provide a way for the compiler to automatically generate cleanup will make cancel work for non-async functions, without having to generate those functions specially. It also allows defers of async functions to run before tail resuming the awaiter, which is slightly more efficient. So now we have these reasons for investigating this feature:
I do think we need a better syntax/semantics proposal for how to annotate functions that allocate resources. There are a lot of issues with the syntax proposed above.
I don't know how to make this work, and I'm not convinced it's a path that will be fruitful.
"Ownership You Can Count On" won't work for Zig, because that still requires reference counting everything. Unless Andrew wants Zig to track those in debug builds only ...
As for automating defer x.deinit() by convention, I don't see at all why it should be so hard, but I don't want to push it anymore if Andrew's done with the topic. (Working on my own language again these days, anyway. Though I've never gotten far on such efforts.)
Re-opening in light of https://github.com/ziglang/zig/issues/3164#issuecomment-527504887. This would be required in order to implement useful cancel semantics into async functions.
Doesn't this imply hidden function calls much like operator overloading? I find the explicit defer to be more clear at the callsite.
In my own project I noticed I had some initialization functions which create multiple resources and don't actually clean up properly if one of them fails. It's so easy to just do
try ...
try ...
try ...
possibly with some code in between.
I also found a few cases in Zig std.
One is here:
https://github.com/ziglang/zig/blob/eb4d313dbc406b37f6bfdd98988c88c3b8ed542e/lib/std/build.zig#L120-L125
If the second try fails the BufMap is never cleaned up.
Another is here:
https://github.com/ziglang/zig/blob/eb4d313dbc406b37f6bfdd98988c88c3b8ed542e/lib/std/debug.zig#L480-L488
mod.symbols and mod.subsect_info are never cleaned up if an error occurs.
I haven't looked further. It's a bit hard to search for. And that's my main point. It's hard to find these bugs. It looks like the error is handled, so it's all fine, right? But actually no. After acquiring a resource you have to clean up if you don't intend to hold on to it for longer.
Now maybe in most cases you don't actually care too much, because if there's an error you don't really want to handle it, you just want to give up. Does that mean it shouldn't be try acquire_some_resource();, but rather acquire_some_resource() catch unreachable;? Or some similar way to just exit? Or maybe Zig can have some syntax which requires a clean-up block to be written by default? Such as a try and defer in one.
I'm not really sure, but I did want to say that I think that currently it's quite easy to just try everything and forget about cleaning up.
@BarabasGitHub that's what errdefer is for
@frmdstryr yes I know about errdefer, but my point is that especially errdefer is very easy to forget and hard to test in general. Harder than things you need defer for. And I suggest that something which isn't totally separate from try/catching errors could help people not to forget about cleaning up (writing the errdefer part).
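For readers following the errdefer point above, here is a minimal sketch of the bug class and one way to test for it. It assumes a reasonably recent Zig release in which std.mem.Allocator is passed by value and std.testing.checkAllAllocationFailures is available; Pair and initAndDeinit are hypothetical names made up for this illustration, not anything from Zig std.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

// Hypothetical type that owns two separately allocated buffers.
const Pair = struct {
    first: []u8,
    second: []u8,

    fn init(allocator: Allocator) !Pair {
        const first = try allocator.alloc(u8, 16);
        // Without this line, `first` leaks whenever the second `try` fails.
        errdefer allocator.free(first);

        const second = try allocator.alloc(u8, 16);
        return Pair{ .first = first, .second = second };
    }

    fn deinit(self: Pair, allocator: Allocator) void {
        allocator.free(self.second);
        allocator.free(self.first);
    }
};

// Hypothetical helper exercised by the test below.
fn initAndDeinit(allocator: Allocator) !void {
    const pair = try Pair.init(allocator);
    defer pair.deinit(allocator);
}

test "Pair.init cleans up on every allocation failure" {
    // Re-runs initAndDeinit while failing each allocation in turn,
    // and reports an error if any run leaks memory.
    try std.testing.checkAllAllocationFailures(std.testing.allocator, initAndDeinit, .{});
}
```

Deleting the errdefer line should make the test fail with a leak report, which is one way to catch a forgotten errdefer without a language change, at least for allocator-backed resources.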
Why not just a simple extension to defer, and get on with it? Use 'defer to say: defer the execution to the next scope. And then you can put the 'defer inside the function that allocates.
One can extend this to any number of scopes ''defer to jump 2 scopes and so on. This will be an easy extension to the language and will probably cover most use cases.
How about adding an annotation that marks a function as „resource making” and forces the compiler to require defer, errdefer or some other keyword like safe after a call to this function?
const err_pipe = try makePipe() safe;
Would just ignore resource acquisition
const err_pipe = try makePipe();
Would look for either defer or errdefer called on err_pipe
Resource is still user managed, as the function just says that it needs cleanup but doesn’t enforce one way to do it on the user, while still providing safety after such calls (after all the user will be forced to do something)
It doesn’t address making resource management easier and less repetitive, but I’m not sure if that’s what we really need. Zig as of now is trying to be readable at first glance; the RAII way would only add another layer the user would need to check, not to mention it comes close to OOP
What if this was a feature of the returned value itself (like an error union), rather than described as a part of the 'calling convention' of the function? One of the choices that Zig made (that I think was very good) with errors was making errors values rather than a part of function signatures, as they are in languages with exceptions like Java/C++. What if we tried that for cleanup-obligations?
Something like: Val#Obligation is the type of an obligation tuple. It holds a value, and something which must be done (called) eventually (i.e., it represents an obligation [that a resource is cleaned up]). Like errors have special syntax like try and catch and errdefer, obligations can have special syntax:
nocleanup obligation_tuple gets the value, discarding the obligation.
cleanup obligation_tuple is the same as defer obligation_tuple.obligation.fulfill(obligation_tuple.value); obligation_tuple.value
This doesn't fix the verbosity of calls like cleanup try allocate(), but I think it's simpler than adding arbitrary expressions to the signature of functions. Checking that resources aren't missed is simply handled by not allowing raw access to the .value except by nocleanup, and the requirement that non-void values aren't discarded
This doesn't fix the verbosity of calls like cleanup try allocate()
Why does it need to be a one-liner? Because the Obligation is now part of the type, and you need to downcast the value before being used.
The two line version (that we don't want to do):
var x_with_obligation : Value#Obligation = try allocate();
var x : Value = cleanup x_with_obligation;
I think showing too much compile time information in the type of objects, which is supposed to represent a memory layout, is not a very good idea (too close to C++). The Obligation has no consequence on how you can use the value, so I'm not convinced that the type is the right place to store it. And AFAIU it will have a runtime cost unless the compiler inlines the function.
I'm suggesting that Obligation should be alongside the type (this may sound crazy, but bear with me), as a new compile time metadata.
Then the compiler has two orthogonal jobs:
Then you can write:
fn init(n: u32, allocator: Allocator) HashMap#deinit {
var map = ...;
return @obligation(map, HashMap.deinit);
}
var x: Value = try init();
defer x.deinit();
// alternatively: `cleanup x` or `defer cleanup x`;
Most of the time the Obligation isn't visible in the caller code, only in the callee code and signature.
If we want to make the "Obligation" visible, we can force the use of a cleanup keyword, but otherwise we can keep idiomatic Zig code with init/deinit.
If a user forgets to call the cleanup method, they will receive a compile time error, which can have a dedicated error message.
Pros:
cleanup keyword, the caller code stays similar
Cons:
@typeInfo, ...)
I have been following Zig from afar, and unfortunately did not have the time to really try it; however, I'd like to add a bit of input to that discussion, hopefully this is not too much off-topic:
I think as soon as you decide to have implicit or checked (such as with @gwenzek obligations) cleanups, you will essentially tie behaviour to the lifetime of objects. In effect you will ensure the cleanup logic is done when the object dies (in the implicit defer case) or that it has to explicitly be done before that happens (in the obligation case).
"when the object dies" here means when the scope that created this object exits. If you admit that this cleanup is tied to the object lifetime, then another question naturally arises: How about objects whose lifetime is not neatly enclosed by a scope? What if we have an ArrayList of File, can we somehow fulfill that obligation to close the file?
I am not sure this "obligation" can be tracked by the compiler as the ArrayList might be returned, passed around, copied... Only through some complex set of rules enforced at compile time, similar to rust's borrow checker, would you be able to guarantee that.
What can be done without additional constraints on the language's expressive power is to enable ArrayList to perform the cleanup on its contained values. This seems only possible without runtime overhead if the cleanup logic is a property of the type, not a property of the function that created the object (as there can be many of those).
This cleanup logic associated to a type is commonly called a destructor, and I believe it is the cleanest solution to resource management. Please note that destructors are not necessarily called implicitly; Zig could still require some opt-in syntax at scope level to make an object destructor automatically called at scope exit. Having destructors (which could be an arbitrary method with some well-defined annotation, easily identifiable through reflection) means cleanups can be nested: calling deinit on an ArrayList of ArrayList of File would correctly clean up all the files and all the allocated memory.
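The nested-cleanup idea can be approximated in today's Zig by convention, with no language change. The sketch below is only an illustration under assumptions: FakeFile and Owned are hypothetical names, a plain slice stands in for ArrayList to avoid depending on a particular std API, and the container simply assumes its element type has a deinit method.

```zig
const std = @import("std");

// Hypothetical resource; the `closed` flag stands in for a real close().
const FakeFile = struct {
    closed: bool = false,

    fn deinit(self: *FakeFile) void {
        self.closed = true;
    }
};

// Hypothetical container that forwards deinit to every element it holds.
// Because cleanup is a property of the element type, nesting composes.
fn Owned(comptime T: type) type {
    return struct {
        items: []T,

        fn deinit(self: *@This()) void {
            for (self.items) |*item| item.deinit();
        }
    };
}

test "deinit on a nested container reaches every file" {
    var files = [_]FakeFile{ .{}, .{} };
    var inner = [_]Owned(FakeFile){.{ .items = &files }};
    var outer = Owned(Owned(FakeFile)){ .items = &inner };

    outer.deinit();

    try std.testing.expect(files[0].closed);
    try std.testing.expect(files[1].closed);
}
```

Nothing here forces the caller to invoke outer.deinit(), which is the gap the proposals in this thread are trying to close.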
Hope this helps, keep up the good work with Zig, it is definitely one of the most interesting new languages in my view.
As a newcomer to this language I already made the mistake @BarabasGitHub pointed out with not adding errdefer between a pair of trys, and while (I thought) I'd thought about the problems of ownership and releasing resources I'd only done so for the happy path; as soon as I read his note I went back to my code and fixed it. I see this as being a very easy mistake to make. I also do not like just documenting ownership responsibility in a comment. Of all the proposals I think I like @CurtisFenner the best; reflect the ownership obligation in the type system paired with cleanup/errcleanup/nocleanup keywords. This avoids ctor/dtor & move semantics while still providing some significant additional safety benefit and I think pairs nicely with the existing error unions functionality and feel of zig.
The allocate/deallocate problem seems to me to be pretty similar to the async/await or suspend/resume problem... the docs even say "In the same way that each allocation should have a corresponding free, Each suspend should have a corresponding resume." So maybe the answers should have the same 'shape'? Maybe instead of async/await and suspend/resume have like create/destroy and init/deinit ? (exact keywords not important to me) This would also re-use a mental pattern instead of having to invent another one. Though I'm not sure where errdefer fits in...
Syntax-wise, the introduction of so many new keywords could be avoided by reusing defer & co:
pub fn openRead(allocator: &mem.Allocator, path: []const u8) OpenError!#File
defer #.close()
{
// function body
}
The # in the type signature indicates an obligation, and also shows which part of the return value it is attached to, which allows referring to it unambiguously in the following defer expression.
At the call site, the presence of a deferred action has to be explicitly acknowledged so that we don't have hidden control flow:
var in_file = defer try os.File.openRead(allocator, source_path);
errdefer and nodefer can be used instead, as appropriate. nodefer is the only new keyword and transfers responsibility for the cleanup to the programmer.
Here's what Lua 5.4 does with a variable declaration like local foo<close> = Constructor():
A to-be-closed variable behaves like a constant local variable, except that its value is closed whenever the variable goes out of scope, including normal block termination, exiting its block by break/goto/return, or exiting by an error.
Here, to close a value means to call its __close metamethod. When calling the metamethod, the value itself is passed as the first argument and the error object that caused the exit (if any) is passed as a second argument; if there was no error, the second argument is nil.
The value assigned to a to-be-closed variable must have a __close metamethod or be a false value. (nil and false are ignored as to-be-closed values.)
I'm not enough of a Zig person to say what the moral analog of "having a __close metamethod" would be. In a duck-typed language, it'd just be "has a close() method." In Java, there'd be a Closeable interface.
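One possible Zig analog of "has a __close metamethod" is a compile-time check that a type declares deinit, using the @hasDecl builtin. The following is a rough sketch only: requireDeinit, close and Handle are hypothetical names, and the check assumes the argument is a pointer to a struct, union, enum or opaque type.

```zig
const std = @import("std");

// Hypothetical compile-time check: rejects types without a `deinit` declaration.
fn requireDeinit(comptime T: type) void {
    if (!@hasDecl(T, "deinit")) {
        @compileError(@typeName(T) ++ " must declare deinit");
    }
}

// Hypothetical resource type used only for illustration.
const Handle = struct {
    open: bool = true,

    fn deinit(self: *Handle) void {
        self.open = false;
    }
};

// Closes any pointed-to value whose type passes the check above.
fn close(resource: anytype) void {
    requireDeinit(@TypeOf(resource.*));
    resource.deinit();
}

test "close calls deinit on a conforming type" {
    var handle = Handle{};
    close(&handle);
    try std.testing.expect(!handle.open);
}
```

This only verifies that a cleanup function exists; like the Lua mechanism, it says nothing about whether the caller remembered to invoke it.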
There is a rule used by the C# code-analysis tool for the IDisposable pattern,
https://docs.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1063
It can be applied in Zig, but it would require annotating the entire container (struct, opaque, union, etc) with some kind of deinit function.
defer usage if the resource is not returned nor assigned in the scope
fn foo() !void {
var in_file = try os.File.openRead(allocator, source_path);
defer in_file.close(); //<--- Enforce that the `deinit` function is called
//Do something
}
errdefer if the resource is returned or assigned, but something can fail inside the scope
fn getFile() !File {
var in_file = try os.File.openRead(allocator, source_path);
errdefer in_file.close(); //<--- Enforce that the `deinit` function is called
try mayFail();
return in_file;
}
fn foo() !void {
var in_file = try getFile(); // <-- returned from another function
defer in_file.close(); //<--- Enforce same rules here
}
deinit function too
pub const MyResource = struct {
in_file: File,
pub fn init() !MyResource {
return MyResource {
.in_file = try os.File.openRead(allocator, source_path) // <-- Assigned
};
}
pub fn deinit(self: @This()) void {
self.in_file.close(); //<-- Enforce the `deinit` function
}
};
fn foo() !void {
var my_resource = try MyResource.init();
defer my_resource.deinit(); //<--- Enforce same rules here too
}
Just one deinit function per container
It could behave wrong depending on copy/move semantics
@batiati you will already get an error when a variable is unused. And not sure this has anything to do with this issue.
@amfogor, why do you think it was about unused variables?
I'm talking about how the compiler could enforce the use of the deinit function, inspired by the "Dispose pattern" used in the C# code-analysis tool.
Maybe I wasn't clear enough in the examples, but it's all about the defer, errdefer and deinit usage.
How about the ability to pass an obligation to the function's caller like errors?
fn func(x: i32) void {
std.debug.print("{}\n", .{x});
}
fn funcWithObligation(x: i32, y: i32) #void {
pass func(x+y);
pass func(x-y);
}
fn doFuncWithDefer() void {
defer funcWithObligation(1, 1);
...
return; //deferred obligations are called
}
//obligations are stacked, so the output would be:
//0
//2
This allows cleaner types. For example, I don't need to store the allocator and create a deinit function for Foo.
const Foo = struct {
const Self = @This();
data: []const u8,
pub fn init(allocator: *Allocator) !#Self {
var data = pass try allocator.alloc(u8, 10); //allocation's obligation passed to function caller
@memset(data.ptr, 0, data.len);
return Self{
.data = data,
};
}
pub fn default() Self {
const default_data: []const u8 = "0123456789";
return Self {
.data = default_data[0..],
};
}
};
fn doSomething(foo: Foo) void {
...
}
fn main() !void {
//nothing been allocated, no obligation, no need to defer
var def = Foo.default();
doSomething(def);
//return type communicates to the programmer that there is an obligation
var allocatedWithObligation = try Foo.init(allocator);
doSomething(allocatedWithObligation); // type mismatch error
// defer or pass to get the value
var allocated = defer try Foo.init(allocator);
doSomething(allocated); //Ok
}
Async example:
var global_download_frame: anyframe = undefined;
fn fetchUrl(allocator: *Allocator, url: []const u8) !#[]u8 {
_ = url; // this is just an example, we don't actually do it!
const result = pass try std.mem.dupe(allocator, u8, "this is the downloaded url contents");
suspend {
global_download_frame = @frame();
}
std.debug.print("fetchUrl returning\n", .{});
return result;
}
var global_file_frame: anyframe = undefined;
fn readFile(allocator: *Allocator, filename: []const u8) !#[]u8 {
_ = filename; // this is just an example, we don't actually do it!
const result = pass try std.mem.dupe(allocator, u8, "this is the file contents");
suspend {
global_file_frame = @frame();
}
std.debug.print("readFile returning\n", .{});
return result;
}
fn asyncAwaitTypicalUsage(allocator: *Allocator) !void {
var download_frame = async fetchUrl(allocator, "https://example.com/");
var file_frame = async readFile(allocator, "something.txt");
const download_text = defer try await download_frame;
const file_text = defer try await file_frame;
expect(std.mem.eql(u8, "expected download text", download_text));
expect(std.mem.eql(u8, "expected file text", file_text));
}
Obligations are run if an error is returned.
fn allocButError() !#void {
var x = pass try allocator.alloc(u8, 100);
return error.SomeError; // passed obligation is run
}
The problem with that is they have to work essentially like a state capturing lambda.
I found the lambda proposal. Many of the problems discussed there would seem to apply.
How about the obligation just be a function pointer? Example:
const Foo = struct {
const Self = @This();
allocator: ?*Allocator = null,
data: []const u8,
pub fn init(allocator: *Allocator) !deinit#Self {
var data = try allocator.alloc(u8, 10);
@memset(data.ptr, 0, data.len);
return Self{
.data = data,
.allocator = allocator,
};
}
pub fn deinit(self: Self) void {
self.allocator.?.free(self.data);
}
pub fn default() Self {
const default_data: []const u8 = "0123456789";
return Self {
.data = default_data[0..],
};
}
};
fn main() !void {
var noAlloc : Foo = Foo.default(); // No obligation
var foo : #Foo = try Foo.init(); // Obligations can be inferred
//we can...
foo.#(); // Call the obligation manually like any other function
//or defer it...
defer foo.#();
}
So in my experience, my most common bug is "I forgot to do any kind of defer statement after this function" and not "I called the cleanup function incorrectly". With that in mind, maybe all that's necessary is declaring a function as a resource-creating function, and requiring that you do some kind of defer/errdefer to access that value.
EDIT: turns out this was pretty much already proposed in exactly this form in https://github.com/ziglang/zig/issues/494 and rejected, so nvm!
// syntax doesn't really matter, just something to mark the type as a resource
// I'm using "#" here but I don't love it
/// Must call destroy() on returned value
fn create() !#T;
fn destroy(t: T) void;
fn foo() !#void {
// in this world, `defer` is similar to `catch` or `orelse`: it "unwraps" a resource so you can access it
const value1: T = try create()
defer destroy(value1);
// `errdefer` unwraps the value, but requires this function to also return a resource
const value2: T = try create()
errdefer destroy(value2);
// This is ok even though you're not actually cleaning up your resources, similar to `catch {}`
const value3: T = try create() defer {};
// Compile error: this is of type #T (a resource that needs to be cleaned up) and not T
const value5: T = try create();
}
fn ownershipPassingFunction() !#T {
// There's no need to `defer` here for the same reason that there's no need to `try`
return create();
}
// For void functions, it still works similarly to `try`
/// must call deinit()
fn init() #void;
fn deinit() void;
pub fn main() void {
// ok
init() defer deinit();
// ok
_ = init();
// error: return value of function not assigned
init();
}
Maybe defer and errdefer aren't the right keywords anymore and it should be clean and errclean or some other pair.
Some things I like about this:
sdl.init() defer sdl.deinit()
Some things I don't like:
This isn't nearly as "safe" as some of the proposals here, but I think it would probably catch the majority of simple "forgot to defer" bugs related to resource-handling just by forcing the writer to explicitly think about how they're doing clean-up.
@clarityflowers I like it!
My additional thought is: Maybe it doesn't need a symbol and can instead be a fn modifier? Like there's 'export' and 'extern' maybe there's... 'make' functions? or 'init' ?
make fn init() void {};
which tags the init function as requiring it be called with a following defer.
If there are mutable comptime struct fields as in #5675, then:
The language could allow a type to specify one unique function named cede (or something) that is "pure" i.e. it doesn't modify any comptime or runtime objects aside from the comptime objects declared within it, and returns void. That would be run implicitly at comptime every time the type's objects are lost to the programmer on scope exit. It would throw a compile error if the struct is in the improper comptime state.
First, the issue was reopened for the wrong reason.
Second, there isn't really a problem here that can be solved in a reasonable manner.
So makes me wonder why the issue is not closed.
This pattern is extremely common in zig code:
Generally:
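This proposal is to
make it harder to forget to clean up a resource
make it easier to clean up resources
Strategy:
Functions which allocate resources are annotated with the corresponding cleanup function.
clean corresponding to defer theCleanupFunction(resource);
errclean corresponding to errdefer theCleanupFunction(resource);
noclean to indicate that you accept responsibility for the resource. Otherwise you get error: must specify resource cleanup strategy.
The above code example becomes: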
How to annotate cleanup functions:
Having functions which allocate a resource mention their cleanup functions will make generated documentation more consistent and helpful.