ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Proposal: Owned pointer type hint #3690

Open 42triangles opened 4 years ago

42triangles commented 4 years ago

I think it would be great to have an actual annotation for whether a pointer is owned or not. This has the advantage of putting the difference in the type system instead of the documentation, and may allow future static analysis with regard to memory safety guarantees and memory leaks.

Such a pointer would coerce to a normal, unowned pointer, and would add a field to the TypeInfo.Pointer struct, and allocators would ideally return an owned pointer. An unowned pointer could explicitly be turned into an owned one, but not implicitly coerced. I would very much like owned pointers to be only movable, but I believe that this is outside of the scope of the language, especially given that this would be the only type to behave as such (I think).

Advantages: Some simple static analysis in the future, harder to misuse destructor functions which take an owned pointer, no documentation required to know whether a pointer is owned or not

Disadvantages: Added complexity, breakage of existing metaprogramming code. Would *owned restrict variable names? Would @Owned(*...) be acceptable, given how it differs from other pointer attributes? It also may not even be a needed feature for most Zig users, given that documentation can do the job "well enough" most of the time.

Alternatives: not doing it; creating a standard library type that wouldn't support coercing and the like; creating a way of adding general annotations to types, and either allowing things to hook into the Zig compiler to evaluate them (allowing comptime arguments to them easily) or just ignoring them in the compiling process, meaning that validation tools would work outside of the compiler entirely.

Also, I want to add that I've had a blast with the language so far, and do look forward to it maturing further :)

JesseRMeyer commented 4 years ago

Does this specifically mean that only the owner of a pointer can change what it's pointing to? In other words, it's known as a const pointer to non-owners.

42triangles commented 4 years ago

No, this is specifically meant to hint at which code path is responsible for freeing memory. So an owned pointer in an argument would mean "This function will deallocate it or return it (-> the original pointer shouldn't be used after the function was called with it, it's likely to dangle)", and in return position it would mean "This pointer isn't managed by any other object, and will lead to leaked data if not freed".

ghost commented 4 years ago

Semi related: I've thought about the idea of "owned" pointers, with respect to const propagation. Specifically, for struct fields that are pointers. The status quo in Zig (and C) is that if you have a const pointer (*const) to a struct, you can still mutate objects that its fields point to. (Unless those fields are marked as const pointers in the struct itself, but that's usually unworkable). If a struct field could be marked as an "owned" pointer, then constness would propagate through to it. In practice, a struct field might be a pointer only for internal, implementation reasons (something happened to be allocated on the heap). From the outside (to code that has a const pointer to the struct), that field should be considered immutable, the same way non-pointer fields are.
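The status quo described above can be reproduced in a few lines. This is only a minimal sketch: `Node`, `Container`, and `touch` are made-up names for illustration, and it assumes a reasonably recent Zig compiler.

```zig
const std = @import("std");

// Hypothetical types, used only to illustrate the point.
const Node = struct {
    value: i32,
};

const Container = struct {
    count: i32,
    // The pointee lives outside Container, so the constness of a
    // *const Container does not extend to it.
    node: *Node,
};

// Receives a const pointer, yet can still mutate what `node` points to.
fn touch(c: *const Container) void {
    // c.count += 1; // compile error: cannot assign through a const pointer
    c.node.value += 1; // allowed today: constness stops at the pointer field
}

test "const does not propagate through pointer fields" {
    var node = Node{ .value = 1 };
    const c = Container{ .count = 0, .node = &node };
    touch(&c);
    try std.testing.expect(node.value == 2);
}
```

Under the "owned" field idea, the assignment through `c.node` would presumably be rejected in the same way as the commented-out assignment to `c.count`.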

I don't know if this idea of "owned" is related to yours though. Also not sure if there are some murky cases which this binary owned/non-owned distinction would fail to cover, and which might defeat the whole idea.

JesseRMeyer commented 4 years ago

> the idea of "owned" pointers, with respect to const propagation. ... I don't know if this idea of "owned" is related to yours though.

That's what I had in mind.

42triangles commented 4 years ago

> Also not sure if there are some murky cases which this binary owned/non-owned distinction would fail to cover, and which might defeat the whole idea.

Well, owned pointers would be more of a hint, and you could still do everything via non-owned pointers. If something is "half-owned", via reference counting or something like that, a normal pointer would be used instead. But often enough, things are "owned" by a data structure, and therefore by only one codepath, and this would signal that. Plus, there may be optimization opportunities if it were UB to use a pointer to an object after an owned pointer to the same object has left scope by being passed somewhere else, but I don't know enough about LLVM to say.

The const propagation stuff is interesting, but would require that distinction to be exactly opposite (→ default = owned, instead of default = non-owned). Although that is an interesting property as well, maybe there should be *owned, *shared (= "half owned", dunno if "shared" is the right word to use there) and normal pointers?

JesseRMeyer commented 4 years ago

I prefer features more enforceable than 'hints' to be part of the language specification. You could address this in part by a user-space convention. Just have an Owned_Allocator struct provide the functionality you need, by wrapping pointers in an owned struct. The compiler wouldn't stop anyone from freeing the pointer itself, but you could guard against this in your own code with some metaprogramming (I've not tried metaprogramming in Zig yet but it should support this use case).
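A minimal sketch of what such a user-space convention might look like. `Owned`, `create`, `borrow`, and `destroy` are hypothetical names, not standard library API, and the sketch assumes a Zig version in which `std.mem.Allocator` is passed by value.

```zig
const std = @import("std");

/// Hypothetical wrapper that marks a pointer as owned. Not part of std.
pub fn Owned(comptime T: type) type {
    return struct {
        ptr: *T,

        const Self = @This();

        /// Allocates a T, initializes it, and returns it as an owned pointer.
        pub fn create(allocator: std.mem.Allocator, value: T) !Self {
            const ptr = try allocator.create(T);
            ptr.* = value;
            return Self{ .ptr = ptr };
        }

        /// Explicit stand-in for the owned -> unowned coercion.
        pub fn borrow(self: Self) *T {
            return self.ptr;
        }

        /// Frees the pointee. Callers must hold an Owned(T) to call this,
        /// though nothing stops them from freeing `ptr` directly.
        pub fn destroy(self: Self, allocator: std.mem.Allocator) void {
            allocator.destroy(self.ptr);
        }
    };
}

// A function that only borrows takes a plain pointer.
fn increment(x: *i32) void {
    x.* += 1;
}

test "owned wrapper usage" {
    const allocator = std.testing.allocator;
    const owned = try Owned(i32).create(allocator, 41);
    defer owned.destroy(allocator);
    increment(owned.borrow());
    try std.testing.expectEqual(@as(i32, 42), owned.borrow().*);
}
```

The wrapper only documents intent in the type. It does not prevent copies of the struct or use after `destroy`.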

42triangles commented 4 years ago

> I prefer features more enforceable than 'hints'

Which is understandable, although the aforementioned optimizations would go beyond that.

> You could address this in part by a user-space convention

Which is how it's currently handled, the documentation in fact states that you should document "ownedness" for your functions, iirc. Allocators aren't really the main concern here though, and you cannot guard against improper usage via meta programming, as that is only possible on the type-level and value-level (via comptime), but not the actual language-level, afaik.

And while sure, the status quo works, I do believe that there is room for improvement. Plus, annotating code and possibly having metaprogramming on the language level as well, would work too - but be a lot more work. (But you could have more complex stuff like a full blown borrow checker a la Rust as a library, which I think would be kinda awesome)

But either way, those are just my opinions on the matter.

JesseRMeyer commented 4 years ago

We both agree that Zig can be improved to better handle this use case.

Allocators: The specific issue here is that objects are being handled as pointers and nothing more stable. That is why I suggested authoring an allocator that returns a struct, something stable, that carries a data pointer and some other information, such as which allocator owns it, etc. That way, you just send the struct to your functions, and you don't have to worry if the data pointer changes, since the allocator on the other side does the stitch work for you. The type system keeps you safe by requiring the correct types. This transforms the issue from an 'ownership' problem to just a matter of using the right type, which I think is appropriate here. There is a lot of conceptual baggage that 'ownership' implies, and it tends to muddy the water.
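One way to picture this, as a sketch under assumptions rather than a definitive design: `Handle`, `init`, `deinit`, `Point`, and `consume` are hypothetical names, and `std.mem.Allocator` is assumed to be the by-value interface of recent Zig versions.

```zig
const std = @import("std");

/// Hypothetical handle returned instead of a bare pointer. It carries the
/// data pointer together with the allocator that owns the memory.
pub fn Handle(comptime T: type) type {
    return struct {
        data: *T,
        allocator: std.mem.Allocator,

        const Self = @This();

        pub fn init(allocator: std.mem.Allocator, value: T) !Self {
            const data = try allocator.create(T);
            data.* = value;
            return Self{ .data = data, .allocator = allocator };
        }

        /// Frees the data with the allocator recorded at init time.
        pub fn deinit(self: Self) void {
            self.allocator.destroy(self.data);
        }
    };
}

const Point = struct { x: i32, y: i32 };

// Accepts only a handle, so passing a bare *Point is a type error.
// The function takes over responsibility for freeing the data.
fn consume(h: Handle(Point)) i32 {
    defer h.deinit();
    return h.data.x + h.data.y;
}

test "handle usage" {
    const h = try Handle(Point).init(std.testing.allocator, .{ .x = 1, .y = 2 });
    try std.testing.expectEqual(@as(i32, 3), consume(h));
}
```

Storing the allocator in every handle costs an extra field per object, which is the runtime overhead a compile-time annotation would avoid.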

Meta-programming: Right, but I think most of the value is in the type and value level for this problem. Maybe that's just a difference of perspective here. I ultimately prefer a language that empowers me to design solutions which are right for my program over a language that forces any particular convention. While Zig is opinionated, it doesn't get in your way very much.

42triangles commented 4 years ago

Sorry for abandoning this issue for a bit, anyway...

For one, marking which allocator returned a pointer is something that I would very much like to see, especially in light of every data structure holding a pointer to it. Which is why it would be really nice to actually allow allocators to be "annotated" at compile time. Something like

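// global
var actual_allocator = createMyAllocator(); // createMyAllocator is a placeholder
// Zero-sized type that names the allocator at compile time.
const AllocHint = struct {
    pub fn getAllocator(this: @This()) *@import("std").mem.Allocator {
        return &actual_allocator.allocator;
    }
};
// in main
const alloc_hint = AllocHint{};
// ArrayList taking a hint type is the proposed shape, not the existing std API
var list = ArrayList(MyStruct, AllocHint).initEmpty(alloc_hint);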

That would be possible in today's zig, and would work similarly for pointers too, while not incurring any runtime cost as long as the allocator isn't dynamic in some way. And if it is dynamic, it would be pretty much the same cost you'd have either way, assuming it's not a debug build.

Either way, regarding the original issue: Maybe creating a wrapper type with just one field is the way to go, although that would require manual conversions and make derefs longer to type too.

Anyway, if there's no interest in built in owned pointers (with possible optimizations) beyond mine, I'm willing to close this issue.

matu3ba commented 3 years ago

@42triangles @JesseRMeyer

As I understand it, you are basically discussing Rust lifetime annotations, but you want to save them in memory instead of checking them in the type system.

For me, ownership means permission to mutate, which at any time exactly one computation instance can hold, except for atomic types or when one opts out.

Please let me know if I misunderstand your thoughts.

42triangles commented 3 years ago

@matu3ba I am not discussing lifetime annotations in the main issue (though being able to check them somehow would definitely be great), only the distinction between what would be Box and &/&mut in Rust (more or less, since there are no automatic destructors in Zig, at least there weren't last time I checked).

I feel like both lifetime annotations & non-aliasing mutable references by default are things that are probably not in the scope of the language (though I'd definitely like to see especially the latter if I'm wrong about that). A distinction that is meant to make clear what part of the code is responsible for eventually freeing a pointer could even live only in the standard library, though I personally think the language would be more ergonomic if owned pointers could be coerced to unowned ones.

42triangles commented 3 years ago

Having read the documentation to get up to date with how zig is looking at the moment, I found this: https://ziglang.org/documentation/0.7.1/#Lifetime-and-Ownership actually mentions exactly what I mean, saying that you should just document it.

matu3ba commented 3 years ago

@42triangles

> I feel like both lifetime annotations & non-aliasing mutable references by default are things that are probably not in the scope of the language

See #1108, which proposes changing the default aliasing behavior (to noalias) for better speed, once the LLVM patch lands, and for safety. noalias already exists in Zig, and making it the default would be a good change (further, I don't think the cost of checking it in the compiler would be particularly big).

Lifetime annotations may become an optional part once Rust has figured out what the costs of doing it properly are. I think the costs could even be marginal, since the analysis might work without any IR in Zig (in contrast to Rust). There would definitely be costs to put the lifetime rules into IR for the analysis, or to generate another IR for the borrow checker.

UPDATE: Lifetime annotations would need to be passed to AIR (previously called typed ZIR) for analysis of the memory layout of the concrete types. This makes Zig's compact representation nonviable and hence reduces compilation speed very notably.

So one must either 1. reimplement the Zig compiler or 2. efficiently find source code <-> AIR mappings, parse the lifetime annotations that map to variables, and add the parsed annotations to the AIR for lifetime evaluation/borrow checking. This problem sounds to me to be in a similar complexity class (even without comptime) as mapping source to assembly, which requires using an SMT solver like Cogent did.

To also check comptime, one would need to implement a parser to ZIR that includes the lifetime information, as resolving comptime is nontrivial.

CONCLUSION: Better to do lifetime checks in a Lisp dialect and lower it to Zig. The other approach is too much churn and maintenance redundancy.