Introduce @OpaqueHandle() builtin

lucabol commented 3 years ago

I separated this from here and expanded its description with a more robust rationale and current state.

Expanded rationale

This proposal allows representing a typical C idiom as exposed in several books (i.e., here) and libraries.

Also, it is a CERT C recomendation and a MISRA C recomendation to use in safe systems.

One can prefer more 'open' APIs or not, but the fact remains that this is a widely used and recommended pattern in the C (and C++) community. It should be easily expressible in Zig.

Current state

Assuming I got this right, the current state to implement something like this is unpleasing, and the presence of @ptrCast makes one think that it is dangerous (i.e., it is likely not going to make the safe system guys happy).

const std = @import("std");
const testing = std.testing;
const Allocator = std.mem.Allocator;

// Hides the data structure from importing files
const Hidden = struct { i: i32, j: i32 };

pub const MyPoint = opaque {
    const MyPointPtr = *align(@alignOf(Hidden)) @This();

    pub fn init(allocator: *Allocator, i: i32, j: i32) !MyPointPtr {
        var s: *Hidden = try allocator.create(Hidden);
        s.i = i;
        s.j = j;
        return @ptrCast(MyPointPtr, s);
    }
    pub fn deinit(self: MyPointPtr, allocator: *Allocator) void {
        allocator.destroy(@ptrCast(*Hidden, self));
    }
    pub fn get_i(self: MyPointPtr) i32 {
        return @ptrCast(*Hidden, self).i;
    }
    pub fn get_j(self: MyPointPtr) i32 {
        return @ptrCast(*Hidden, self).j;
    }
};

test "opaque" {
    var ally = testing.allocator;
    const my = try MyPoint.init(ally, 2, 3);
    defer my.deinit(ally);

    try testing.expectEqual(@as(i32, 2), my.get_i());
    try testing.expectEqual(@as(i32, 3), my.get_j());
}

Use case and proposal from the original issue

void glAttachShader(GLuint program, GLuint shader);

I'm not familiar with the gl api, but I assume that program and shader are effectively opaque types. Despite being integers, it would not make sense to do any arithmetic on them, right? They're more like fd's in posix.

Perhaps we can scope this down to enable specifying the in-memory representation of an otherwise opaque type. There are two features that we want at the same time:

A library will provide and accepts objects of a type that the client isn't supposed to do anything else with. These objects function as handles.
The handle must have some concrete in-memory representation so that the client and library can communicate coherently.

The recommended way to do this is to make a type with @OpaqueType(), and then use single-item pointers to the type as the handle.

const Program = @OpaqueType();
const Shader = @OpaqueType();

pub fn glAttachShader(program: *Program, shader: *Shader) void {}

But this mandates that the in-memory representation of the handle is a pointer, which is equivalent to a usize. This is not always appropriate. Sometimes the handle type must be c_int instead, such as with posix fd's, and c_int and usize often have different size. You have to use the correct handle type, so a pointer to an opaque type is not appropriate with these handle types.

Proposal

A new builtin @OpaqueHandle(comptime T: type) type.

const H = @OpaqueHandle(T);
const G = @OpaqueHandle(T);

var t = somethingNormal();
var h = getH();
var h2 = getAnotherH();
var g = getG();

assert(H != T); - You get a different type than you passed in.
assert(G != H); - Similar to @OpaqueType(), each time you call it, you get a different type.
assert(@sizeOf(H) == @sizeOf(T) and @alignOf(H) == @alignOf(T)); - Same in-memory representation.
H is guaranteed to behave identically to T in the extern calling convention. This includes when it is part of a larger type, such as a field in an extern struct.
h = t; t = h; h = g; // all errors - The handle types don't implicitly cast to or from any other type.
if (h != h2) { h = h2; } - Handles can be copied and equality-compared.
h + 1, h + h2, h < h2 // all errors - Whether T supported arithmetic or not, the handle types do not support any kind of arithmetic.
t = @bitcast(T, h); - If you really need to get at the underlying representation, I think @bitcast() should be the way to do that. Or maybe we should add special builtins for this, idk.

This is an exciting idea. I think this fits nicely into the Zig philosophy of beating C at its own game - Zig is preferable to C even when interfacing with C libraries. If you translate your GL and Posix apis into Zig extern function declarations with opaque handle types, then interfacing with the api gets cleaner, clearer, less error prone, etc.

Originally posted by @thejoshwolfe in https://github.com/ziglang/zig/issues/1595#issuecomment-425590705

FYI Andrew commented on his preference to have special builtins instead of reusing @bitcast. I think they were @toOpaqueHandle(x) and @fromOpaqueHandle(x).

daurnimator commented 3 years ago

The status quo recommendation is to use enum (T){_}; see https://github.com/ziglang/zig/issues/1595#issuecomment-495042309

lucabol commented 3 years ago

My example was misleading. This is not about wrapping primitive types, but encapsulating any type (aka it is not about creating your own integers,doubles with better type safety). I have changed the sample to make it clear. I don't think you can do that with enum(T){...}.

matu3ba commented 3 years ago

@lucabol It would help, if you give a (realistic) complete example before and after the change and motivate it with stuff like safety drawbacks of the API on concrete usage. var t = somethingNormal(); does not explain how H and G are being used.

Personally I am also missing how this cant be solved or should not be solved with comptime-generating the interface as described in the talk about static and dynamic dispatch as to justify the added language complexity. And yes, I am aware that there are not findable good static vs dynamic dispatch tutorials / simple examples.

InKryption commented 3 years ago

Just a thought: seeing as it is already possible and common-place to use enum(T) { _ } to make type-safe C-like handles, this then means that this proposal would only really "solve" one use case, which would be wrapping up zig types, and then also make it such that there are two ways to make type-safe C-like handles, which adds a layer of complexity (deciding which one to use).

Not completely against this, but also leaning towards agreement with the sentiment thus far. More real-world use case examples would be nice to see.

lucabol commented 3 years ago

If you can achieve what I describe below with enum(T) {...} then perhaps we don't need this.

In any case, here is the category of bugs we are trying to prevent, using our own ArrayList type.

// THE PROBLEM WE ARE TRYING TO SOLVE (i.e.lack of encapsulation)
test "Inexperienced new programmer code" {
    const a = std.testing.allocator;
    var l = std.ArrayList(u8).init(a);
    defer l.deinit();

    // Somewhere else, in a function far far away ...

    // Oh, I see, I need to set the capacity before appending
    l.capacity = 2;
    l.appendAssumeCapacity(1);
    l.appendAssumeCapacity(3);
    // SIGFAULT
}

We are trying to make it impossible for the inexperienced programmer on a large project to access things that are not meant for him. Aka, return some kind of opaque pointer:

test "If ArrayList were implemented to return an opaque pointer, that category of bugs can't happen" {
    const a = std.testing.allocator;
    var l = try MyList(u8).init(a);
    defer l.deinit(a);

    // I cannot access the capacity field because opaque type cannot have fields
    // l.capacity = 2;
    try l.append(1);
    try l.append(3);
}

Today to achieve that, you have to write this (unless there is an enum(T) trick I don't know about. BTW: I am using list: std.ArrayList(T) to simplify things (instead of a proper ArrayList implementation.

// CURRENT STATE TO BE ABLE TO Implement MyList
fn Hidden(comptime T: type) type {
    return struct { list: std.ArrayList(T), capacity: usize };
}

pub fn MyList(comptime T: type) type {
    return opaque {
        const MyPointPtr = *align(@alignOf(Hidden(T))) @This();

        pub fn init(allocator: *std.mem.Allocator) !MyPointPtr {
            var s = try allocator.create(Hidden(T));
            s.list = std.ArrayList(T).init(allocator);
            return @ptrCast(MyPointPtr, s);
        }
        pub fn deinit(self: MyPointPtr, allocator: *std.mem.Allocator) void {
            var s = @ptrCast(*Hidden(T), self);
            s.list.deinit();
            allocator.destroy(s);
        }
        pub fn append(self: MyPointPtr, i: T) !void {
            var s = @ptrCast(*Hidden(T), self);
            return s.list.append(i);
        }
    };
}

With this proposal the code looks much simpler and less error-prone as roughly:

// PROPOSED STATE TO IMPLEMENT MYLIST

fn State(comptime T: type) type {
    return struct { list: std.ArrayList(T), capacity: usize };
}

pub fn MyListOpaque(comptime T: type) type {
    return opaque {
        const OpaqueList = @OpaqueHandle(State(T));

        pub fn init(allocator: *std.mem.Allocator) OpaqueList {
            var s = try allocator.create(State(T));
            s.list = std.ArrayList(T).init(allocator);
            return OpaqueList;
        }
        pub fn deinit(self: OpaqueHanlde, allocator: *std.mem.Allocator) void {
            var s = @fromOpaqueHandle(self);
            s.list.deinit();
            allocator.destroy(s);
        }
        pub fn append(self: OpaqueHanlde, i: T) !void {
            var s = @fromOpaqueHandle(self);
            return s.list.append(i);
        }
    };
}

InKryption commented 3 years ago

I can't personally say I'm very convinced. What about your demonstration is less bug-prone than your example of status quo? Given an API like so, the user still has the ability to misuse the API by, e.g., copying the handle, expanding it, and then expecting the original handle to have overgone the same changes (expecting it to act like a reference). Or using the 'deinit' method twice (double freeing), or assigning to the handle using the 'init' method multiple times before freeing (leaking memory).

At the end of the day, one still has to understand, to a minimum degree, how to use the ArrayList(T) interface to make their program run. And that goes for any other interface. Zig is a language that gives the programmer control, and expects them to understand the tools it gives them; language-level encapsulation of fields has been discussed a few times, and the status quo has been to avoid it. For reference: #2059 and #2974.

Given all that, I'd say this is a weak feature proposal at best; potentially regressive at worst.

lucabol commented 3 years ago

I was afraid we would devolve in a philosophical discussion about the value of encapsulation for a system-level language. As I see it, my opinion (and yours) on such a matter is not important. We can talk about this all day and there is no way to prove either of us right or wrong.

The reality of the situation is that it is a very common pattern in C. It is recommended in most books. It is a CERT and MISRA explicit recommendation and it is used in many (most?) large-scale C applications (i.e., sqllite).

This doesn't make it 'right' in any sense, but it does make it important.

So our opinion doesn't matter. What matters is to do what is best for the Zig language. As Zig positions itself very explicitly as the C successor, it seems logical that such a pattern should be easily representable so as to facilitate the migration of C codebases and convince the (large?) subset of C programmers which deems this pattern important to use Zig.

InKryption commented 3 years ago

Alright, then omitting the second half of my comment, I'll ask again: what about this feature makes the pattern less bug-prone in zig? As stated, there are still numerous ways in which to misuse the API as is (as stated in my previous comment), which are far more common than setting the capacity to a higher value than allocated. Ultimately, what benefit does *@OpaqueHandle(T) hold over *align(@alignOf(T)) opaque{...}? Or even *align(@alignOf(T)) [@sizeOf(T)]u8? Because I don't think safety is the answer here.

lucabol commented 3 years ago

This pattern is identical in C as it is in Zig. It has the same trade-offs. Apologies for my directness, but I am afraid you are just re-stating your opinion. You think the potential bugs you mention are "far more common than (generalizing from the example) having access to the private state of a component".

In general, most large-codebase C programmers, book authors, MISRA and CERT authors disagree. They do recommend this.

If I had to venture an opinion on why that is (and it is just my opinion, not a fact), I would say that it is because the errors you describe are 'generic' errors. You are misusing the language. Accessing the private state of a component is a 'contextual' error, as sometimes it is safe to do so. Sometimes accessing the fields of a struct is the right way to use a component. Sometimes it isn't. You have to go and look for it, which some might not do.

What do we gain from this proposal, compared to the status quo? We gain a relatively clear way to represent a very common C idiom without resorting to mental or syntactic gymnastics, or possibly dangerous align pointer casting.

Is this the best way to represent this idiom in Zig? I am not sure. Perhaps allowing fields on opaque pointers is better. Perhaps visibility annotations on fields is better. I see this as a 'minimal' change proposal (not rocket science) that gives us a decent (perhaps sub-optimal) syntax for a very common pattern.

BTW: It is not even my proposal, so I have no deep emotional attachment to it. I just care about solving the problem.

InKryption commented 3 years ago

I believe we are at impasse then, so I will cede. And no worries, directness is the most efficient mode of communication.

But one note:

possibly dangerous align pointer casting.

There's no dangerous pointer casting required to use *align(@alignOf(T)) opaque{...}.

lucabol commented 3 years ago

Thanks for the excellent discussion. I would expect no less than vigorous pushback on language additions. Otherwise, you end up with C++/Scala/.... Each new proposal starts with -100 to pay for the added complexity.

For example, I would not have proposed this if it were a new keyword. As a builtin, and given that opaque is an existing type, it seems appropriate (to me). It is simpler, more lightweight, and consistent with the rest of the language than 'private fields on opaque' or 'on structs.'

BTW: I thought of another place I saw this idiom in C, the handles-as-pointers pattern in game programming.

Hopefully, there is enough in this thread to suggest that the idiom is widespread enough to be worth enabling.

InKryption commented 3 years ago

Coming back to this today, I think I've realized that there is more value to this than I was seeing, but I think what put me off was perhaps your example, as it contains a good few errors and inconsistencies (you return 'OpaqueList' in the init function instead of the casted memory for one example), which I think put me on a different mental page than you.

That said, I'd like to prod a bit more: it's also not clear to me in the latest example how the returned opaque struct and "@OpaqueHandle(State(T))" are related. Does an opaque struct with an @OpaqueHandle declaration cause the opaque to become the result of @OpaqueHandle? Or is the returned opaque struct just a namespace? If it's the latter, then I think your example would be best expressed by returning a normal struct with an opaque handle as a member, which then operates on the handle.

lucabol commented 3 years ago

Sorry, it was late in the night and no compiler to validate :-)

I got confused by the MyList code with the several @ptrCast(MyPointPtr, s); compiling and running the test. I then replaced mechanically such instructions with @OpaqueHandle and @fromOpaqueHandle. Looking at it again, I am not sure why the test runs fine to start with as it seems incorrect.

In any case, the intention is to enable a programming model like the below. I will post a full example once I get a bit more time, but your 'normal struct with opaque handle field' design seems right to me. Unless one goes heavy-handed allowing private on struct field or allowing opaque types to contain fields.

test "the goal is to hide the capacity field (aka impl details) on ArrayList from programmers" {
    const a = std.testing.allocator;
    var l = try MyList(u8).init(a);
    defer l.deinit(a);

    // I cannot access the capacity field here, but just access the made public functions
    // l.capacity = 2;
    try l.append(1);
    try l.append(3);
}

lucabol commented 3 years ago

Ok, I figured out why my previous code worked. Perhaps it is a known 'feature' in zig. I don't think I did it on purpose (or did I?). Let's call it the 'opaque over struct' pattern.

Once you see it, it kind of makes sense. I guess it is an alternative implementation to the 'normal struct with opaque handle field' way you suggested. It is more obscure for the implementer, but it has the advantage of not exposing an opaque pointer to the user of the API.

const PrivateState = struct { i: u8 };

pub const PublicComponent = opaque {
    pub fn init(allocator: *std.mem.Allocator) !*PublicComponent {
        var state = try allocator.create(PrivateState);
        state.i = 5;
        return @ptrCast(*PublicComponent, state);
    }
    pub fn getPrivateState(self: *PublicComponent) u8 {
        var state = @ptrCast(*PrivateState, self);
        return state.i;
    }
    pub fn deinit(self: *PublicComponent, allocator: *std.mem.Allocator) void {
        var state = @ptrCast(*PrivateState, self);
        allocator.destroy(state);
    }
};

test "Weird opaque over struct pattern" {
    const a = std.testing.allocator;
    var aComponent = try PublicComponent.init(a);
    defer aComponent.deinit(a);

    var i = aComponent.getPrivateState();
    //var j = aComponent.i; // you can't do this. It is opaque.
    try std.testing.expect(i == 5);
}

InKryption commented 3 years ago

Yes, I've used this pattern a few times in personal offline projects, although I haven't seen its use anywhere in the standard library, and unsure how popular its usage is. Was this the kind of functionality you were targeting?

lucabol commented 3 years ago

Yes. To be more precise, my overarching goal is to simplify the Zig representation of the 'opaque handle' C idiom at the start of the thread, as it is so widespread. I think that's a good goal. I am now not sure this proposal is the right solution.

The pattern above is one way to do it. Your suggestion of a struct with an opaque field is another (btw: can that be done without allocating the state?).

In a system language like Zig there are always ways to represent higher-level concepts by clever pointer manipulation and casting. So everything can always be done in some way. The question is how painful it is.

ifreund commented 3 years ago

This pattern of hiding the fields of a struct behind an opaque pointer as used in C is inefficient in many cases as it requires the struct to be heap allocated. The Zig pattern described here https://github.com/ziglang/zig/issues/9859#issuecomment-934454851 is the most direct translation of the C pattern and has the same disadvantage.

If I understand correctly, the proposed @OpaqueHandle builtin would remove this disadvantage by giving the consumer of the API knowledge of the size/alignment of the struct without allowing access to the struct's fields.

InKryption commented 3 years ago

can that be done without allocating the state?

In short, technically yes, but only if you give up absolute control of the memory, or give up flexibility. In long, memory has to live somewhere, so with a pointer to an opaque alone, you'd have to either use static memory, giving up flexibility (global variables), or request memory from the user, giving up absolute control of the memory (since they could decide to make the state invalid by modifying it, and as well, they would probably end up heap-allocating it anyway).

One alternative is to instead store the private state as an array of memory in the struct, which of course still leaves the possibility that the user may invalidate the memory anyway, but that's the deal. It could looks something like:

const std = @import("std");
const testing = std.testing;
const Private = struct { i: u32, j: u32 };

pub const Public = struct {
    impl: [@sizeOf(Private)]u8 = undefined,
    pub fn init(i: u32, j: u32) Public {
        var result = Public {};
        result.impl = @bitCast([@sizeOf(Private)]u8, Private { .i = i, .j = j });
        return result;
    }

    pub fn getI(self: Public) u32 {
        return self.getPrivate().i;
    }

    pub fn getJ(self: Public) u32 {
        return self.getPrivate().j;
    }

    fn getPrivate(self: Public) Private {
        return @bitCast(Private, self.impl);
    }
};

test {
    var public = Public.init(3, 2);
    try testing.expectEqual(public.getI(), 3);
    try testing.expectEqual(public.getJ(), 2);
}

This is what I originally thought you were proposing to replace (as described by @ifreund) by essentially making the type itself the "bag of bytes".

lucabol commented 3 years ago

@ifreund, @InKryption : this is how it would look. I think it also covers the case where Private needs to be allocated or it is a simple type (i.e., u32).

const std = @import("std");
const testing = std.testing;
const Private = struct { i: u32, j: u32 };

pub const Public = struct {
    impl: @OpaqueHandle(Private),

    pub fn init(i: u32, j: u32) Public {
        var result = Public {};
        result.impl = @toOpaqueHandle(Private, Private { .i = i, .j = j });
        return result;
    }

    pub fn getI(self: Public) u32 {
        return self.getPrivate().i;
    }

    pub fn getJ(self: Public) u32 {
        return self.getPrivate().j;
    }

    fn getPrivate(self: Public) Private {
        return @fromOpaqueHandle(Private, self.impl);
    }
};

test {
    var public = Public.init(3, 2);
    try testing.expectEqual(public.getI(), 3);
    try testing.expectEqual(public.getJ(), 2);
}

InKryption commented 3 years ago

Right; so then I'm guessing @fromOpaqueHandle would also work on pointers to @opaqueHandle(T)?

lucabol commented 3 years ago

Are you asking if the simpler @fromOpaqueHandle(self.impl) should work? It would certainly be nice if it could be made to work. Same for @toOpaqueHandle.

InKryption commented 3 years ago

No, it's just that in your example, there's no demonstration of how the 'allocated state' version would work, Specifically, I'm suggesting that either one or the other of these two should work for this proposal:

fn getPrivatePtr(self: *Public) *Private {
    return @fromOpaqueHandle(Private, &self.impl);
}

fn getPrivatePtr(self: *Public) *Private {
    return &@fromOpaqueHandle(Private, self.impl);
}

Essentially, how should the value semantics work?

Edit: I said allocated state, but I actually meant just modifying the memory in-place. Otherwise without some mutable pointer semantics, you'd have to re-assign state each time you wanted to modify it.

lucabol commented 3 years ago

What about @fromOpaqueHandle(Private, self.impl) returning a pointer to Private? I.E. is there ever a scenario where you want to return a copy?

fn getPrivatePtr(self: *Public) *Private {
    return @fromOpaqueHandle(Private, self.impl);
}

InKryption commented 3 years ago

Alright. Then the last thing I'd like to address before we try to do any summary of our points is @toOpaqueHandle: I think, although #5909 isn't actually implemented yet, it would make the most sense to follow suit and affirm that the return type of @toOpaqueHandle(expr) should be convertible to any @OpaqueHandle(@typeOf(expr)), so to speak. Are we on the same page for that?

lucabol commented 3 years ago

Absolutely.

ghost commented 3 years ago

This is a fairly long thread, so I apologize in advance if I missed something, but isn't this proposal essentially a form of round-about and coarse-grained field access control?

If Zig had something like struct { private x: u64 }, then that would be the way to implement abstract/opaque types and there would be no need for @OpaqueHandle(). I think. Taking your last example,

const Private = struct { i: u32, j: u32 };

pub const Public = struct {
    impl: @OpaqueHandle(Private),

    pub fn init(i: u32, j: u32) Public {
        var result = Public {};
        result.impl = @toOpaqueHandle(Private, Private { .i = i, .j = j });
        return result;
    }

    pub fn getI(self: Public) u32 {
        return self.getPrivate().i;
    }

    pub fn getJ(self: Public) u32 {
        return self.getPrivate().j;
    }

    fn getPrivate(self: Public) Private {
        return @fromOpaqueHandle(Private, self.impl);
    }
};

...would then simply become:

const Public = struct {
    private i: u32,
    private j: u32,

    pub fn init(i: u32, j: u32) Public {
        return .{ .i = i, .j = j };
    }

    pub fn getI(self: Public) u32 { 
        return self.i;
    }

    pub fn getJ(self: Public) u32 {
        return self.j;
    }
};

InKryption commented 3 years ago

@zzyxyzz I'd have to say you are correct; this was part of my original criticism, but we weren't speaking in similar terms, so I guess I got so caught up in trying to develop the details, that what you've just said didn't quite occur to me.

lucabol commented 3 years ago

I think I said it at the start of the thread: having private field would solve this (or having field on opaque). But that as been discussed many times and rejected (I think on the ground of language complexity).

This proposal is way less intrusive than private fields, but gives you many of the same benefits. That is what is interesting about it.

Edit: Another way to think about it is that in Zig you use struct for two very different scenarios:

A smallish bucket of bits that you send around, i.e. '`Point', or 'Elf'.
A possibly very large module system: i.e. 'DrawingSystem', 'PhysicsSystem', 'Accounting'.

This is the same situation as C.

Having private field would encourage people to use them in case 1. above where they are likely not warranted.

On the other hand some form of private state is very desirable for case 2. This is where this proposal comes in.

One could think of introducing the concept of module or namespace or similar, but that would add one more big concept to the language.

InKryption commented 3 years ago

Sorry, but I'll have to echo @zzyxyzz 's sentiment. And after having gone down through this discussion, to have it pointed out that this proposal would not solve what you originally intended it to, and now alternatively solving a use-case that, as you've well said, has been overall rejected, I think this is too fuzzy to really put into zig. Maybe there is a discussion to be had about private fields, or struct with private fields; but in that case, I would recommend you open an issue with razor-sharp focus specifically on that, what it would solve, the benefits gained from such, etc - if you really believe it is worth the discussion. Because this thread has been too all over the place.

ghost commented 3 years ago

I think I said it at the start of the thread: having private field would solve this (or having field on opaque). But that as been discussed many times and rejected (I think on the ground of language complexity).

Maybe it needs to be discussed one more time then. Language Complexity is such a sleazy argument. How does introducing three builtins for a convoluted way of doing something so common and straight-forward make the language "simpler" and "smaller"?

I think the use cases you present are legitimate, which is rather unsurprising, since information hiding is such a common-place technique in almost every modern language. My own opinion is that removing field access control was a bad idea, but also that it may not be too late to put it back. However, if Zig community/BDFL decides to make a stand on the status quo, then the use-cases presented here are (presumably) to be considered either unimportant or an anti-pattern, and the language should not be polluted with inferior solutions.

Edit: @InKryption, sorry didn't see your comment :)

lucabol commented 3 years ago

As I said in my other comment, I believe the refusal of private fields stems from viewing struct as a smallish bucket of bits, while in Zig they also serve as system boundary. The proposal tries to target the latter scenario without making the former more complex.

But I agree my proposal wasn't precise enough. Actually @InKryption designed it for me in the end :-)

As such, I am happy to close it, even as I feel the scenarios are very valid.

ghost commented 3 years ago

As I said in my other comment, I believe the refusal of private fields stems from viewing struct as a smallish bucket of bits, while in Zig they also serve as system boundary.

Again, I know this isn't your fault, but that's such a weak argument. It doesn't matter how many bits the bucket has. If ArrayList is a 1000-line definition with 30 methods, then it obviously has complexity to hide, and allowing it to hide methods but not fields is merely inconsistent rather than simpler.

Hmm. Maybe I should actually write up a proposal :smile:...

lucabol commented 3 years ago

@zzyxyzz : ArrayList definitely has something to hide. It is a module in the Parnas sense. Haskell, F#, and other languages have a module concept, C and Zig don't and use struct to fake it.

Perhaps private fields would make struct a good enough substitute for modules, or perhaps Zig should have such a concept.

This proposal, apart from the lack of precision, attempted to strike a middle ground enabling information hiding at a larger granularity level than the field, and at a lower language weight, hoping for the BDFL not to strike it down. At least, that was the intention.

ghost commented 3 years ago

Private field proposal is up: ~#9859~ #9909.

rohlem commented 3 years ago

To me the functionality of @OpaqueHandle sounds like a natural extension of the opaque we already have. I think it would look quite clean and intuitive to give it an optional type argument (like how union also permits union(u8), union(enum), and even union(enum(u8))).

opaque already constructs a new type every time. The only additional functionality of opaque(T) to specify is exactly copying properties, like size, alignment, and probably packed-ness / representation safety, pinned-ness, etc. of the argument. This should also mean you can pass opaque(T) by value, as long as T is of known size and alignment (and not f.e. an unsized opaque itself - which should also be supported, for completeness' sake).

InKryption commented 3 years ago

@rohlem An interesting idea, but even if implemented, I don't know if something like opaque(opaque {}) really makes any sense. Although it has been stated that language complexity is a "sleazy" argument, I would still argue that that would be a bit needlessly complex. If I were to look at opaque(T), I think it would be most beneficial if I could quickly make the assertion "ah, it's a sized type", instead of having to dig around and determine whether T is sized or not.

rohlem commented 3 years ago

@InKryption Like *const void is allowed even though it gives no benefit over *void, the value lies in reducing the need for special casing in generic type expressions. I like the trend of unified syntax and only pointing out errors when the actual behaviour requested can't be satisfied. In this case, anything requiring opaque(opaque{}){} to be sized would still give an error, while f,e, pointers to it can work just like they do for plain opaque{}.

InKryption commented 3 years ago

@rohlem If you're referring to the *const void equivalent in C, then that's not entirely true, as making it "const void" serves - if not as type information (given C's fairly weak type system) - as a context clue and documentation, at the very least (as in, it tells the reader that the pointer is to memory that shouldn't be modified). If you're referring to using `const voidin zig, I'm not 100% sure what you'd use that for directly, outside of it being the result of a generic function, given that void itself has no in-memory representation and thus can't be casted to or from by any runtime types (so I imagine you'd actually be referring casting to or fromc_voidoropaque {}pointers). And as well, for any casts that don't suffer from this, you can't cast a constant pointer to a mutable pointer, so it's not true in zig thatconst opaque {}provides no benefit overopaque{}`.

And when it comes down to it, I also don't see where such functionality would become useful, unless you maybe argue that it would benefit very complex, generic functions, that in particular make use of this "opaque layout mimicry".

InKryption commented 3 years ago

Seeing as #9909 was rejected, I imagine the next logical conclusion would be that this should also be closed, being that #9909 was essentially the successor to this proposal (and was the fruit of the long discussion here).

topolarity commented 3 years ago

Seeing as #9909 was rejected, I imagine the next logical conclusion would be that this should also be closed, being that #9909 was essentially the successor to this proposal (and was the fruit of the long discussion here).

I agree that @andrewrk's response in #9909 was pretty comprehensive

It also reminded me of the "We're all consenting adults." standard in, e.g. Python, and I personally agree with him that eager, granular encapsulation can be an anti-pattern, often leading to limited extensibility and worse performance.

I do think there's one open question that could use some more input, though:

Why does Zig support reference semantics for opaque {} types but not value semantics?

I agree with @ifreund above that reference semantics can have serious performance implications for usages of opaque (i.e. encapsulated) types. If we agree that opaque types should support reference and value-semantics, if they are supported at all, then this issue really boils down to a stance on type-opaqueness in general.

Either opaque types:

Have idiomatic use cases in Zig, in which case allowing value semantics for opaque types seems to be important to avoid unnecessary allocations
Exist primarily for compatibility with C's limited encapsulation semantics, which don't support opaque value types. Zig's recommended approach may be to instead to cast back-and-forth between a "bag of bytes", to use distinct types as in #1595, or even to never encapsulate at all

[Bikeshedding] If we go with (1), I think it's worth considering a qualifier syntax, opaque struct { .. } and opaque union { ... }, as an alternative to the built-ins proposed here, to preserve symmetry. If we go with 2, it might be worth renaming opaque {} to extern opaque {} or similar, to make it clear that this is not a feature intended for "idiomatic" Zig code.

matu3ba commented 3 years ago

Why does Zig support reference semantics for opaque {} types but not value semantics?

Purpose of @Opaque is information hiding, which implies hiding the value types with sizes. If you are not convinced, check the meaning of opaque.

So intentionally there are no value semantics, because there are no values. Its basically just a pointer with a name to anything (that you intentionally dont want to expose with size+alignment, cause you dont want your code to access it).

topolarity commented 3 years ago

I think I should have defined my terms more clearly: An opaque type in this case would be one that does not expose structural information to a public API. In particular, opaque type fields and memory layout would be private, not allowed to be directly accessed by clients.

One way to implement this encapsulation is as in C: Make sure that the underlying layout is literally unavailable to the compilation unit, and only allow reference to opaque types.

However, it's also possible to use opaque types as values in client code. In this case, the memory layout is available to the compilation "unit", as needed to pass value types across the ABI, but client code is not allowed to refer directly to structural details, preventing API breakage if these change.

If you look at many of the examples above, you'll see that they correspond to this expanded sense of allowing opaque types to be used as values.

Why hide structural information from client code?

The guarantee that a library writer gets from opaque encapsulation in general is that (1) client code cannot mis-handle these fields, since it cannot directly interact with them, and (2) member fields can be freely added/removed from an opaque struct or opaque union without breaking downstream code.

(2) can actually have important performance implications for stable, public APIs when desired optimizations require changing core API structures, such as in the famous case of Python's C API

matu3ba commented 3 years ago

@topolarity

Advantage: compiler errors on missing optimisations due to wrong build arguments (to make sure the opaque symbol can be looked up).

Drawback: Zig leaks the the name and place, where the symbol is defined (for C it would be only the name).

Summary: As this is not strictly opaque anymore, since the file/thing that opaque points to can not be moved and renamed freely without breaking code, I am skeptical about usefulness of the change.

One way to implement this encapsulation is as in C: Make sure that the underlying layout is literally unavailable to the compilation unit, and only allow reference to opaque types.

If you build files together with zig -ofmt objec1.o file1.zig file2.zig .. filen.zig the compiler can resolve the underlying types (since it holds them in intermediate representation) and use them for optimisation. The same should hold for C code. However, if you link object files together, the same problem with cross-language LTO applies: https://blog.llvm.org/2019/09/closing-gap-cross-language-lto-between.html As I understand it, the object file does not represent value information corresponding to the opaque, because it could not be looked up (yet). However this could be tested in Zig and C.

mlugg commented 1 year ago

I recently thought of an interesting syntactic idea for this - it might be silly, but I figured I'd put it here anyway.

Currently, we have opaque{} for meaning "a type of unknown size". Note that we also have anyopaque to mean "any value of unknown size". Oftentimes, what we really want from an "opaque" type is that the internal representation should be hidden; as correctly identified in this issue, the next logical step is opaque handles. So, maybe we can extend the opaque syntax to make this work?

Here's my idea. Extend the opaque { ... } form to be opaque(T) { ... }, where T is a type. This is a type whose in-memory representation is identical to T (so if T can be passed over ABI, so can an opaque(T)), but which has none of the operations or declarations of the original type. Like the current form of opaque, it can contain declarations. Now, here's the bit I think is really nice: opaque { ... } - that is, the current syntax - is equivalent to opaque(anyopaque) { ... }! It's a type with the "same representation" as anyopaque (i.e. completely undefined representation and size). I think this is a nice approach: it adds opaque types (which I do think are a feature that would fit Zig nicely) whilst not doing any breaking syntax change and keeping a nice logical consistency with existing features. (Note that it seems totally reasonable to me for the existing opaque { ... } syntax to stick around as a shorthand for opaque(anyopaque) { ... }.)

matu3ba commented 1 year ago

@mlugg (remashed from reddit discussion) Afaiu, you suggest to use opaque(anyopaque) {..} for "true opaques" with C semantics and opaque(T) {..} for in the same compilation unit optimizable structs, which do not expose the internal type. For regular compilation units, that sounds like a good solution.

What I am missing is which optimisation potential (along LTO) should be given: For example, it's unclear if using @opaque(structfield) should 1. emit a regular symbol and be only usable as opaque within the same compilation unit, if 2. some "special object format should make linker error out on symbol usage" or if 3. the symbol should be omitted, if not used.

ghost commented 1 year ago

@mlugg, The idea has merit, but if I understood Andrew's position correctly, any attempt to turn plain-old-data into an abstract data type that is manipulated only through its methods, is to be considered an anti-pattern in Zig. As such, opaque only exists because C libraries in the wild use the opaque pointer technique, as described in the OP, and we want to have a clean way to interface with such libraries. However, opaque is not intended for internal use within Zig projects, where your data structures are not dictated by an external API. Making the opaque type mechanism more flexible, whether in the form of OpaqueHandle(T), opaque(T) or opaque struct { x: T } would go against this logic, and is probably not going to happen any more than private fields. At least that's the impression I got from the discussion on #9909 and other places on the internet.

mlugg commented 1 year ago

@zzyxyzz, in general (for e.g. structs) I totally agree with that conclusion, however DOD has some good examples of cases where hiding the operations of the underlying representation is desirable. Storing plain untyped indices, as you currently often do in Zig DOD, is type-unsafe: you can easily get them mixed up, and even do nonsensical things like add them. This can be improved by wrapping in a type with methods to do the reasonable access patterns (i.e. index the relevant thing), so then it becomes obvious when you do something wrong (you end up unwrapping the value!). Opaque types would be a very clean approach to this - right now you'd have to use an extern struct if you didn't want to lose well-defined-layout, which is a bit icky.

I agree that "data hiding for the sake of data hiding" is a bad idea, and goes against Zig's ideals. However, I think there are cases where it is useful to actually prevent bugs (particularly when wrapping primitive types), and in these cases it's justified. It's also worth noting that opaque(T) is only "hiding" in the sense of making it more difficult to access the underlying representation - anything can still get at it with the conversion builtins (whatever they may be called).

InKryption commented 1 year ago

right now you'd have to use an extern struct if you didn't want to lose well-defined-layout, which is a bit icky.

Perhaps in the general case, but in the specific case of typed indices in DOD setups, we already have non-exhaustive enums, which already accomplish this use case more than well enough. There's even an instance of this in the compiler iirc, so the argument for opaque(T) or whatever else needs something other than typed indices to stand on.

ghost commented 1 year ago

@mlugg, In addition to what @InKryption already said, it sounds like you are arguing more in favor of distinct rather than opaque types? If so, there's an open issue for that, and also for the special case of enum arrays and typed indices.

ziglang / zig