Closed ghost closed 3 years ago
I feel the argument you make in the objections ("In addition, Zig is increasingly making legal but potentially buggy patterns into hard errors, somtimes controversially (e.g. unused variables).") is a very good one. This is something that I feel Zig has evolved in since #2059.
Maybe worth moving that argument up to the introduction, to pay it more attention?
A possible objection is that this could also be achieved with a strict naming convention, e.g. starting variables with an underscore will mark them as if they're private, and perhaps be enforced with future linter rules or the like. Just a thought.
To support private fields, from zig zen
:
As for documentation value, in status quo I would suggest:
pub const ByteList = struct {
// public decls and members here
data: []u8,
private: ByteListPrivate,
};
/// non-public, private decls and members here
const ByteListPrivate = struct {
allocator: std.mem.Allocator,
capacity: usize,
};
Accessing fields in .private
should not occur accidentally, and be relatively easy to spot in a code review.
If you use reflection to iterate over fields, maybe have a private-informed branch of std.meta
to exclude/ warn about specific field names.
I'll probably have to reread the @OpaqueHandle
discussion, but it should be possible to replace the field with an equally-sized-and-aligned byte array align(@alignOf(Private)) private: [@sizeOf(Private)]u8,
.
Then @ptrCast
/@bitCast
back-and-forth with non-public helper functions.
The missing piece then is "making the compiler complain" when overwriting the .private
field.
The one scenario that comes to my mind would be reassigning the entire containing struct (ByteList
above).
Maybe declaring the .private
field to be pinned
, as in #7769, would already be enough of a solution - granted though, the actual source of the error would be the fact you can't copy from .private
, instead of inability to copy to .private
.
This might be really useful, it doesn'make sense to me why access is a concern for decls, but not fields. It should be either both or neither.
Just an opinion: Maybe its private fields that should be marked as such, not public?
As for setter/getter boilerplate, half of a problem can possibly be solved with something like 'readonly' keyword, which would effectively act as '*const' outside of a file scope.
@rohlem
OpaqueHandle would indeed provide equivalent functionality: You factor out your private state into a separate struct, wrap it in an @OpaqueHandle
, and then use @toOpaqueHandle()
and @fromOpaqueHandle()
to read and write. My question is, why use such a roundabout way to hide state, when there is the well-established technique of visibility modifiers?
Also, and somewhat ironically, the OpaqueHandle trick only works because it leverages the existing ability to declare a private struct. Otherwise there would be very little point in the whole concept of OpaqueHandle, and you might just as well go with a naming convention like _private
, as @vijfhoek suggested.
I have been using naming conventions for my private fields, like _capacity: usize
in order to discourage usage, but I feel like having pub for fields would be better.
Personally, I quite like that data in zig is so transparent; it makes no attempt to model anything other than what the computer is really doing.
That said, there is obviously a use case for data hiding, but I don't think it should be as easy as just marking a field as private; and as well, I think this shouldn't be as powerful as the privatization of declarations, and moreso act as stronger documentation than comments. Essentially, I think we should continue to encourage transparent data, by making private fields harder to declare in some way.
In that regard, I actually think something similar to what was modelled in #9859 would actually be better for this purpose - using a separate "opaque bag of bits" as a section for private data. This would not only slightly discourage private data, but also document even better the separation of private and public fields.
I think there is perhaps also an argument to be made that, if private fields, whether in the proposed style or in the #9859 style, maybe some form of controlled access into private fields should be enabled anyway, like a built-in (e.g. @privateField(expr, "private field name")
). Because unlike what was argued against @call
being able to access private functions, accessing what is just data wouldn't be as possibly destructive to any "library state" - and as well, as has been said in this proposal, even with private fields, we still have the ability to mess with the data, so maybe granting an assuredly safe mechanism to do so in lieu of "smartness" would be a good move (and as a bonus, it would be a clear mark of possible code-smell, versus aforementioned smartness).
Just some food for thought. I'll stop typing now.
Personally, I quite like that data in zig is so transparent.
So do I, and in private projects I don't usually bother with access restrictions, even in other languages. But is this sufficient reason to make data hiding inconvenient or impossible in places where it is needed (e.g. public-facing libraries or high-assurance projects)?
In that regard, I actually think something similar to what was modelled in #9859 would actually be better for this purpose - using a separate "opaque bag of bits" as a section for private data. This would not only slightly discourage private data, but also document even better the separation of private and public fields.
I'll have to disagree here. To me, pub
or private
is as clear documentation as it gets for the intended access level. Putting everything into a single private section may also interfere with data layout, although that is a secondary concern.
As for adding a @privateField
override, I'm all for it. Like you say, this indicates code smell clearly enough, and should allay fears along the lines of "other programmers will abuse private fields and will only get in my way". Though I don't see why private decls shouldn't be accessible in the same fashion. While reading private data is not as bad as calling private methods, modifying it is probably worse. Also, both would be needed for "extending" library containers, for example.
@zzyxyzz As it pertains to private declarations, I think the most convincing rationale I've heard for it is that it forces change at an API level, as opposed to forcing it at the library-usage level; that is to say, if a library has marked a certain function as private, but the community at large has decided that it would be best marked public, this will either force the library implementers to push the change, or force a fork of the library that implements this change (with the former being the preferred outcome). This as opposed to forcing people wrapping said library to use functions that don't have the same guarantees as those explicitly marked public.
I can't quite recall where I read that here, or whether or not that's a good enough reason. But for the purposes of arguing against applying this logic it to private fields, at the very least, I would point to the fact that although a field may be marked private, it cannot be guaranteed to be inaccessible, where its memory is accessible in the public-facing API (as explained), and thus it makes sense to treat field access modifiers more like built-in documentation tools, as opposed to strict API boundary controls.
(Please tell me if any of this makes sense, my synthesizing skills are not pristine).
@InKryption
that is to say, if a library has marked a certain function as private, but the community at large has decided that it would be best marked public, this will either force the library implementers to push the change, or force a fork of the library that implements this change (with the former being the preferred outcome).
I've heard this argument too, but I find it unrealistically idealistic. Even assuming that your proposed change will be accepted into the public API (which will not happen most of the time, either because your use-case is too specific or the authors plain disagree with your way of doing things), the usual timeline for the change is measured in months or years. And outright forks rarely happen at all, unless the project is abandoned by the previous author. So, while this position does represent best practice for upstreaming changes in open-source projects, it has little to no relevance for everyday software development.
I am glad that we generally agree, then. I'm just bringing the argument to the forefront.
If the private
keyword is added, it's not syntactically compatible (functions, etc are private by default and exported by the pub
keyword)
@sina-devel I would argue that it makes sense for functions to private by default, as it incentivizes explicitly building a robust public-facing API, as opposed to simply allowing everything to spill into global scope. Whereas, here, I think it is more useful to overall make transparent data the default, and incentivize having to explicitly decide if data should really be marked as an "internal" component (which would likely lead to more boilerplate related to reading and writing to the data, which should certainly not be the default, in my opinion, coming from a C++ background).
adding private
or internal
:
capacity
of ArrayList
)right now:
by default private and exported by pub
:
capacity
of ArrayList
)adding keyword to readonly at outside and mutable at inside:
Get/set function are not exclusively a bad thing, sometimes they are needed to accept data from a user in a different way than it is stored internally, and sometimes you really need to keep different fields in sync.
For example, library can provide a chunk of data, that user can store in whatever way they need, where it's still makes sense to keep some non-const, non-user-defined cache:
const Texture = struct {
renderer_id: u32, // not appropriate for user to change
width: usize,
height: usize,
pixel_format: enum { R8_G8_B8_A8, Grayscale },
filter: enum { Nearest, Bilinear },
mipmap_levels: usize, // should be in sync with renderer
pub fn load(path: []const u8) @This() { ... }
pub fn unload() void { ... }
pub fn genMipmaps(self: *@This(), mipmaps: usize) void { ... }
};
In this example, you don't ever want to modify some fields without syncing state with the renderer, but may need to pass container around freely, and allow operations on an instance of a struct in member functions.
Argument that fields shouldn't even be allowed to be hidden steers all libraries to be enclosed state machines, limited to hiding data by using top-level declarations.
// value_container.zig
var value: usize = 0;
const value_container = @import("value_container.zig");
test "acces from a different file" {
value_container.value = 3;
}
Output of this test:
error: 'value' is private
value_container.value = 3;
I guess what i am trying to say, is that acces control concerns are equally valid for functionality and data, and should, in my opinion, be consistent. Either it's "everything is transparent, all the time", or access control is a useful thing for both.
Maybe a way to alleviate some of boilerplate, is to allow reading from private fields?
const Container = struct {
value: usize, // field without any specifiers is private by default
};
var c: Container = .{ .value = 0 };
var read = c.value; // reading from private fields is allowed
c.value = 42; // writing to a private field not allowed
const Container2 = struct {
pub value: usize, // and here is a way to make it writable
};
@di-ant, I could endorse read-only fields as a last-resort compromise, but not as my preferred option. They can successfully catch unwanted data-modification, which is admittedly better than nothing. But public fields (whether they are writable or read-only) are part of the public interface, and changing or removing them is a breaking change. Precisely expressing the intent that something is part of the internal implementation and should not be modified or relied on, can only be properly done by restricting visibility.
How would this interact with comptime reflection?
@andrewrk, If you find the time, could you provide a short explanation for why this was closed -- or a link if this was already discussed somewhere else?
Since this is something that Zig does differently from other languages, it may come up again in the future. So it might be good to have a findable "official position" on this issue.
Here is my reasoning:
There are many properties that can be "leaked" into the public interface of an API. For example: size/alignment of a struct, performance characteristics of a given function implementation, whether a function can be executed at comptime, existence/non-existence of declarations, and more. It is up to the documentation of a given package to specify what exactly the major version is protecting - that is - what exactly counts as "breaking" and what does not. Whether a given field is part of the public API or not is one more item in this list. Of course most of these have general recommendations for defaults:
The idea of private fields and getter/setter methods was popularized by Java, but it is an anti-pattern. Fields are there; they exist. They are the data that underpins any abstraction. My recommendation is to name fields carefully and leave them as part of the public API, carefully documenting what they do. A good example is a field protected by a mutex:
const Foo = struct {
/// This field must be accessed only when `mutex` is held.
unprotected_counter: usize,
mutex: Mutex,
};
In this case the field name unprotected_counter
properly communicates the danger of using the field without using any abstraction. However it being public also allows proper abstractions to be built, if the ones provided are not enough. For example, if it could only be accessed via getCounter()
like this:
pub fn getCounter(foo: *Foo) usize {
foo.mutex.lock();
defer foo.mutex.unlock();
return foo.unprotected_counter;
}
Then if the API user held the mutex for some other reason, getCounter()
would actually deadlock (or rely on a silly hack such as recursive mutexes). The API user needs the ability to interface with the abstractions.
In my subjective experience, public fields generally lead to better abstractions by eliminating the temptation to attempt full encapsulation, when the more effective strategy is to provide composable abstraction layers.
The bottom line here is that private fields is an addition to the language, making it more complicated and requiring the language to answer a bunch of questions that otherwise would not exist. Not only does the feature does not provide enough value to merit the language complexity, I contend that it actively leads to worse source code.
Thank you for the detailed explanation!
I can't say I agree with it, but it's certainly good to know where things stand.
@andrewrk I understand your reasoning and agree with it. On the other hand, it's not clear is it safe (is this intended / allowed) to access struct fields directly, as they can be unprotected.
A would propose to have private/public fields, so they can't be accessed accidentally.
But not hiding private fields from accessing completely. If you really need you can.
This is how it'd don in TypeScript with object!.field
syntax, which means "Hey, I know what I'm doing!".
const obj = struct {
const Self = @This();
// private by default or have # syntax as modern JavaScript proposal
#length: usize,
pub fn length(self: *Self) {
return self.#length;
}
}
// Know what I'm doing!
const a = obj {};
const len = a.!length;
// or like this
const len = a.#length;
@likern
I agree that adding an escape hatch is the proper answer to the concern that interfaces will become too locked-down and uncomposable. This solution was already suggested in this thread, though (in the form of @privateField(a, "length")
). Using a built-in for this has multiple advantages, such as not requiring new syntax and being easier to search for. The additional verbosity may also be seen as a benefit, since overriding visibility is not to be used casually.
As a reminder this issue is closed
I know this issue is closed, but just in case it is ever reconsidered, I'd like to post this reply to Andrew's comments. This has been eating at me for a while, so here goes:
There are many properties that can be "leaked" into the public interface of an API. For example: size/alignment of a struct, performance characteristics of a given function implementation, whether a function can be executed at comptime, existence/non-existence of declarations, and more. It is up to the documentation of a given package to specify what exactly the major version is protecting - that is - what exactly counts as "breaking" and what does not. Whether a given field is part of the public API or not is one more item in this list.
Yes. But lets not throw the baby out with the bathwater. Just because there will always be things (some of them quite esoteric) that cannot be formally expressed in the interface, that should not necessarily prevent us from picking the low-hanging fruit. Field access control is a simple and well-established technique, not unlike the distinction between var
and const
. Implying that it is about as useful (and complex) as, e.g., a compiler-checked formal specification of the performance characteristics of a function is incorrect, IMO.
The idea of private fields and getter/setter methods was popularized by Java, but it is an anti-pattern.
I would not say that getter-setters are per se an anti-pattern. Their excessive use in some languages certainly is, but when there is a legitimate need for an encapsulated interface, they're just the thing. And it's not like Java is the only language with private fields. Almost all popular languages have them. I did not have the impression that Rust, for example, was suffering from a getter-setter epidemic.
My recommendation is to name fields carefully and leave them as part of the public API, carefully documenting what they do.
This will work well sometimes, and sometimes it won't. I'm pretty sure that many library authors would prefer to hide some parts of the implementation, rather than carefully arranging things such that there is nothing to hide.
In this case the field name unprotected_counter properly communicates the danger of using the field without using any abstraction. However it being public also allows proper abstractions to be built, if the ones provided are not enough.
I note that the building of additional abstractions is just as likely to be prevented by low-level methods being private. Does this mean we should remove visibility controls etirely and rely on careful naming and documentation for methods and other decls as well? Besides, anticipated disagreements between library authors and users about what should or shouldn't be hidden can also be resolved with an explicit override (e.g. @privateField()
).
Then if the API user held the mutex for some other reason, getCounter() would actually deadlock (or rely on a silly hack such as recursive mutexes).
So the fields should be public in this particular case. It does not follow that struct fields in general should always be public. Plenty of valid examples can be presented on both sides of the argument.
The bottom line here is that private fields is an addition to the language, making it more complicated and requiring the language to answer a bunch of questions that otherwise would not exist.
Arguably, adding private fields would simplify the language, at least from the user perspective, since it makes the language more consistent and more in line with patterns familiar from other languages.
In my view Andrew makes a good case for why open interfaces should be preferred in general. He does not make a convincing case for why they should be forced to always be open. Projects differ. Requirements differ. Sometimes what you need is an open library, at other times it's a piece of high-assurance code that is audited forwards and backwards. Neither is right or wrong, they're just different, and I see little reason to pick one over the other when we can easily support both.
I am a beginner at Zig, but I come from a background in many other languages. Perhaps my experience is wholly irrelevant to Zig, but I wanted to share my perspective as it seems a bit different from @andrewrk.
I’ve noticed private fields have 3 major benefits for library authors, consumers, and the ecosystem as a whole:
I do not believe we will have Go’s level of stability in the Zig ecosystem without giving library authors a standard way to say, “this is what I promise will remain stable.” Other things are needed as well, of course, but this seems to be an important piece.
Zig already has default private behavior with requiring authors to make things pub, so pub fields seem like such a natural inclusion from a beginner Zig-user’s POV.
I don't think zig is a language that will necessarily do something "just because other languages do it".
As someone who used to use languages which have private fields, and who now uses one - not zig - that does not.... It is absolutely amazing. Specifically, it makes debugging very lucid. I don't have to drop to a debugger for 99% of the things I need to do. Fields should absolutely NOT be hidden from access (because otherwise if you want to introspect them, you would need privileged code, i.e. std.debug.print "wouldn't just work"). However, I believe there is a use case for fields to not be assignable values outside of their own scope. It is a common thing in the language that I use that curious programmers dive into what is a "private api" and on a version change things get weird.
I think it's worth saying, to rebutt the reasoning about docs as a suitable solution for private fields, that docs are always the WORST way of communicating code contracts to other developers, the best being language features which enforce certain behavior (types, mutability modifiers, access modifiers, etc.), the second best being widely recognized conventions.
Java has the "private" field language feature, Python has an underscore prefix code convention. Both of these are extremely valuable for communicating the intended use of a struct or library. It's valuable not just to the developer who may or may not (probably not) be fully reading the docs of the library they're using, but also to any code reviewers to show that a library is being used in an unintended way. It's not hard to imagine a new developer modifying the "capacity" field of a struct because intellisense suggested it first over "setCapacity" (which does additional, necessary work), and it getting missed during code reviews since there's no indication at all that this is a misuse of the class unless you comb over the library's documentation.
Not introducing ANY concept of "You probably don't want to call/write this function/field" is a huge mistake. If Zig wants to strive for being a maintainable language used by a team of more than a half dozen developers, then it MUST introduce either a language feature or code convention to communicate what is meant to be public API and what is not. It doesn't matter if it's Java-style private fields, or a Python-style naming convention, or something inbetween, any of these are adequate for communicating an unusual use of a library during development and code reviews. Docs, and expecting only developers of a certain rigor will use your language, are simply not an acceptable substitute for this kind of important & easily communicable information.
Private fields would've made this memory leak harder to happen (or maybe prevented it):
I still think a private field is a good idea, at least building some conventions like prefixing the field with an underscore helps a lot. I understand people want to keep Zig concise and small, but it is very hard nowadays to tell what constitute the public API of a type by skimming through the code.
I still think a private field is a good idea, at least building some conventions like prefixing the field with an underscore helps a lot. I understand people want to keep Zig concise and small, but it is very hard nowadays to tell what constitute the public API of a type by skimming through the code.
While it totally makes sense not to over-complicate the language, and just to play devil's advocate here. :)...
If we take the assumption that self-documenting code is a good thing, then I think it's safe to assume that if the intent is to be C evolved then it's ok to introduce a couple new concepts. Not need to go all C++/Rust on it and ruin the vibe though when we be jamming.
So, if keeping is simple is the way, then perhaps we can make the secondary option "official" where we rely on naming conventions. Like with an underscore decorating a "private" member, as we did with JavaScript to fake it before it got that abstraction. It would be akin to the "official" style guides and solve the self-documenting concept while also not adding new abstractions if someone really does wanna shoot themselves in the foot by messing with it.
Introduction
Currently, function and variable declarations are private by default and are made externally visible with the
pub
modifier. However, data fields are always public and there is no way to restrict their visibility. Apparently, field access control was never really working in Zig (#569) and was officially removed in #2059. Which is somewhat surprising, since no other modern language I can think of (Java, C++, C#, D, Rust, Go, ...) fails to provide this feature.The lack of private data makes it impossible to do proper encapsulation / implementation hiding, which is widely considered to be a basic software engineering technique. In particular:
.capacity
of anArrayList
is overwritten.Possible objections
No official reason was given in #2059, but two possible objections to private fields come to mind:
Concerning 1, I'd say that protection does not have to be perfect to be useful. Preventing accidental and semi-deliberate messing is valuable by itself, not to mention the documentation value.
2 may be a real concern, or maybe not. The same argument can be made about private decls, but a proposal to override the visibility of methods at the call site (#8779) was recently met with a resounding rejection -- even though there are certainly cases where it is both reasonable and safe to call private methods. In addition, Zig is increasingly making legal but potentially buggy patterns into hard errors, somtimes controversially (e.g. unused variables). The lack of basic implementation hiding is not consistent with this safety-first attitude, IMHO.
Syntax
The simplest option is to adopt the same default as with decls: file-level private by default and externally visible with
pub
. If this clashes with data-oriented style, the opposite default can be chosen, but that wold require the introduction of aprivate
keyword. Yet another possibility is to make field visibility struct-level, e.g. withopaque struct {...}
.I don't really have a strong opinion here.
Open questions
Update 1: As a side benefit, it may be possible to remove the
opaque {}
type from the language, since it is rarely used and can be simulated withstruct { private ptr: usize }
.Update 2: I wouldn't mind adding an escape hatch like
@privateField(object, "name")
for cases where you need to use a particular library, but find the API too locked-down. Some discussion of this is here.