Open SpexGuy opened 4 years ago
As the behaviour is undefined, I vote to disallow all empty extern structs. As you noted, many compilers still allow it, but since there is no concrete behaviour I think it should be up to the user to work around those issues. Perhaps an exception should be made for translate-c though? I'm not sure how much those empty structs are used in practice, but if it's often, that might be required.
Brainstorming here: the "ABI" part of the target triple implies a reference C compiler that is the definition of the ABI. So for example, x86_64-windows-msvc means we use MSVC as the reference C compiler that we are ABI matching. x86_64-linux-gnu means we are using GCC as the reference C compiler that we are ABI matching. So that could potentially answer the question: does an extern struct allow having no fields => what does the "main" C compiler of that ABI do?
That being said, if the C99 specification is clear that a struct with 0 fields is UB, and it looks pretty clear to me, then we should make an empty extern struct a compile error.
For packed structs we should allow 0 size; however, it would be a compile error to use a size-0 packed struct in a C calling convention function, just like it is a compile error to use `u0` or `void` in a ccc function.
In that case, should we impose that same lessened restriction on extern structs? Allow zero-size extern structs with the current semantics but don't allow them (or types that reference them) for C calling convention functions?
In what kind of situation could those be preferred over a packed or regular struct?
I think that, since this is mainly a GNU C ABI issue, this is something better handled by `translate-c`, rather than by the language itself. The only sensible use for empty structs in GNU C is as a sort of 'typed' opaque pointer (which is strange, because a simple declaration of the struct, without braces, can do this as well, and is defined in the C standard). So all that's needed is for `translate-c` to identify pointers to empty structs, and represent them with an opaque Zig struct.
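To illustrate the standard-conforming alternative mentioned above, here is a minimal C sketch of a "typed" opaque pointer built from a declaration without braces. The `widget` type and its three functions are hypothetical names invented for this example; they do not come from any real library.

```c
#include <assert.h>
#include <stdlib.h>

/* What a public header would contain: a declaration without braces.
   `struct widget` is an incomplete type at this point, so callers can hold
   and pass `struct widget *` but cannot inspect or copy the struct itself. */
struct widget;
struct widget *widget_create(int id);
int widget_id(const struct widget *w);
void widget_destroy(struct widget *w);

/* What the implementation file would contain: the full definition.
   (Hypothetical contents, chosen only for illustration.) */
struct widget {
    int id;
};

struct widget *widget_create(int id) {
    struct widget *w = malloc(sizeof *w);
    if (w != NULL) {
        w->id = id;
    }
    return w;
}

int widget_id(const struct widget *w) {
    return w->id;
}

void widget_destroy(struct widget *w) {
    free(w);
}
```

Because the pointer type is distinct from every other struct pointer, passing the wrong handle is a type error, and no empty struct body is needed to get that effect.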
> In that case, should we impose that same lessened restriction on extern structs? Allow zero-size extern structs with the current semantics but don't allow them (or types that reference them) for C calling convention functions?

This prompts what I see to be the main backing question here: is `extern` a declaration that a type is intended to be used in the C ABI? Or does it only declare a memory layout? If the former, then `extern` has utility in that it provides a helpful compile error if you try to add a field (or lack thereof) that can't be represented in the C ABI. If the latter, then I think your suggestion above makes sense. This is related to #3133.
The status quo answer to this question is that `extern` is a declaration that a type is intended to be used in the C ABI. I do think a proposal to change is worth considering, especially in light of #3133 and #3802. Shall we transform this issue into that proposal?
I agree that that's worth considering, and that it backs this question. I'll update. Feel free to modify.
I apparently think that this should be changed, because I was in the middle of typing up this response:
> In what kind of situation could those be preferred over a packed or regular struct?

`extern struct` isn't just for ABI use; it's also the only way to lay out memory in a way that respects alignment. You may want to lay out some data and then read it back later interpreted a different way. For example, I've used this struct (based on this GDC talk) in the past for a half-edge structure with good cache behavior:
```zig
const assert = @import("std").debug.assert;

// Vertex is assumed to be defined elsewhere.
const Edge = extern struct {
    vertex_index: u32,
    opposite_edge_index: u32,
};

const Triangle = extern struct {
    edges: [3]Edge,
    flags: u64, // same size as an Edge
};

comptime {
    assert(@sizeOf(Triangle) == @sizeOf(Edge) * 4);
}

const AdjacencyMesh = struct {
    vertices: []Vertex,
    triangles: []Triangle,

    // Reinterpret the triangle array as a flat array of Edge-sized slots,
    // four per triangle (three edges plus the flags word).
    fn edges(self: AdjacencyMesh) []Edge {
        return @ptrCast([*]Edge, self.triangles.ptr)[0 .. self.triangles.len * 4];
    }

    fn edgeIndexInTriangle(edgeIndex: u32) u32 {
        return edgeIndex % 4;
    }

    fn triangleIndexFromEdgeIndex(edgeIndex: u32) u32 {
        return edgeIndex / 4;
    }
};
```
There may be a reason to include an empty struct or a pointer to an empty struct in a data structure like this, potentially as part of a generic type. But using type punning with types that are generic in that way is not exactly good practice, so maybe it's fine as is?
Here are the struct concepts that zig recognizes:
- `@alignOf(FieldType)`, also known as "ABI alignment"

In status quo zig + accepted proposals mentioned above, we have 3 kinds of structs:
- `struct`
- `extern struct`
- `packed struct`
As you can see, the 3 options do not do an adequate job of surfacing the struct layout options that are available. I do think `struct` is satisfactory, but I could see the value in replacing `extern struct` and `packed struct` with different syntax that better surfaces the options here. I'm also open to the possibility of removing the feature of "allow non C ABI compatible types" being a property of a struct, and making it a part of validation of C calling convention functions, as noted above.
One important question to answer is: will ordered, ABI aligned structs always match the C ABI? I think the answer is "yes". But if there were any counter-examples that would influence the design process here. I'm not aware of any counter examples.
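As a rough way to probe that question from the C side, the sketch below recomputes a struct's layout using only the "declaration order, each field at its ABI alignment" rule and compares the result with what the C compiler chose. `struct sample`, `align_forward`, and `sample_layout_matches` are hypothetical names for this illustration, and the check says nothing about ABIs or field types it does not exercise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example struct; field order is the declaration order. */
struct sample {
    char a;
    int b;
    char c;
    short d;
};

/* Round `offset` up to the next multiple of `align` (a power of two). */
size_t align_forward(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Returns 1 if the "ordered, ABI-aligned" rule reproduces the layout
   that the C compiler picked for struct sample, 0 otherwise. */
int sample_layout_matches(void) {
    size_t off_a = align_forward(0, _Alignof(char));
    size_t off_b = align_forward(off_a + sizeof(char), _Alignof(int));
    size_t off_c = align_forward(off_b + sizeof(int), _Alignof(char));
    size_t off_d = align_forward(off_c + sizeof(char), _Alignof(short));
    /* The total size is padded up to the alignment of the whole struct. */
    size_t size = align_forward(off_d + sizeof(short), _Alignof(struct sample));

    return off_a == offsetof(struct sample, a)
        && off_b == offsetof(struct sample, b)
        && off_c == offsetof(struct sample, c)
        && off_d == offsetof(struct sample, d)
        && size == sizeof(struct sample);
}
```

A counter-example, if one exists, would show up as a target where a function like `sample_layout_matches` returns 0 for some combination of field types.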
One use case to consider is exporting a C library whose functions take a pointer to a Zig struct as a parameter, but which only exposes the struct as opaque in the header. The compiler should probably allow that somehow.
Zig already allows pointers to anything in CCC functions
> As you can see, the 3 options do not do an adequate job of surfacing the struct layout options that are available. I do think `struct` is satisfactory, but I could see the value in replacing `extern struct` and `packed struct` with different syntax that better surfaces the options here. I'm also open to the possibility of removing the feature of "allow non C ABI compatible types" being a property of a struct, and making it a part of validation of C calling convention functions, as noted above.

I touched on this in #6478, but I think the best way to give programmers fine-grained control over the memory layout of a struct is to provide a mechanism to explicitly set the field offsets. Any combination of alignment/ordering/overlapping can then be represented, so it generalizes to working with C union types as well. As a bonus, it allows the programmer not just to conform to the C ABI, but to conform to and define any ABI they can think of.
For academic purposes, I would like to add that the C2x working draft spec (published in February 2020 and available here: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2478.pdf) reiterates the same undefined behaviour, although with a slightly more specific constraint, namely:

> If the member declaration list does not contain any named members, either directly or via an anonymous structure or anonymous union, the behavior is undefined.

(See page 87, paragraph 10.)
Note: the word "member" replaces the former "struct" in this syntax term (see the second-to-last line of page 2).
Would this change apply to `union` as well?
The C specification says:
> The size of a union is sufficient to contain the largest of its data members. Each data member is allocated as if it were the sole member of a struct.

so it seems like agreement in the C ABI for structs would also extend to unions.
If so, `extern union` would nearly be able to replace `packed union`, depending on the alignment details.
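The union layout rule quoted above can be checked directly in C. This is only a sketch: `union value` and `union_layout_holds` are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical union with members of different sizes and alignments. */
union value {
    char tag;
    int number;
    double real;
};

/* Returns 1 if every member starts at offset 0 and the union is large
   enough, and suitably padded, to hold its largest member. */
int union_layout_holds(void) {
    return offsetof(union value, tag) == 0
        && offsetof(union value, number) == 0
        && offsetof(union value, real) == 0
        && sizeof(union value) >= sizeof(double)
        && sizeof(union value) % _Alignof(union value) == 0;
}
```

The last condition is where the "alignment details" come in: the union's size is rounded up to its own alignment, which a byte-packed layout would not necessarily do.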
I think it's rather useful to be able to define guaranteed-layout structs with automatically aligned fields, which packed structs don't provide: with those, one has to do the alignment manually. As mentioned in an earlier comment, one might want to read the struct data in a different way. To add one more example to the one in that comment, consider the following case, which I actually ran into.
There is a struct of simd vectors (some vectors of `i32`, some of `f32`) of the same size, which one should be able to alternatively read/write as a flat array of 32-bit integers or floats. Besides the vector fields, other fields of the struct may be nested structs of the same kind, some of which may be empty (the entire top-level struct can still be seen as an array of 32-bit values in such cases). In this particular case packed structs would probably have worked too, but conceptually extern structs seem to express the intention better, as the fields are expected to have their natural alignments by design, not by chance of the other fields having aligned sizes. If extern structs disallowed zero sizes or non-C-ABI-compatible types, one would have no choice but to use packed structs. That would cause problems whenever alignment padding is needed between the struct fields, because one would have to maintain it correctly "by hand". The same concern of maintaining alignment by hand would apply to this proposal.
In this regard, allowing any types and sizes within extern structs sounds not just like a good idea but like an important and indispensable part of the language's functionality. The related idea of rejecting non-C-ABI-compatible extern structs in C-calling-convention functions raises some questions, since such functions may not be used exclusively for interfacing with C, but also with other languages that support a similar ABI. Or maybe one wants to interface with just one particular C compiler, which supports zero-size structs and maybe some non-standard types, idk. I'm not sure the additional safety gained by generating errors for such structs is worth the lost functionality.
How about simply splitting `extern struct` into two distinct versions:
- `extern struct` has guaranteed ordering, respects default alignment and allows non C-compatible types
- `extern "c" struct` is the same but only allows C-compatible types
This is a small distinction but it changes the result of questions about how zero-sized extern structs should behave, as well as questions about whether they can reference non-extern structs.
The primary use cases for extern in the absence of the C ABI are type punning and MMIO mappings. Packed may not be a great solution for these cases because alignment and read/write speed are both important.
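For the MMIO case, the appeal of an ordered, naturally aligned layout can be sketched in C as follows. The register block is entirely hypothetical (an imaginary UART with invented register names and offsets), and the usage below points it at ordinary memory rather than real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block for an imaginary UART; the names and
   offsets are invented for illustration only. */
struct uart_regs {
    volatile uint32_t data;    /* offset 0x00 */
    volatile uint32_t status;  /* offset 0x04 */
    volatile uint8_t control;  /* offset 0x08 */
    /* natural alignment of the next field inserts 3 bytes of padding */
    volatile uint32_t baud;    /* offset 0x0C */
};

/* Write one register through a pointer to the block, as a driver would.
   Each 32-bit register is naturally aligned, so the access can be a
   single aligned store. */
void uart_set_baud(struct uart_regs *regs, uint32_t divisor) {
    regs->baud = divisor;
}
```

With a byte-packed layout the padding after `control` would have to be declared and kept in sync by hand, which is the maintenance burden raised earlier in the thread.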
Better discussion of this question
Original Issue: Potential design flaw: Pointer to empty extern struct has no bits
My initial line of reasoning was this: Extern structs are ABI types and must match the layout of C. Pointers to extern structs are also ABI types. Therefore pointers to extern structs should always be real pointers, even if the struct is empty.
This might also extend to packed structs.
However, it appears that C doesn't actually allow empty structures. The C99 specification says:

> If the struct-declaration-list contains no named members, the behavior is undefined.

But in practice, most compilers allow empty structs. `gcc`, `clang`, and `icc` all give the struct size 0 when compiling as C and size 1 when compiling as C++. `msvc` has a compile error for C, and uses size 1 for C++. So I guess this issue isn't as clear cut as I first thought. Still, I think it merits discussion. Maybe it should be a compile error?
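The size-0 behaviour described above can be observed with a few lines of C. Note the assumption loudly: this relies on the GCC/Clang extension that accepts an empty member list, it is not valid ISO C, and per the observation above MSVC rejects it when compiling as C. `struct empty` and `empty_size` are hypothetical names for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Not valid ISO C: an empty member list is accepted by GCC and Clang
   as an extension, and rejected by compilers that follow the standard
   strictly. */
struct empty {};

/* Reports the size the compiler assigned to the empty struct. */
size_t empty_size(void) {
    return sizeof(struct empty);
}
```

Compiling the same declaration as C++ is a different situation, since C++ requires distinct complete objects to have distinct addresses and therefore gives the type a nonzero size.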