ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
33.6k stars 2.46k forks source link

Proposal: Integer-backed packed struct #5049

Closed SpexGuy closed 1 year ago

SpexGuy commented 4 years ago

So far, Zig doesn't have a good solution for flags/bitfields. Packed structs are the current recommended solution, and they provide a lot of improvements over the standard C approach of defining integer constants. But they are still not quite sufficient, even with #3133. For me, the requirements of a good flags/bitfield type are: 1) Deterministic packing 2) Can be cast to and from an integer type, to support bulk operations (&, |, etc) 3) Well-defined conversion between little-endian and big-endian 4) Guarantees the size of its loads and stores in a predictable way, to be compatible with MMIO registers 5) Behaves as an integer when used as a parameter to an extern function, so as to be compatible with a C ABI that uses flags

Requirements (1) and (2) are already satisfied by packed structs. Requirements (3) and (4) are not satisfied by packed structs yet, but are kind of a package deal. If you define load/store size you get deterministic endianness behavior. There have been ways suggested to do this, such as putting a u0 aligned to the required word size at the beginning of a packed struct. (I can't find that suggestion anymore, if anyone knows where it is please let me know and I'll link it here). But assuming packed structs will always behave like structs from an ABI standpoint, they may not be able to satisfy requirement (5) for all calling conventions. This can lead to some pretty ridiculous workarounds, that are not always feasible.

To solve this, I propose a variant of packed structs which are backed by a specified integer type, similar to enums backed by a specified integer type. Just like enums are integers at the ABI level, integer-backed packed structs are also integers at the ABI level. They are passed in registers across function call boundaries whenever their backing integer type would be, and they are allowed on extern boundaries only when their backing integer type would be. They can be used in atomic operations just like their integer type would be (with the exception of RMW operations like add, sub, etc, similar to enums). Their default alignment matches the alignment of their backing integer type.

My proposed syntax is this:

const ExampleFlags = packed struct(u32) {
    int_flag: u1 = 0,
    bool_flag: bool = false,
    small_value: u3 = 3,
    aligned_value: u8 align(1),

    const Self = @This();
    pub fn union(self: Self, other: Self) Self {
        return @bitCast(Self, @bitCast(u32, self) | @bitCast(u32, other));
    }
};

Integer operators (+, -, &, etc) are not defined for integer packed structs, but @bitCast is allowed to convert to the integer type if you really want to do bulk operations. We could also make a more specialized cast, analagous to @enumToInt. Like normal packed structs or enums, bindable functions can be specified in the struct namespace to facilitate bulk operations at the API level.

Fields are specified from the LSB of the backing integer to the MSB. This definition will always mean that endianness conversion for the flags field is the same as endianness conversion for the backing integer type. If the number of specified bits is fewer than the number available in the backing type, or if there are padding bits due to aligned fields, the value of the extra bits is undefined. If you require these values to be zeroed (e.g. for interop with a C api, MMIO, or sort ordering), you must explicitly add padding fields that are set to zero. If more bits are specified than are available in the backing type, that's a compile error.

Integer-backed packed structs are allowed to contain anything that a packed struct can contain and follows all of the layout rules of packed structs, as long as the contained elements can fit inside the specified number of bits. The backing type must be an actual integer. It cannot be another integer-backed struct or an extensible enum. When necessary, conversions between those types can instead be performed with @bitCast.

When an integer packed struct is nested into another integer packed struct, it behaves like its integer type and is bit-packed, unless an alignment is explicitly specified.

Writing the entire value of a packed struct is guaranteed to be implemented with the same semantics as writing to a value of the integer type. Writing to a field in an integer-backed packed struct is semantically a read-modify-write operation on the entire integer. Whenever the write size is semantically observable (e.g. through atomic or volatile operations), the size of the read and write instructions for an integer packed struct is guaranteed to be the same as that of the backing integer type. As always, if the read/write is not observable, the optimizer is free to do whatever it wants (load/store the smallest size it can, split into multiple stores, combine loads/stores together, ignore all of them and use a register, etc).

Consequently, a pointer to a field of a backing type has the type *align(<parent struct alignment>:<bit offset in integer>:<byte size of integer>) <field type>. So &(ExampleFlags{}).small_value is of type *align(4:2:4) u3, and &(ExampleFlags{}).aligned_value is of type *align(4:8:4) u8.

This proposal would provide an explicit solution for #1834, #4056, and #4185, without limiting the capabilities of the more general packed struct. With some extra work, it could also be applied to solve #1761 and #3472.

Tetralux commented 4 years ago

The only thing with this is that if your packed struct contains several fields that are structs, or you just have a lot of fields, or indeed, you care more that's it's packed than the final size of it, you have to manually calculate the bits - which would be a bit of PITA.

SpexGuy commented 4 years ago

if your packed struct contains several fields that are structs, or you just have a lot of fields, or indeed, you care more that's it's packed than the final size of it

This proposal doesn't remove the current packed struct from the language, so those use cases should be handled by it. This is specifically for when you need guaranteed integer behavior, in order to be compatible with a C api, MMIO, or a particular byte swapping strategy.

Byte swapping a normal packed struct is a bridge we'll have to figure out how to cross eventually, since networking is one of its intended use cases. But the rules for that could be allowed to be quite complicated, since a standard packed struct can be much larger than a single integer. (note though, complicated does not necessarily imply slow in this case.)

ghost commented 4 years ago

@SpexGuy , I see that you referred to #4185. What are your thoughts on it?

I assume there is something you believe this proposal provides that #4185 doesn't?

SpexGuy commented 4 years ago

I think the enumflagset proposal also handles a similar set of use cases, so I think doing either one or the other makes sense. I first considered something like enumflagset, but there are a couple reasons why I prefer this solution:

// masking foo: Flags = 0; foo: Flags = .X; foo: Flags = .X | .Y | (if (bool_var) .Z else 0); foo |= .X; foo &= ~.X; foo = (foo & ~.X) | (if (bool_var) .X else 0); if (foo & .X != 0) { ... } if (foo & (.X | .Y) == (.X | .Y)) { ... }



I think it might also make sense to make a [FlagsMixin](https://github.com/SpexGuy/Zig-ImGui/blob/master/zig/imgui.zig#L21) class in the standard library that supplies functions for bulk operations (`|`, `&`, `~`), because these are a bit cumbersome with integer packed structs and it might be better to have one standard set of function names that everyone uses instead of having every library that has a flags struct pick their own names.
markfirmware commented 4 years ago

In this project I'm only using volatile only on aligned word pointers (to u32 or to packed structs of 32 bits.)

https://github.com/markfirmware/zig-vector-table/blob/c69f04caf843d9b7977980a5230f3ced415003dc/main.zig#L38-L56

@vegecode

ghost commented 3 years ago

I agree with this, although I would make one change to the syntax:

const ExampleFlags = packed(u32) struct { // ...

because struct(u32) on its own does not make sense.

mikdusan commented 3 years ago

There have been ways suggested to do this, such as putting a u0 aligned to the required word size at the beginning of a packed struct. (I can't find that suggestion anymore, if anyone knows where it is please let me know and I'll link it here).

4016 ?

N00byEdge commented 2 years ago

I would like it if each field could explicitly specify their bit offset (since that's 100% what you're trying to accomplish here), but I'm not sure how that syntax would look, but I'm certainly sure that it would look better than that align() syntax.

ifreund commented 2 years ago

If the number of specified bits is fewer than the number available in the backing type, or if there are padding bits due to aligned fields, the value of the extra bits is undefined.

Why don't we make this a compile error? What's the use-case?

SpexGuy commented 2 years ago

Good point, I agree.

gpanders commented 2 years ago

If the number of specified bits is fewer than the number available in the backing type, or if there are padding bits due to aligned fields, the value of the extra bits is undefined.

Why don't we make this a compile error? What's the use-case?

Does this mean the following would be a compile error?

const ProtFlags = packed struct(u32) {
    read: bool,
    write: bool,
    execute: bool,
};

I am only specifying 3 bits, but using a backing u32. This is a pattern I've used when interfacing with C code or binary file formats that use a larger integer type than is strictly necessary, either for alignment reasons or in order to reserve bits for future use.

Today, I just add an extra _: uX to the end, where X is whatever value is necessary to make the size of the struct match what is expected. This has to be calculated manually and it would be nice to be able to omit this, but it sounds like you are suggesting this should be a compile error.

SpexGuy commented 2 years ago

Yes, this should absolutely be a compile error. If you are interfacing with a C API where these bits are reserved, you need to set them to the reserved value (probably 0). Same if you are writing a binary file. Padding has undefined values, and nonstandard values in padding may produce an undefined value. So there must not be any undefined padding bits in a packed struct where the underlying representation is important.

gpanders commented 2 years ago

Would it be possible to infer the number of necessary bits to pad the struct to the correct size? Again using my previous example:

Proposed (status quo):

const ProtFlags = packed struct(u32) {
    read: bool,
    write: bool,
    execute: bool,
    _: u29 = 0,
};

I am already specifying u32 as the backing integer, so having to again specify u29 is redundant. If I add a field or update the backing integer size, I also have to update the number of padding bits. Maybe this is an example of the "strategic friction” that Zig aims for, but to me it feels repetitive.

If the compiler could infer the necessary number of padding bits then I could do something like this instead:

const ProtFlags = packed struct(u32) {
    read: bool,
    write: bool,
    execute: bool,
    _ = 0,
};

This way I can still initialize the unused/padding bits to avoid undefined values, but I also do not need to manually calculate the number needed.

praschke commented 2 years ago

I am already specifying u32 as the backing integer, so having to again specify u29 is redundant. If I add a field or update the backing integer size, I also have to update the number of padding bits. Maybe this is an example of the "strategic friction” that Zig aims for, but to me it feels repetitive.

this is only useful in the case of one padding region. as soon as you have more than one region you have to specify the size of all but one padding region. it also might complicate the syntax, though i don't know the details there. it strikes me as another moving part for a very minor convenience.

ominitay commented 2 years ago

@gpanders I know I'm bikeshedding, but here would be my suggestion for an approach:

Status Quo:

const ProtFlags = packed struct(u32) {
    read: bool,
    write: bool,
    execute: bool,
    _: u29 = 0,
};

Alternative:

const ProtFlags = packed struct(_) {
    read: bool,
    write: bool,
    execute: bool,
    _: u29 = 0,
};

packed struct (_) would instruct the compiler to infer the bit size of the backing integer from the packed struct's contents. There is never a case where this is ambiguous, so this shouldn't cause any problems. It would allow explicit backing integers to be used where appropriate (i.e. binary data formats, mmio), while allowing easy changes to packed structs in size where this restriction isn't necessary.

gpanders commented 2 years ago

The only problem with that is that the size of the struct is no longer obvious. In this example it’s easy enough to see, but in a more complex structure this would involve needing to add up the size of all of the individual fields.

as soon as you have more than one region you have to specify the size of all but one padding region

So? Just have the final _ be automatically inferred. Padding bits in the middle of the struct are significant as this corresponds to the underlying structure of the bit pattern or whatever, so it’s fine that their size has to be explicitly stated. The trailing padding bits are just there to ensure the size of the struct matches what the user specified.

ominitay commented 2 years ago

The only problem with that is that the size of the struct is no longer obvious. In this example it’s easy enough to see, but in a more complex structure this would involve needing to add up the size of all of the individual fields.

The size of an extern struct, an extern / packed union, isn't obvious either, along with plenty other types. Should we change them as well? I would argue against it, as it simply isn't necessary to syntactically indicate the size of a packed struct in plenty of cases. Where it is required, it would still be possible with my proposal.

So? Just have the final _ be automatically inferred. Padding bits in the middle of the struct are significant as this corresponds to the underlying structure of the bit pattern or whatever, so it’s fine that their size has to be explicitly stated. The trailing padding bits are just there to ensure the size of the struct matches what the user specified.

I think this defeats one of the main benefits of the explicit backing integer, which is that a change to the backing integer or fields must be made while consciously considering the size and layout. Having a field of the packed struct be automatically resized without obvious indication can hide plenty of mistakes/bugs, and obfuscates the layout of the packed struct, where you have to add up the size of all of the individual fields to determine the size of the padding, which is exactly what you argued against earlier.

Personally, I don't really care about whether we allow packed struct(_) or not, but I'm absolutely against having hidden resizing of struct fields.

Edit: packed struct would probably make more sense than packed struct(_), matching precedent with enum and enum(u8).

gpanders commented 2 years ago

The size of an extern struct, an extern / packed union, isn't obvious either, along with plenty other types.

Right, but the difference is in those cases you are not explicitly specifying an integer width like you are in packed struct(u8). If a packed struct (or union) is backed by an integer then knowing the size of that integer is valuable information. If it's not backed by an integer then the non-obviousness is less of an issue (imo).

I'm absolutely against having hidden resizing of struct fields.

Sure and I agree with this, but I don't think _ is a struct field: it's not accessible and it has no meaning other than enforcing the size of the struct itself.

The only other point I want to make (after which I'll leave it be) is that this is not just a matter of convenience (though it is convenient), it also resolves a source of potential ambiguity:

const Foo = packed struct(u8) {
  foo: u3,
  bar: u2,
  _: u4,
};

This fails to compile because the field widths add up to u9 but the integer specification states u8. Which is correct? You would probably say u8 is correct and the size of the padding field is wrong (and I would agree), but you can't say that with certainty precisely because there are two independent ways to specify the size of the struct and neither one is "obviously" the right one. Making the size of the final padding field inferred resolves this by making it clear that the integer specification in struct(u8) is the source of truth.

Your suggestion to use packed struct(_) similarly resolves this ambiguity except that it makes the "source of truth" the size of the padding field itself.

ominitay commented 2 years ago

This fails to compile because the field widths add up to u9 but the integer specification states u8.

This is a good thing, and helps prevent typos or other layout mistakes from compiling, which is of significant importance to osdev.

Your suggestion to use packed struct(_) similarly resolves this ambiguity except that it makes the "source of truth" the size of the padding field itself.

As it always has been.

Our conversation really has settled me on the side that neither of these proposals are of value, and that packed struct(u32) is enough, no implicit field sizing, no implicit backing integer. We should prevent ambiguity, and protect programmers from simple mistakes like typos. Both of these proposals harm this.

ominitay commented 2 years ago

Nitpick: In discussions outside of the issue tracker, the consensus seems to be that packed(type) struct is more sensible syntax than packed struct(type). This was already mentioned here, but it doesn't seem to have been picked up on.

andrewrk commented 1 year ago

This is implemented in the self-hosted compiler, which is now the default compiler.