ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.12k stars 2.49k forks source link

Improve `packed` semantics #19660

Open ikskuh opened 5 months ago

ikskuh commented 5 months ago

This is a plan for hammer out the semantics for bit packed types in the Zig language.

Let's start with a tiny syntactical change:

(1) Rename packed to bitpacked

This change will reflect the true meaning of both packed struct and packed union more than packed and reduces confusion about the type of packing that is performed (as other languages use packed for byte packing).

This should be trivial to implement and update via zig fmt.

(2) Improve bitpacked union semantics

Right now, the semantics of packed unions are kinda … underdefined:

A packed union has well-defined in-memory layout and is eligible to be in a packed struct.

Let's change that definition to:

A bitpacked union has a well-defined in-memory layout and, similar to a bitpacked struct, is backed by a single integer of the size of the largest union member. bitpacked union(T) can be used to explicitly specify the size of the union.

All union members are aligned such that the LSB of all types align. This means that

const T = packed union(u32) {
    a: u32,
    b: u8,
};
var t: T = .{ .a = 1 };
std.debug.assert(t.b == 1);

holds true for both big and little endian systems.

This also means that not all union members share the same address, which is true for extern union.

(3) No changes to bitpacked struct

Except for the rename of packed to bitpacked, no changes are done here.

(4) Introduce a new type opaquebits(N)

This new type opaquebits(N) is an opaque bit bag that has no semantics defined except for "it takes N bits of space".

This type can be used for storing fully opaque data and the only way to interpret its content is by using @bitCast from and to it.

(5) Introduce a new type class std.builtin.Type.BitPacked

(This is definitly a stretch goal, and might require further work)

This type can be the backing type for both bitpacked union and bitpacked struct, which is similar to a regular struct/union type in that it has typed fields, but lacks most other properties.

The BitPacked type is a mix of structure and union and inspired by C# StructLayout.Explicit and FieldOffset, which allow construction of both structures and unions of an arbitrary layout.

This means that it's also legal to allow overlapping fields. For a union type, all bit_offset values are 0, for a structure, all bit_offset fields are layed out such that all fields are back-to-back.

Each field in a BitPacked type has an explicit bit offset and backing type:

pub const BitPackedField = struct {
    name: [:0]const u8,
    type: type,
    bit_offset: u16,
    default_value: ?*const anyopaque,
};

pub const BitPacked = struct {
    backing_integer: type,
    fields: []const BitPackedField,
    decls: []const Declaration,
};

It is not legal to create a type with unnamed bits.

Each BitPacked type behaves in codegen and on ABI boundaries as if it would be BitPacked.backing_integer.

(6) Incorporate #19395

Use bitpacked(T) struct/bitpacked(T) union instead of bitpacked struct(T) bitpacked union as those are more uniform in what the (T) actually modifies.

Use Case

One example where i'd like to have seen some of these features would be the EDID Video Input Definition:

Video Input Definition Layout

const VideoInputDefinition = bitpacked(u8) struct
{
    config: bitpacked(u7) union {
        analog: bitpacked(u7) struct {
            serrations: Support,
            composite_sync_on_green: Support,
            composite_sync_on_horiz: Support,
            separate_sync: Support,
            setup: enum(u1) {
                blank_is_black = 0,
                pedestal = 1,
            },
            signal_level: enum(u2) {
                model0 = 0b000, //  0.700 : 0.300 : 1.000 V p-p
                model1 = 0b001, //  0.714 : 0.286 : 1.000 V p-p
                model2 = 0b010, //  1.000 : 0.400 : 1.400 V p-p
                model3 = 0b011, //  0.700 : 0.000 : 0.700 V p-p
            },
        },
        digital: bitpacked(u7) struct  {
            standard: enum(u4) {
                none = 0,
                dvi = 1,
                hdmi_a = 2,
                hdmi_b = 3,
                mddi = 4,
                display_port = 5,
            },
            color_depth: enum(u3) {
                undefined = 0b000, // Color Bit Depth is undefined
                color6 = 0b001, // 6 Bits per Primary Color
                color8 = 0b010, // 8 Bits per Primary Color
                color10 = 0b011, // 10 Bits per Primary Color
                color12 = 0b100, // 12 Bits per Primary Color
                color14 = 0b101, // 14 Bits per Primary Color
                color16 = 0b110, // 16 Bits per Primary Color
            },
        },
    },
    type: enum(u1) { analog=0, digital=1 };
};

For additional type safety, config could've been done as opaquebits(7) and a fn unpack() union(enum){…} function could be done that returns either/or contained structure definition.

Some other structures in EDID contain even more nested bit field definitions on a singular byte or integer, so a more improved support for bitpacked data can help for sure.

Summary

For me, (1) to (4) are things that are definitly useful and would improve language semantics. (5) is definitly more of a stretch goal.

Snektron commented 5 months ago

(4) Introduce a new type opaquebits(N)

Whats the rationale that favors this over enum(N) { _ }? The main difference I can see is that it does not allow == and !=.

rohlem commented 5 months ago

bitpacked union(T) can be used to explicitly specify the size of the union.

Here T has a completely different meaning to union(T), where it is the union's tag type. See also my proposal https://github.com/ziglang/zig/issues/19395 to resolve this by instead putting the backing type (T) after the packed/bitpacked keyword.


Each field in a BitPacked type has an explicit bit offset and backing type:

The code instead shows backing_integer being not in BitPackedField but in BitPacked - either the text or the code are incorrect (either it's per-field or per-aggregate for all fields).


@Snektron From what I understand it forces using @bitCast instead of @intFromEnum and @enumFromInt. Besides that it seems pretty much equivalent, only that the enum can further be extended by declarations. (It doesn't seem like opaquebits(N) is meant to have a {} scope after it, but that's not fully clear to me.)

Vexu commented 5 months ago

I like 1-3, I don't see a need for 4, and 5 seems like a complex feature that combines unions and structs that implements #985 but only through a builtin, not syntax.

ikskuh commented 5 months ago

Whats the rationale that favors this over enum(N) { _ }? The main difference I can see is that it does not allow == and !=.

I don't see a need for 4

For me, it's something that prevents accidential comparability and allows for types that must not have a defined value (thus, cannot be constructed from zig land). Use case here would be modelling microcontroller register values where some fields have the value "don't ever touch these bits, do not write anything else than what you've read"

(It doesn't seem like opaquebits(N) is meant to have a {} scope after it, but that's not fully clear to me.)

Yeah, it is similar to anyopaque, but it has to be sizeable. Maybe opaquebit5 or something would also work.

Each field in a BitPacked type has an explicit bit offset and backing type: The code instead shows backing_integer being not in BitPackedField but in BitPacked - either the text or the code are incorrect (either it's per-field or per-aggregate for all fields).

No, both are correct. The backing type of a BitPackedField is BitPackedField.type, the backing type of the full struct is BitPacked.backing_integer

bitpacked union(T) can be used to explicitly specify the size of the union. Here T has a completely different meaning to union(T), where it is the union's tag type. See also my proposal https://github.com/ziglang/zig/issues/19395 to resolve this by instead putting the backing type (T) after the packed/bitpacked keyword.

True. Will adapt the original post to respect that, it's reasonable.

5 seems like a complex feature that combines unions and structs that implements https://github.com/ziglang/zig/issues/985 but only through a builtin, not syntax.

I can see that, it's not something that is necessarily needed, but would help in some cases where fields might overlap. But after some considerations, this could also be fully solved by casting between different instances of types

Khitiara commented 5 months ago

5 might actually be helpful for some osdev interop things where a packed struct field represents the upper bits of an address, aligned with the start of the struct but with lower bits used for other things

ikskuh commented 5 months ago

5 might actually be helpful for some osdev interop things where a packed struct field represents the upper bits of an address, aligned with the start of the struct but with lower bits used for other things

That was just an example, opaquebit5 is similar to u5, so `opaquebit31 or opaquebit32 would be valid as well

Khitiara commented 5 months ago

That was just an example, opaquebit5 is similar to u5, so `opaquebit31oropaquebit32 would be valid as well

not sure what this actually means, i meant point 5 of having something like structlayout.explicit but at bit resolution