jorangreef commented 4 years ago

While working on a port of liburing to Zig, I found some rough edges with Zig's unions. I believe these are real world examples of C interop where Zig's unions could be improved.

Consider the io_uring_sqe struct from io_uring:

struct io_uring_sqe {
  __u8  opcode;   /* type of operation for this sqe */
  __u8  flags;    /* IOSQE_ flags */
  __u16 ioprio;   /* ioprio for the request */
  __s32 fd;   /* file descriptor to do IO on */
  union {
    __u64 off;  /* offset into file */
    __u64 addr2;
  };
  union {
    __u64 addr; /* pointer to buffer or iovecs */
    __u64 splice_off_in;
  };
  __u32 len;    /* buffer size or number of iovecs */
  union {
    __kernel_rwf_t  rw_flags;
    __u32   fsync_flags;
    __u16   poll_events;  /* compatibility */
    __u32   poll32_events;  /* word-reversed for BE */
    __u32   sync_range_flags;
    __u32   msg_flags;
    __u32   timeout_flags;
    __u32   accept_flags;
    __u32   cancel_flags;
    __u32   open_flags;
    __u32   statx_flags;
    __u32   fadvise_advice;
    __u32   splice_flags;
  };
  __u64 user_data;  /* data to be passed back at completion time */
  union {
    struct {
      /* pack this to avoid bogus arm OABI complaints */
      union {
        /* index into fixed buffers, if used */
        __u16 buf_index;
        /* for grouped buffer selection */
        __u16 buf_group;
      } __attribute__((packed));
      /* personality to use, if used */
      __u16 personality;
      __s32 splice_fd_in;
    };
    __u64 __pad2[3];
  };
};

C allows you to access a union field without knowing about the union:

sqe->off = 0;

In fact, the original definition of io_uring_sqe had off as a u64, without any union. Any C code that assigned to off directly just worked when the kernel later introduced the off union. However, I think Zig forces you to go through the union:

sqe.*.union1.off = 0;

I don't know if there's a design decision for this, but it has some unpleasant consequences:

Whenever the kernel converts io_uring_sqe fields into unions in future, this breaks all Zig code, which will need to be updated to go through the union. C doesn't have this problem since C's unions make it possible for code like this to be future-proof.
Nested unions like this become more and more unwieldy to work with compared to C.
We have to think more about naming unions, whereas these unions are anonymous in C.

To illustrate these warts, consider Zig's definition of io_uring_sqe:

pub const io_uring_sqe = extern struct {
    pub const union1 = extern union {
        off: u64,
        addr2: u64,
    };

    pub const union2 = extern union {
        rw_flags: kernel_rwf,
        fsync_flags: u32,
        poll_events: u16,
        sync_range_flags: u32,
        msg_flags: u32,
        timeout_flags: u32,
        accept_flags: u32,
        cancel_flags: u32,
        open_flags: u32,
        statx_flags: u32,
        fadvise_flags: u32,
    };

    pub const union3 = extern union {
        struct1: extern struct {
            /// index into fixed buffers, if used
            buf_index: u16,

            /// personality to use, if used
            personality: u16,
        },
        __pad2: [3]u64,
    };
    opcode: IORING_OP,
    flags: u8,
    ioprio: u16,
    fd: i32,

    union1: union1,
    addr: u64,
    len: u32,

    union2: union2,
    user_data: u64,

    union3: union3,
};

Comparing the structs closely, you can see we already have a problem, because the kernel has since introduced new unions, which Zig is yet to match, i.e. addr (to add splice_off_in) and union3.struct1.buf_index (to add buf_group). I believe any and all Zig code using the existing addr and buf_index fields would need to be updated to go through the respective new unions, whenever Zig's std lib struct definition of io_uring_sqe is updated.

Furthermore, since the kernel interposed the new addr union between our union1 and our union2 unions, we now need to rename and renumber our unions (breaking more code) or come up with a better naming scheme.

For now, if Zig unions stay the same, then I think something like the io_uring_sqe struct would be better served by not using unions in the Zig version, but rather using the original u32s or u64s, since these are always future-proof.

But hopefully someone can think of how we can improve C union interop in Zig without introducing other issues?

ikskuh commented 4 years ago

Imho the best solution is this: Don't use translate-c but hand-translate those unions. Then nothing will break when the source material changes in a backwards compatible way. translate-c is nice, but it's way better to wrap C APIs in pure Zig, replacing all [*c] with the corresponding correct pointer type and such

jorangreef commented 4 years ago

Thanks @MasterQ32 , we are not using translate-c here at all. I am actually rewriting io_uring helpers for Zig from scratch, and simply referencing liburing's interface design decisions so that these helpers make sense for anyone who is already comfortable with liburing's documentation. This is comparing Zig's std lib definition of io_uring_sqe with that of the kernel.

jorangreef commented 4 years ago

The problem is that Zig's unions force you to access union fields by going through the name of the union.

In other words, whenever the kernel adds new io_uring features by converting existing u32 or u64 fields into unions, then adding the same unions to Zig's std lib definition breaks all existing Zig code using the linux.io_uring_sqe definition in the Zig std lib and accessing those u32 or u64 fields directly.

jorangreef commented 4 years ago

Does that make sense?

Vexu commented 4 years ago

This should be covered by #985 and #1214.

jorangreef commented 4 years ago

Thanks @vexu!

Rocknest commented 4 years ago

@jorangreef i've made a function that may solve your use-case:

const io_uring_sqe = extern struct {
    opcode: u8,
    flags: u8,
    ioprio: u16,
    fd: i32,
    union1_unnamed: extern union {
        off: u64,
        addr2: u64,
    },
    // etc
};

export fn _off(x: *const io_uring_sqe) u64 {
    return x.union1_unnamed.off;
}

export fn _off_2(x: *const io_uring_sqe) u64 {
    return flatField(x, "off").*;
}

export fn _fd(x: *io_uring_sqe, y: i32) void {
    x.fd = y;
}

export fn _fd_2(x: *io_uring_sqe, y: i32) void {
    flatField(x, "fd").* = y;
}

Source:

const std = @import("std");
const meta = std.meta;

fn flatField(lhs: anytype, comptime name: []const u8) FlatFieldReturnType(@TypeOf(lhs), name) {
    // will search only one level deep
    const info = flatFieldSetup(@TypeOf(lhs), name);
    if (info.access_through) |access_through| {
        return &@field(@field(lhs, access_through), name);
    } else {
        return &@field(lhs, name);
    }
}

fn FlatFieldReturnType(comptime LHS: type, comptime name: []const u8) type {
    comptime {
        const info = flatFieldSetup(LHS, name);
        const T = info.return_type;
        if (!info.is_ptr or (info.is_const_ptr orelse false)) {
            // lhs is by-const-ptr or by-value
            return *const T;
        } else {
            // lhs is by-mut-ptr
            return *T;
        }
    }
}

fn flatFieldSetup(comptime LHS: type, comptime name: []const u8) struct {
    is_ptr: bool,
    is_const_ptr: ?bool,
    access_through: ?[]const u8,
    return_type: type,
} {
    comptime {
        const is_ptr = meta.trait.is(.Pointer)(LHS);
        if (is_ptr and !meta.trait.isSingleItemPtr(LHS)) {
            @compileError("Excepted single item pointer to struct or union type, found '" ++ @typeName(LHS) ++ "'");
        }

        //const is_const_ptr = meta.trait.isConstPtr(LHS);
        const T = if (is_ptr)
            @typeInfo(LHS).Pointer.child
        else
            LHS;

        const is_struct = meta.trait.is(.Struct)(T);
        const is_union = meta.trait.is(.Union)(T);
        if (!is_struct and !is_union) {
            @compileError("Expected struct or union type, found '" ++ @typeName(T) ++ "'");
        }

        if (@hasField(T, name)) {
            // found in the struct/union itself
            return .{
                .is_ptr = is_ptr,
                .is_const_ptr = if (is_ptr) meta.trait.isConstPtr(LHS) else null,
                .access_through = null,
                .return_type = meta.fieldInfo(T, name).field_type,
            };
        }

        const fields = switch (@typeInfo(T)) {
            .Struct => |info| info.fields,
            .Union => |info| info.fields,
            else => unreachable,
        };
        for (fields) |field| {
            // only check fields marked as 'unnamed'
            //if (!std.mem.endsWith(u8, field.name, "_unnamed")) continue;
            const F = field.field_type;
            const check_field = meta.trait.is(.Struct)(F) or meta.trait.is(.Union)(F);
            if (check_field and @hasField(F, name)) {
                // found in an embedded struct/union
                return .{
                    .is_ptr = is_ptr,
                    .is_const_ptr = if (is_ptr) meta.trait.isConstPtr(LHS) else null,
                    .access_through = field.name,
                    .return_type = meta.fieldInfo(F, name).field_type,
                };
                // TODO check for ambiguity 
            }
        }

        @compileError("No field named '" ++ name ++ "' in '" ++ @typeName(T) ++ "'");
    }
}

jorangreef commented 4 years ago

Thanks @Rocknest! That definitely works, but requires function getters/setters. If we use direct field access named after the first member of a union then that would also be future-proof if/when support for anonymous unions lands in Zig, i.e. everything will just work without any code needing to be upgraded.

Rocknest commented 4 years ago

@jorangreef i would not expect support for this in zig because this introduces a bit of indirection "hidden memory-control flow" (you have to read carefully if some field is compatible with another etc., giving more possibilities for an unsafe behaviour), it merges namespaces (its essentially usingnamespace on fields) and that feature was considered for removal at one point.

So its better to have getters (in status quo zig you can even have this syntax ring.field(.off).* = x;), though one nice thing would be an ability to create getters that act like @field builtin - which can be an L-value

jorangreef commented 4 years ago

@Rocknest are you referring to the proposal in #985?

Rocknest commented 4 years ago

Not necessarily that proposal, i mean an ability to have fields that depend on some state (or may point to the same memory, not counting sub-byte fields).

jorangreef commented 4 years ago

The memory stays u32s or u64s since the kernel already keeps the unions consistent, i.e. almost all flags are u32s, just with a different name, and if the kernel adds something exotic (e.g. poll_flags) we can always force layouts and add tests until #985 lands.

I think this approach is the right stop gap until then, whereas getters/setters would have to be reworked later, breaking user code.

Rocknest commented 4 years ago

I know that its technically safe. However memory aliasing is not logically safe and im not a fan of that. How annon struct/union would provide runtime safety? I see a clear way to implement it in a getter

if (runtime_safety) {
  if (!isActiveField(opcode, name)) @panic("field is not active!");
}

SpexGuy commented 3 years ago

985 has been partially accepted, which should solve this use case. We did consider the alternative suggested in comments on this issue of using getter functions, but decided not to do that. Getter functions have similar drawbacks to operator overloading. They take a simple operation that has no side effects, and turn it into something that may be arbitrarily complex. So we've accepted the ability to change between struct and union layouts within the field list of an extern union or struct.

SpexGuy commented 3 years ago

Thanks for providing this real world use case! It was instrumental in making the decision on #985. Since that feature should solve this case, I'm going to close this issue. Feel free to reopen if you think it doesn't fully solve the problem.

jorangreef commented 3 years ago

Thanks @SpexGuy that's great to hear.

marler8997 commented 3 years ago

This needs to be reopened now correct?

jsitnicki commented 1 year ago

When it comes to real world use cases - another part Linux API in wide-spread use where unnamed fields show up is the BPF API.

A key data type there, without which you can't really do anything, is bpf_attr. As opposed to the io_uring example from @jorangreef, in this scenario we have a union with unnamed structs:

union bpf_attr {
    struct { /* anonymous struct used by BPF_MAP_CREATE command */
        __u32   map_type;   /* one of enum bpf_map_type */
        /* … */
    };

    struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
        __u32       map_fd;
        /* … */
    };

    /* … */
} __attribute__((aligned(8)));

On a side note - I realize that we have already excellent bindings for BPF in the std lib. However, with Linux API constantly evolving, we are always playing catch-up here. Having @cInclude as a fallback for those times when there is this "one new field" not yet in the bindings is what makes Zig stand out, IMHO.

Currently, in my mind, the translation from C is lossy. We lose the information which structs/unions were unnamed in the C type. As a result, I cannot access fields from unnamed structs/unions without looking at the translated cimport.zig.

That's a bummer. I would really like to help make this UX better in any way needed.

Is there a consensus on what is the path forward? Is it the alias approach proposed in #7698? Or do we need to think of something completely different?

Zig interop with C is already the best in class! It makes easy things easy. Now we just need to make the hard things possible as well.

jsitnicki commented 2 months ago

A candidate to add the issue list for the "Linux kernel module" project? https://github.com/ziglang/zig/projects/5#card-31077192

ziglang / zig

improve unions for c interop #6349