Open jorangreef opened 4 years ago
Imho the best solution is this: Don't use translate-c
but hand-translate those unions. Then nothing will break when the source material changes in a backwards compatible way.
translate-c
is nice, but it's way better to wrap C APIs in pure Zig, replacing all [*c]
with the corresponding correct pointer type and such
Thanks @MasterQ32 , we are not using translate-c
here at all. I am actually rewriting io_uring
helpers for Zig from scratch, and simply referencing liburing's interface design decisions so that these helpers make sense for anyone who is already comfortable with liburing's documentation. This is comparing Zig's std lib definition of io_uring_sqe
with that of the kernel.
The problem is that Zig's unions force you to access union fields by going through the name of the union.
In other words, whenever the kernel adds new io_uring
features by converting existing u32 or u64 fields into unions, then adding the same unions to Zig's std lib definition breaks all existing Zig code using the linux.io_uring_sqe
definition in the Zig std lib and accessing those u32 or u64 fields directly.
Does that make sense?
This should be covered by #985 and #1214.
Thanks @vexu!
@jorangreef i've made a function that may solve your use-case:
const io_uring_sqe = extern struct {
opcode: u8,
flags: u8,
ioprio: u16,
fd: i32,
union1_unnamed: extern union {
off: u64,
addr2: u64,
},
// etc
};
export fn _off(x: *const io_uring_sqe) u64 {
return x.union1_unnamed.off;
}
export fn _off_2(x: *const io_uring_sqe) u64 {
return flatField(x, "off").*;
}
export fn _fd(x: *io_uring_sqe, y: i32) void {
x.fd = y;
}
export fn _fd_2(x: *io_uring_sqe, y: i32) void {
flatField(x, "fd").* = y;
}
Source:
const std = @import("std");
const meta = std.meta;
fn flatField(lhs: anytype, comptime name: []const u8) FlatFieldReturnType(@TypeOf(lhs), name) {
// will search only one level deep
const info = flatFieldSetup(@TypeOf(lhs), name);
if (info.access_through) |access_through| {
return &@field(@field(lhs, access_through), name);
} else {
return &@field(lhs, name);
}
}
fn FlatFieldReturnType(comptime LHS: type, comptime name: []const u8) type {
comptime {
const info = flatFieldSetup(LHS, name);
const T = info.return_type;
if (!info.is_ptr or (info.is_const_ptr orelse false)) {
// lhs is by-const-ptr or by-value
return *const T;
} else {
// lhs is by-mut-ptr
return *T;
}
}
}
fn flatFieldSetup(comptime LHS: type, comptime name: []const u8) struct {
is_ptr: bool,
is_const_ptr: ?bool,
access_through: ?[]const u8,
return_type: type,
} {
comptime {
const is_ptr = meta.trait.is(.Pointer)(LHS);
if (is_ptr and !meta.trait.isSingleItemPtr(LHS)) {
@compileError("Excepted single item pointer to struct or union type, found '" ++ @typeName(LHS) ++ "'");
}
//const is_const_ptr = meta.trait.isConstPtr(LHS);
const T = if (is_ptr)
@typeInfo(LHS).Pointer.child
else
LHS;
const is_struct = meta.trait.is(.Struct)(T);
const is_union = meta.trait.is(.Union)(T);
if (!is_struct and !is_union) {
@compileError("Expected struct or union type, found '" ++ @typeName(T) ++ "'");
}
if (@hasField(T, name)) {
// found in the struct/union itself
return .{
.is_ptr = is_ptr,
.is_const_ptr = if (is_ptr) meta.trait.isConstPtr(LHS) else null,
.access_through = null,
.return_type = meta.fieldInfo(T, name).field_type,
};
}
const fields = switch (@typeInfo(T)) {
.Struct => |info| info.fields,
.Union => |info| info.fields,
else => unreachable,
};
for (fields) |field| {
// only check fields marked as 'unnamed'
//if (!std.mem.endsWith(u8, field.name, "_unnamed")) continue;
const F = field.field_type;
const check_field = meta.trait.is(.Struct)(F) or meta.trait.is(.Union)(F);
if (check_field and @hasField(F, name)) {
// found in an embedded struct/union
return .{
.is_ptr = is_ptr,
.is_const_ptr = if (is_ptr) meta.trait.isConstPtr(LHS) else null,
.access_through = field.name,
.return_type = meta.fieldInfo(F, name).field_type,
};
// TODO check for ambiguity
}
}
@compileError("No field named '" ++ name ++ "' in '" ++ @typeName(T) ++ "'");
}
}
Thanks @Rocknest! That definitely works, but requires function getters/setters. If we use direct field access named after the first member of a union then that would also be future-proof if/when support for anonymous unions lands in Zig, i.e. everything will just work without any code needing to be upgraded.
@jorangreef i would not expect support for this in zig because this introduces a bit of indirection "hidden memory-control flow" (you have to read carefully if some field is compatible with another etc., giving more possibilities for an unsafe behaviour), it merges namespaces (its essentially usingnamespace on fields) and that feature was considered for removal at one point.
So its better to have getters (in status quo zig you can even have this syntax ring.field(.off).* = x;
), though one nice thing would be an ability to create getters that act like @field
builtin - which can be an L-value
@Rocknest are you referring to the proposal in #985?
Not necessarily that proposal, i mean an ability to have fields that depend on some state (or may point to the same memory, not counting sub-byte fields).
The memory stays u32s or u64s since the kernel already keeps the unions consistent, i.e. almost all flags are u32s, just with a different name, and if the kernel adds something exotic (e.g. poll_flags) we can always force layouts and add tests until #985 lands.
I think this approach is the right stop gap until then, whereas getters/setters would have to be reworked later, breaking user code.
I know that its technically safe. However memory aliasing is not logically safe and im not a fan of that. How annon struct/union would provide runtime safety? I see a clear way to implement it in a getter
if (runtime_safety) {
if (!isActiveField(opcode, name)) @panic("field is not active!");
}
Thanks for providing this real world use case! It was instrumental in making the decision on #985. Since that feature should solve this case, I'm going to close this issue. Feel free to reopen if you think it doesn't fully solve the problem.
Thanks @SpexGuy that's great to hear.
This needs to be reopened now correct?
When it comes to real world use cases - another part Linux API in wide-spread use where unnamed fields show up is the BPF API.
A key data type there, without which you can't really do anything, is bpf_attr
.
As opposed to the io_uring
example from @jorangreef, in this scenario we have a union with unnamed structs:
union bpf_attr {
struct { /* anonymous struct used by BPF_MAP_CREATE command */
__u32 map_type; /* one of enum bpf_map_type */
/* … */
};
struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
__u32 map_fd;
/* … */
};
/* … */
} __attribute__((aligned(8)));
On a side note - I realize that we have already excellent bindings for BPF in the std lib. However, with Linux API constantly evolving, we are always playing catch-up here. Having @cInclude
as a fallback for those times when there is this "one new field" not yet in the bindings is what makes Zig stand out, IMHO.
Currently, in my mind, the translation from C is lossy. We lose the information which structs/unions were unnamed in the C type. As a result, I cannot access fields from unnamed structs/unions without looking at the translated cimport.zig
.
That's a bummer. I would really like to help make this UX better in any way needed.
Is there a consensus on what is the path forward?
Is it the alias
approach proposed in #7698?
Or do we need to think of something completely different?
Zig interop with C is already the best in class! It makes easy things easy. Now we just need to make the hard things possible as well.
A candidate to add the issue list for the "Linux kernel module" project? https://github.com/ziglang/zig/projects/5#card-31077192
While working on a port of liburing to Zig, I found some rough edges with Zig's unions. I believe these are real world examples of C interop where Zig's unions could be improved.
Consider the
io_uring_sqe
struct fromio_uring
:C allows you to access a union field without knowing about the union:
In fact, the original definition of
io_uring_sqe
hadoff
as a u64, without any union. Any C code that assigned tooff
directly just worked when the kernel later introduced theoff
union. However, I think Zig forces you to go through the union:I don't know if there's a design decision for this, but it has some unpleasant consequences:
Whenever the kernel converts
io_uring_sqe
fields into unions in future, this breaks all Zig code, which will need to be updated to go through the union. C doesn't have this problem since C's unions make it possible for code like this to be future-proof.Nested unions like this become more and more unwieldy to work with compared to C.
We have to think more about naming unions, whereas these unions are anonymous in C.
To illustrate these warts, consider Zig's definition of
io_uring_sqe
:Comparing the structs closely, you can see we already have a problem, because the kernel has since introduced new unions, which Zig is yet to match, i.e.
addr
(to addsplice_off_in
) andunion3.struct1.buf_index
(to addbuf_group
). I believe any and all Zig code using the existingaddr
andbuf_index
fields would need to be updated to go through the respective new unions, whenever Zig's std lib struct definition ofio_uring_sqe
is updated.Furthermore, since the kernel interposed the new
addr
union between ourunion1
and ourunion2
unions, we now need to rename and renumber our unions (breaking more code) or come up with a better naming scheme.For now, if Zig unions stay the same, then I think something like the
io_uring_sqe
struct would be better served by not using unions in the Zig version, but rather using the original u32s or u64s, since these are always future-proof.But hopefully someone can think of how we can improve C union interop in Zig without introducing other issues?