Closed andrewrk closed 2 months ago
What kernel version did this run on ? I briefly tested on a gentoo vm with kernels 5.4/5.10/5.11/5.12/5.13/5.14/5.15 and I can't reproduce it.
Linux zinc-amd64 5.14.14-arch1-1 #1 SMP PREEMPT Wed, 20 Oct 2021 21:35:18 +0000 x86_64 GNU/Linux
it was this CI run
so without the stacktrace I can't be sure it's the same crash but I managed to find a panic if I run the code in a loop, here's the reproducer:
const std = @import("std");
const os = std.os;
const testing = std.testing;
pub fn main() anyerror!void {
var ring = try os.linux.IO_Uring.init(2, 0);
defer ring.deinit();
var raw_buffers: [32][1024]u8 = undefined;
// First buffer will be written to the file.
std.mem.set(u8, &raw_buffers[0], 'z');
std.mem.copy(u8, &raw_buffers[0], "foobar");
var buffers = [2]os.iovec{
.{ .iov_base = &raw_buffers[0], .iov_len = raw_buffers[0].len },
.{ .iov_base = &raw_buffers[1], .iov_len = raw_buffers[1].len },
};
try ring.register_buffers(&buffers);
while (true) {
const path = "test_io_uring_write_read_fixed";
const file = try std.fs.cwd().createFile(path, .{ .read = true, .truncate = true });
defer file.close();
defer std.fs.cwd().deleteFile(path) catch {};
const fd = file.handle;
const sqe_write = try ring.write_fixed(0x45454545, fd, &buffers[0], 3, 0);
sqe_write.flags |= os.linux.IOSQE_IO_LINK;
_ = try ring.read_fixed(0x12121212, fd, &buffers[1], 0, 1);
try testing.expectEqual(@as(u32, 2), try ring.submit());
_ = try ring.copy_cqe();
_ = try ring.copy_cqe();
}
}
running that on my machine it almost always crashes within a couple of seconds:
thread 971025 panic: reached unreachable code
/home/vincent/dev/devtools/zig/lib/std/debug.zig:224:14: 0x2069eb in std.debug.assert (zig-debug-io_uring-readwrite-fixed)
if (!ok) unreachable; // assertion failure
^
/home/vincent/dev/devtools/zig/lib/std/os/linux/io_uring.zig:304:15: 0x2379ee in std.os.linux.io_uring.IO_Uring.copy_cqe (zig-debug-io_uring-readwrite-fixed)
assert(count == 1);
^
/home/vincent/tmp/zig-debug-io_uring-readwrite-fixed/src/main.zig:44:43: 0x22f4ab in main (zig-debug-io_uring-readwrite-fixed)
const cqe_read = try ring.copy_cqe();
^
/home/vincent/dev/devtools/zig/lib/std/start.zig:553:37: 0x2277aa in std.start.callMain (zig-debug-io_uring-readwrite-fixed)
const result = root.main() catch |err| {
^
/home/vincent/dev/devtools/zig/lib/std/start.zig:495:12: 0x208b7e in std.start.callMainWithArgs (zig-debug-io_uring-readwrite-fixed)
return @call(.{ .modifier = .always_inline }, callMain, .{});
^
/home/vincent/dev/devtools/zig/lib/std/start.zig:409:17: 0x207c16 in std.start.posixCallMainAndExit (zig-debug-io_uring-readwrite-fixed)
std.os.exit(@call(.{ .modifier = .always_inline }, callMainWithArgs, .{ argc, argv, envp }));
^
/home/vincent/dev/devtools/zig/lib/std/start.zig:322:5: 0x207a22 in std.start._start (zig-debug-io_uring-readwrite-fixed)
@call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
^
count
is 0 when it crashes and I don't understand why yet.
So I think copy_cqe
is wrong: it assumes that this call to copy_cqes will always return 1 but that's not the case, apparently even after calling io_uring_enter there may not be any CQE ready and copy_cqes_ready
will return 0.
@jorangreef please correct me if any of this is wrong.
Looking at liburing it looks like they loop until they get a CQE or an error. Something like this fixes the crash above as far as I can tell:
diff --git a/lib/std/os/linux/io_uring.zig b/lib/std/os/linux/io_uring.zig
index 9dd5aae3e..fc32508cf 100644
--- a/lib/std/os/linux/io_uring.zig
+++ b/lib/std/os/linux/io_uring.zig
@@ -300,9 +300,10 @@ pub const IO_Uring = struct {
/// A convenience method for `copy_cqes()` for when you don't need to batch or peek.
pub fn copy_cqe(ring: *IO_Uring) !io_uring_cqe {
var cqes: [1]io_uring_cqe = undefined;
- const count = try ring.copy_cqes(&cqes, 1);
- assert(count == 1);
- return cqes[0];
+ while (true) {
+ const count = try ring.copy_cqes(&cqes, 1);
+ if (count > 0) return cqes[0];
+ }
}
/// Matches the implementation of cq_ring_needs_flush() in liburing.
But I'm not confident this is the right way to fix this.
Oops
I think that this should be closed.
Since @vrischmann's fix in copy_cqe
test is running fine, not flaky, and not disabled all this time.
As @vrischmann explained, according to the man io_uring_enter
syscall when called with min_complete
should return when that number of cqe's are ready. Here we found that there are rare cases when that call returns before min_complete cqe's are ready (found a similar discussion in liburing).
On the newer kernel (6.10) I could reproduce that when signal arrives during io_uring_enter
waiting for more cqe's. In that case, it returns with fewer than min_complete without setting errno.
I think we have a fine fix and there is nothing more to do about this.
Zig Version
0.10.0-dev.97+2f53406ad
Steps to Reproduce
Expected Behavior
All std lib tests passed.
Actual Behavior