ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.45k stars 2.52k forks source link

flaky io_uring test: write_fixed/read_fixed #10462

Closed andrewrk closed 2 months ago

andrewrk commented 2 years ago

Zig Version

0.10.0-dev.97+2f53406ad

Steps to Reproduce

/workspace/_debug/staging/bin/zig test /workspace/lib/std/std.zig --test-name-prefix std-native-ReleaseSafe-bare-multi  -OReleaseSafe --cache-dir /workspace/zig-cache --global-cache-dir /root/.cache/zig --name test -fno-single-threaded -I /workspace/test --zig-lib-dir /workspace/lib 

Expected Behavior

All std lib tests passed.

Actual Behavior

273/2095 os.linux.io_uring.test "std-native-ReleaseSafe-bare-multi write_fixed/read_fixed"... thread 52953 panic: reached unreachable code
vrischmann commented 2 years ago

What kernel version did this run on ? I briefly tested on a gentoo vm with kernels 5.4/5.10/5.11/5.12/5.13/5.14/5.15 and I can't reproduce it.

andrewrk commented 2 years ago

Linux zinc-amd64 5.14.14-arch1-1 #1 SMP PREEMPT Wed, 20 Oct 2021 21:35:18 +0000 x86_64 GNU/Linux

it was this CI run

vrischmann commented 2 years ago

so without the stacktrace I can't be sure it's the same crash but I managed to find a panic if I run the code in a loop, here's the reproducer:

const std = @import("std");
const os = std.os;
const testing = std.testing;

pub fn main() anyerror!void {
    var ring = try os.linux.IO_Uring.init(2, 0);
    defer ring.deinit();

    var raw_buffers: [32][1024]u8 = undefined;
    // First buffer will be written to the file.
    std.mem.set(u8, &raw_buffers[0], 'z');
    std.mem.copy(u8, &raw_buffers[0], "foobar");

    var buffers = [2]os.iovec{
        .{ .iov_base = &raw_buffers[0], .iov_len = raw_buffers[0].len },
        .{ .iov_base = &raw_buffers[1], .iov_len = raw_buffers[1].len },
    };
    try ring.register_buffers(&buffers);

    while (true) {
        const path = "test_io_uring_write_read_fixed";
        const file = try std.fs.cwd().createFile(path, .{ .read = true, .truncate = true });
        defer file.close();
        defer std.fs.cwd().deleteFile(path) catch {};

        const fd = file.handle;

        const sqe_write = try ring.write_fixed(0x45454545, fd, &buffers[0], 3, 0);
        sqe_write.flags |= os.linux.IOSQE_IO_LINK;

        _ = try ring.read_fixed(0x12121212, fd, &buffers[1], 0, 1);
        try testing.expectEqual(@as(u32, 2), try ring.submit());

        _ = try ring.copy_cqe();
        _ = try ring.copy_cqe();
    }
}

running that on my machine it almost always crashes within a couple of seconds:

thread 971025 panic: reached unreachable code
/home/vincent/dev/devtools/zig/lib/std/debug.zig:224:14: 0x2069eb in std.debug.assert (zig-debug-io_uring-readwrite-fixed)
    if (!ok) unreachable; // assertion failure
             ^
/home/vincent/dev/devtools/zig/lib/std/os/linux/io_uring.zig:304:15: 0x2379ee in std.os.linux.io_uring.IO_Uring.copy_cqe (zig-debug-io_uring-readwrite-fixed)
        assert(count == 1);
              ^
/home/vincent/tmp/zig-debug-io_uring-readwrite-fixed/src/main.zig:44:43: 0x22f4ab in main (zig-debug-io_uring-readwrite-fixed)
        const cqe_read = try ring.copy_cqe();
                                          ^
/home/vincent/dev/devtools/zig/lib/std/start.zig:553:37: 0x2277aa in std.start.callMain (zig-debug-io_uring-readwrite-fixed)
            const result = root.main() catch |err| {
                                    ^
/home/vincent/dev/devtools/zig/lib/std/start.zig:495:12: 0x208b7e in std.start.callMainWithArgs (zig-debug-io_uring-readwrite-fixed)
    return @call(.{ .modifier = .always_inline }, callMain, .{});
           ^
/home/vincent/dev/devtools/zig/lib/std/start.zig:409:17: 0x207c16 in std.start.posixCallMainAndExit (zig-debug-io_uring-readwrite-fixed)
    std.os.exit(@call(.{ .modifier = .always_inline }, callMainWithArgs, .{ argc, argv, envp }));
                ^
/home/vincent/dev/devtools/zig/lib/std/start.zig:322:5: 0x207a22 in std.start._start (zig-debug-io_uring-readwrite-fixed)
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
    ^

count is 0 when it crashes and I don't understand why yet.

vrischmann commented 2 years ago

So I think copy_cqe is wrong: it assumes that this call to copy_cqes will always return 1 but that's not the case, apparently even after calling io_uring_enter there may not be any CQE ready and copy_cqes_ready will return 0. @jorangreef please correct me if any of this is wrong.

Looking at liburing it looks like they loop until they get a CQE or an error. Something like this fixes the crash above as far as I can tell:

diff --git a/lib/std/os/linux/io_uring.zig b/lib/std/os/linux/io_uring.zig
index 9dd5aae3e..fc32508cf 100644
--- a/lib/std/os/linux/io_uring.zig
+++ b/lib/std/os/linux/io_uring.zig
@@ -300,9 +300,10 @@ pub const IO_Uring = struct {
     /// A convenience method for `copy_cqes()` for when you don't need to batch or peek.
     pub fn copy_cqe(ring: *IO_Uring) !io_uring_cqe {
         var cqes: [1]io_uring_cqe = undefined;
-        const count = try ring.copy_cqes(&cqes, 1);
-        assert(count == 1);
-        return cqes[0];
+        while (true) {
+            const count = try ring.copy_cqes(&cqes, 1);
+            if (count > 0) return cqes[0];
+        }
     }

     /// Matches the implementation of cq_ring_needs_flush() in liburing.

But I'm not confident this is the right way to fix this.

andrewrk commented 2 months ago

Oops

ianic commented 2 months ago

I think that this should be closed.

Since @vrischmann's fix in copy_cqe test is running fine, not flaky, and not disabled all this time.

As @vrischmann explained, according to the man io_uring_enter syscall when called with min_complete should return when that number of cqe's are ready. Here we found that there are rare cases when that call returns before min_complete cqe's are ready (found a similar discussion in liburing). On the newer kernel (6.10) I could reproduce that when signal arrives during io_uring_enter waiting for more cqe's. In that case, it returns with fewer than min_complete without setting errno.

I think we have a fine fix and there is nothing more to do about this.