linux-test-project / ltp

Linux Test Project (mailing list: https://lists.linux.it/listinfo/ltp)
https://linux-test-project.readthedocs.io/
GNU General Public License v2.0

io_uring01: random fails with 5.10.104 kernel on slow ARM device #924

Open multiple1 opened 2 years ago

multiple1 commented 2 years ago

Hello!

Setup: kernel 5.10.104 preempt_rt on a slow ARM926EJ-S rev 5 (v5l) platform. io_uring01 fails most of the time, but sometimes it passes.

TFAIL: io_uring_enter() returned 0: EAGAIN/EWOULDBLOCK (11)
https://github.com/linux-test-project/ltp/blob/20210121/testcases/kernel/syscalls/io_uring/io_uring01.c#L221

I already tried to narrow down the issue by retrying the call whenever ret == 0:

    tail = next_tail;

    /* Kernel to notice the tail update */
    if (*sring->tail != tail)
        *sring->tail = tail;

    /* ###################################### */
    /* With a sleep here the test passes every time: */
    /* usleep(1000); */

    unsigned const max_tries = 10;
    unsigned try = 0;

    /* Retry io_uring_enter() up to max_tries times while it returns 0 */
    do {
        try++;
        ret = io_uring_enter(s->ring_fd, 1, 1, IORING_ENTER_GETEVENTS, &sig);
        tst_res(TINFO, "io_uring_enter() try %u", try);
        usleep(1000);
    } while ((try < max_tries) && (ret == 0));

    if (ret == 1) {
        tst_res(TPASS, "io_uring_enter() waited for 1 event");
    } else {
        tst_res(TFAIL | TERRNO, "io_uring_enter() returned %i", ret);
        SAFE_CLOSE(fd);
        return 1;
    }

    SAFE_CLOSE(fd);
    return 0;
}

With this addition the test still fails most of the time, but sometimes it passes, e.g. on the 3rd try. However, if I add a sleep right after the "Kernel to notice the tail update" block, the test always passes on the first try.

Is some synchronization missing in the test? Compare e.g. https://github.com/torvalds/linux/blob/master/tools/io_uring/queue.c#L29
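To make the synchronization question concrete, here is a minimal sketch of the usual way to publish a submission queue tail: write the queue entry first, then store the new tail with release semantics, so that a reader who sees the new tail also sees the entry. This is not the LTP code and not a confirmed fix for this failure. `struct demo_sq_ring`, `demo_publish_sqe()` and `demo_pending()` are hypothetical names made up for illustration, the ring is ordinary memory rather than the mmap()ed kernel ring, and the `__atomic` builtins assume GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical stand-in for the mmap()ed SQ ring; not the LTP structure. */
struct demo_sq_ring {
	uint32_t *head;      /* advanced by the consumer (the kernel in real use) */
	uint32_t *tail;      /* advanced by the producer (the application) */
	uint32_t *ring_mask; /* number of entries minus one, entries a power of two */
	uint32_t *array;     /* ring of SQE indices */
};

/*
 * Queue one SQE index and publish it. Returns the new tail value.
 * The array slot is written before the tail, and the tail is stored
 * with release semantics so the slot write cannot be reordered after it.
 */
static uint32_t demo_publish_sqe(struct demo_sq_ring *sq, uint32_t sqe_index)
{
	uint32_t tail = *sq->tail; /* only the producer writes the tail */

	sq->array[tail & *sq->ring_mask] = sqe_index;
	__atomic_store_n(sq->tail, tail + 1, __ATOMIC_RELEASE);

	return tail + 1;
}

/* Number of entries published but not yet consumed. */
static uint32_t demo_pending(const struct demo_sq_ring *sq)
{
	uint32_t head = __atomic_load_n(sq->head, __ATOMIC_ACQUIRE);

	return *sq->tail - head;
}
```

Checking `demo_pending()` before and after `io_uring_enter()` in a real test would show whether the kernel actually consumed the entry, which might help separate an ordering problem from some other cause of the 0 return value.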

metan-ucw commented 2 years ago

I guess that we need a write barrier after the tail update. Does the following patch fix the failures for you?

diff --git a/testcases/kernel/syscalls/io_uring/io_uring01.c b/testcases/kernel/syscalls/io_uring/io_uring01.c
index 70151bb85..c324455aa 100644
--- a/testcases/kernel/syscalls/io_uring/io_uring01.c
+++ b/testcases/kernel/syscalls/io_uring/io_uring01.c
@@ -214,6 +214,8 @@ static int submit_to_uring_sq(struct submitter *s, struct tcase *tc)
        if (*sring->tail != tail)
                *sring->tail = tail;

+       __sync_synchronize();
+
        ret = io_uring_enter(s->ring_fd, 1, 1, IORING_ENTER_GETEVENTS, &sig);
        if (ret == 1) {
                tst_res(TPASS, "io_uring_enter() waited for 1 event");

multiple1 commented 2 years ago

Adding __sync_synchronize() did not change the behavior.