Open MBkkt opened 1 year ago
https://github.com/scylladb/seastar/blob/57acd00402ae6a2fb4eec93e81c9ef1647cc5efe/src/core/reactor_backend.cc#L1517 Probably I don't right, it looks like it needed here, but anyway avoiding copying and buffer on stack looks possible
hi @MBkkt, io_uring_for_each_cqe()
does load-acquire and store-release. probably we can ignore this performance overhead on x86 architectures, though. do you have some benchmark numbers so we can have a better understanding of how this change can help with the performance?
The reason is likely that I wasn't aware of for_each_cqe.
In fact it was only documented a few days ago: axboe/liburing@16d74b1c76043e628726bf43cd91332e262879c6.
I don't think using this macros directly is good idea for non x86: https://github.com/axboe/liburing/issues/867
I just created this issue, because I looked your code to understand better how could I work with liburing, and thought that buffer and maybe submit look unnecessary.
So I don't have any measurements and feel free to just close it :)
I don't think using this macros directly is good idea for non x86: axboe/liburing#867
i disagree. we should use this macro no matter on x86 or on non-x86. as the producer and consumer could be running on different cores. the chance of racing is rare, but it exists.
I just created this issue, because I looked your code to understand better how could I work with liburing, and thought that buffer and maybe submit look unnecessary.
So I don't have any measurements and feel free to just close it :)
i see. i'd prefer keeping it open. i will try to benchmark these two versions. as i believe that the compiler should just compile the atomic_load to a plain load instruction without adding any barriers for x86, so it is a pure win on x86. i will give it a try.
i disagree. we should use this macro no matter on x86 or on non-x86. as the producer and consumer could be running on different cores. the chance of racing is rare, but it exists.
I were not about not_atomic load instead of atomic, I just pointed that In the macros it will be done on every iteration.
Could you explain why is
io_uring_peek_batch_cqe
used here instead ofio_uring_for_each_cqe
?https://github.com/scylladb/seastar/blob/master/src/core/reactor_backend.cc#L1508
I understand that they do different things, but it looks like submit part of
io_uring_peek_batch_cqe
is unnecessary in your usage. Maybe I'm wrong, please correct me in this case. If I correct it looks likeio_uring_for_each_cqe
can be used and unnecessary copy can be avoided. WDYT?Code: