tokio-rs / io-uring

The `io_uring` library for Rust
Apache License 2.0

CQ overflowing, but ring is never busy #302

Open lbrndnr opened 3 weeks ago

lbrndnr commented 3 weeks ago

Hi,

I'm running into an issue and I'm not sure whether I'm missing something or whether this is a bug. I'm running the `tcp_echo` example under a heavy workload. Above a certain load (20k rps, 3000 simultaneous connections), some of the connections time out, i.e. the echo binary no longer writes to them.

My first guess was that the completion queue is overflowing and dropping events, so that the affected connections are never read from or written to again. I checked this with `sq.cq_overflow()` and `cq.overflow() > 0`: the first check does return true under the heavy workload, but the second is never true, which would indicate that nothing is dropped. Increasing the CQ size fixes the issue for this particular load.

As I understand it, if `ring.params().is_feature_nodrop()` is true (which it is on my machine), io_uring will wait for me to reap the CQEs and return EBUSY if I try to submit new SQEs in the meantime. But this never happens: `submitter.submit()` always returns `Ok`. Am I missing something? Or how can I make sure that the completion queue never overflows?
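One way to approach the last question, as a minimal sketch rather than anything the crate provides: cap the number of in-flight operations at the number of CQ entries, so there is always a free slot for each completion. The sketch assumes every SQE produces exactly one CQE (no multishot operations). `InFlightBudget`, `try_reserve` and `release` are hypothetical names and are not part of the `io-uring` crate.

```rust
/// Hypothetical helper (not part of the io-uring crate): tracks how many
/// operations are in flight and refuses new ones once the CQ could fill up.
/// Assumes each submitted SQE yields exactly one CQE.
struct InFlightBudget {
    cq_entries: usize, // size of the completion queue
    in_flight: usize,  // SQEs submitted whose CQEs have not been reaped yet
}

impl InFlightBudget {
    fn new(cq_entries: usize) -> Self {
        Self { cq_entries, in_flight: 0 }
    }

    /// Reserves a CQ slot for one more SQE; returns false if the caller
    /// should reap completions before submitting anything else.
    fn try_reserve(&mut self) -> bool {
        if self.in_flight < self.cq_entries {
            self.in_flight += 1;
            true
        } else {
            false
        }
    }

    /// Call once for every CQE that has been reaped.
    fn release(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }
}

fn main() {
    let mut budget = InFlightBudget::new(2);
    assert!(budget.try_reserve());
    assert!(budget.try_reserve());
    // Both CQ slots are spoken for: reap before pushing another SQE.
    assert!(!budget.try_reserve());
    budget.release();
    assert!(budget.try_reserve());
}
```

In an event loop this would mean calling `try_reserve` before pushing each SQE and `release` for each CQE taken from the completion queue. The cost is that submissions are held back under load instead of being queued.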

Thanks for your help!

lbrndnr commented 3 weeks ago

OK, according to this commit, it seems that io_uring_enter has not returned -EBUSY since kernel v5.13. But that doesn't explain why increasing the CQ size helps when nothing is supposed to get dropped...
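One possible reading of these observations, sketched as a toy model: the `io_uring_setup` man page describes `IORING_FEAT_NODROP` as storing completions internally when the CQ ring is full, rather than dropping them. Under that description, an overflow flag can be set while the dropped counter stays at zero, and a larger CQ simply means fewer completions get parked. The code below is not the kernel's implementation; `ToyCq` and its methods are hypothetical names, and `flush` only stands in for whatever moves parked completions back into the ring.

```rust
use std::collections::VecDeque;

/// Toy model of a CQ with NODROP-style overflow handling (hypothetical,
/// not the kernel's data structures). Entries are user_data values.
struct ToyCq {
    ring: VecDeque<u64>,    // completions visible to the application
    capacity: usize,        // number of CQ entries
    backlog: VecDeque<u64>, // completions parked while the ring is full
    dropped: u32,           // stays 0 here: parking never fails in this model
}

impl ToyCq {
    fn new(capacity: usize) -> Self {
        Self { ring: VecDeque::new(), capacity, backlog: VecDeque::new(), dropped: 0 }
    }

    /// A completion arrives: it goes into the ring if there is room and
    /// nothing is parked ahead of it, otherwise it is parked.
    fn complete(&mut self, user_data: u64) {
        if self.backlog.is_empty() && self.ring.len() < self.capacity {
            self.ring.push_back(user_data);
        } else {
            self.backlog.push_back(user_data);
        }
    }

    /// Analogue of an overflow flag: true while completions are parked.
    fn overflow_flag(&self) -> bool {
        !self.backlog.is_empty()
    }

    /// The application reaps everything currently visible in the ring.
    fn reap(&mut self) -> Vec<u64> {
        self.ring.drain(..).collect()
    }

    /// Moves parked completions into the ring while there is room.
    fn flush(&mut self) {
        while self.ring.len() < self.capacity {
            match self.backlog.pop_front() {
                Some(user_data) => self.ring.push_back(user_data),
                None => break,
            }
        }
    }
}

fn main() {
    let mut cq = ToyCq::new(4);
    for user_data in 0..6 {
        cq.complete(user_data);
    }
    // Two completions are parked, none are dropped.
    println!("flag={} dropped={}", cq.overflow_flag(), cq.dropped);
    let first = cq.reap();
    cq.flush();
    let second = cq.reap();
    println!("{:?} {:?} flag={}", first, second, cq.overflow_flag());
}
```

In this model the parked completions only become visible after `flush` runs. Whether something similar is behind the stalled connections in `tcp_echo` is an assumption, not something confirmed in this thread.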