netty / netty-incubator-transport-io_uring

Apache License 2.0

WIP: repro IOUringEventLoop shutdown bug #226

Closed bryce-anderson closed 11 months ago

bryce-anderson commented 11 months ago

I'm trying to debug a problem where I can't shut down an IOUring event loop: it ends up stuck in the ioUringWaitCqe state. This only seems to happen if the quiet period and graceful shutdown window are 0 or a very small duration.

franz1981 commented 11 months ago

Hi! Try checking the sequences in https://github.com/netty/netty-incubator-transport-io_uring/pull/193, explained in https://github.com/netty/netty-incubator-transport-io_uring/issues/192 (specifically at https://github.com/netty/netty-incubator-transport-io_uring/issues/192#issuecomment-1435012607).

To solve it, I basically logged all the SQE and CQE events and dumped the threads (as you have already done).
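
For readers who want to reproduce the thread-dump half of that approach, here is a minimal sketch using only the JDK. The class name `ThreadDumps` and both helper methods are hypothetical names chosen for illustration; they are not part of Netty or of this repository. The SQE/CQE logging itself needs changes inside the transport and is not shown.

```java
import java.util.Map;

// Hypothetical helper (not part of Netty): collects a plain-text thread dump
// so a stalled event loop thread can be spotted by the frame it is parked in.
final class ThreadDumps {
    private ThreadDumps() {}

    /** Renders every live thread's name, state, and stack frames as text. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Returns true if any live thread currently has a frame in the named method. */
    static boolean anyThreadIn(String methodName) {
        for (StackTraceElement[] frames : Thread.getAllStackTraces().values()) {
            for (StackTraceElement frame : frames) {
                if (frame.getMethodName().equals(methodName)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Calling `ThreadDumps.anyThreadIn("ioUringWaitCqe")` a short while after requesting shutdown would be one way to check for the stall described above, assuming the method name appears in the Java stack as it does in the dumps discussed here.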

bryce-anderson commented 11 months ago

Hi @franz1981! I think I have it sorted out via https://github.com/netty/netty-incubator-transport-io_uring/pull/227. The logging showed that stalled loops never enter the .submitAndWait() call, so the initial interest in the eventFd is never flushed.
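
To make that sequence concrete, here is a toy model in plain Java. It is a sketch under stated assumptions, not the Netty implementation: `ToyRing` and all of its methods are hypothetical, and it only mimics the general io_uring rule that a queued submission entry has no effect until it is submitted. It shows why a wakeup written to the eventfd produces no completion while the read interest is still sitting unsubmitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (hypothetical, not Netty code): a queued eventfd read is invisible
// to the "kernel" side until submit() is called.
final class ToyRing {
    private final Deque<String> unsubmitted = new ArrayDeque<>();
    private final Deque<String> completions = new ArrayDeque<>();
    private boolean eventFdReadArmed; // true once the read has been submitted
    private long eventFdCounter;      // wakeups written but not yet consumed

    /** Queues interest in the eventfd; nothing is armed yet. */
    void queueEventFdRead() {
        unsubmitted.add("eventfd-read");
    }

    /** Flushes queued entries, arming the eventfd read. Returns the number flushed. */
    int submit() {
        int flushed = unsubmitted.size();
        while (!unsubmitted.isEmpty()) {
            if (unsubmitted.poll().equals("eventfd-read")) {
                eventFdReadArmed = true;
            }
        }
        deliver();
        return flushed;
    }

    /** Models another thread writing to the eventfd to wake the loop. */
    void wakeup() {
        eventFdCounter++;
        deliver();
    }

    /** A completion appears only if the read is armed and a wakeup is pending. */
    private void deliver() {
        if (eventFdReadArmed && eventFdCounter > 0) {
            eventFdReadArmed = false;
            eventFdCounter = 0;
            completions.add("eventfd-read");
        }
    }

    boolean hasCompletion() {
        return !completions.isEmpty();
    }
}
```

In this model, calling `queueEventFdRead()` and then `wakeup()` leaves `hasCompletion()` false, so a loop that blocks waiting for a completion at that point would never be woken. Calling `submit()` afterwards makes the completion appear.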

bryce-anderson commented 11 months ago

The latest commit shows how the sequence can be triggered deterministically, but it requires modifications to the IOUringEventLoop that clearly wouldn't be acceptable to merge.