openbmc / dbus-sensors

D-Bus configurable sensor scanning applications
Apache License 2.0

Intermittent hangs observed with io_uring on kernel 5.4 #21

Open vsytch opened 1 year ago

vsytch commented 1 year ago

Since the upgrade to io_uring, we've observed intermittent hangs of dbus-sensors daemons on BMCs running a 5.4 kernel. The hang always happens inside io_uring_enter(): enqueued reads never complete (more precisely, they return after exactly 5 minutes, apparently due to some internal timeout), which stalls the entire service.

Upgrading the BMC kernel to 5.10 makes the issue disappear. Searching through io_uring lore, I found that this problem has been reported against 5.4 before (see https://github.com/axboe/liburing/issues/205). Unfortunately, the only suggested solution was to upgrade the kernel, which is not possible for us. It is also unclear which kernel patches would resolve the hang.
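For anyone carrying dbus-sensors on a frozen kernel, one possible stop-gap is a startup check of the running kernel release. The following is a minimal sketch, not code from dbus-sensors: `KernelRelease`, `parseRelease`, `atLeast`, and `kernelLooksSafeForUring` are hypothetical names, and the 5.10 cut-off is an assumption based only on the observation that 5.4 hangs and 5.10 does not. The actual fix has not been identified, so a vendor kernel with backports could behave differently.

```cpp
#include <cstddef>
#include <string_view>
#include <sys/utsname.h>

// Hypothetical helper type: the leading "major.minor" of a kernel release.
struct KernelRelease
{
    int majorNum = 0;
    int minorNum = 0;
};

// Parse the leading "major.minor" of a release string such as "5.4.0-bmc".
// Everything after the minor number is ignored.
constexpr KernelRelease parseRelease(std::string_view release)
{
    KernelRelease v{};
    std::size_t i = 0;
    for (; i < release.size() && release[i] >= '0' && release[i] <= '9'; ++i)
    {
        v.majorNum = v.majorNum * 10 + (release[i] - '0');
    }
    if (i < release.size() && release[i] == '.')
    {
        ++i;
    }
    for (; i < release.size() && release[i] >= '0' && release[i] <= '9'; ++i)
    {
        v.minorNum = v.minorNum * 10 + (release[i] - '0');
    }
    return v;
}

// True if v is at least wantMajor.wantMinor.
constexpr bool atLeast(KernelRelease v, int wantMajor, int wantMinor)
{
    return v.majorNum > wantMajor ||
           (v.majorNum == wantMajor && v.minorNum >= wantMinor);
}

// Assumption: 5.10 is used as the cut-off only because the hang was not
// seen there; it is not a documented requirement of io_uring.
inline bool kernelLooksSafeForUring()
{
    struct utsname u{};
    if (uname(&u) != 0)
    {
        return false; // be conservative if the release cannot be read
    }
    return atLeast(parseRelease(u.release), 5, 10);
}
```

A daemon could call `kernelLooksSafeForUring()` once at startup and log a warning or refuse to start, rather than stalling later inside io_uring_enter().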

Disabling io_uring support in dbus-sensors is currently non-trivial, as it requires an API change in how ASIO is used. It would be great to add a build option to specify which backend to use: epoll or io_uring. That would make it simpler to configure the daemons for different kernel versions.
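To illustrate the kind of build option meant here, a rough sketch of a compile-time backend switch follows. None of this exists in dbus-sensors: the `SENSORS_USE_IO_URING` macro, the `IoBackend` enum, and both reader types are hypothetical placeholders. A real implementation would wrap the corresponding ASIO types behind a common interface, which is the API change mentioned above.

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical macro; a build system option would translate into
// -DSENSORS_USE_IO_URING=1 (or 0) on the compiler command line.
#ifndef SENSORS_USE_IO_URING
#define SENSORS_USE_IO_URING 0
#endif

enum class IoBackend
{
    epoll,
    ioUring
};

// The backend is fixed at compile time, so unused code is never built.
constexpr IoBackend selectedBackend =
    (SENSORS_USE_IO_URING != 0) ? IoBackend::ioUring : IoBackend::epoll;

// Placeholder reader types. In real code each would hold the matching
// ASIO object and expose the same read interface to the sensor classes.
struct EpollSensorReader
{
    static constexpr std::string_view name = "epoll";
};

struct UringSensorReader
{
    static constexpr std::string_view name = "io_uring";
};

// Sensor code refers only to SensorReader and never names a backend.
using SensorReader =
    std::conditional_t<selectedBackend == IoBackend::ioUring,
                       UringSensorReader, EpollSensorReader>;
```

With this shape, a 5.4 build would leave the macro at 0 and get the epoll reader, while newer kernels could opt in. As far as I know, the io_uring path would also need Boost.Asio's own `BOOST_ASIO_HAS_IO_URING` definition, so the two settings would have to be kept in sync by the build system.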

williamspatrick commented 1 year ago

Why would we want to support a 5.4 kernel with the dbus-sensors io_uring code? There isn't any openbmc/openbmc commit that attempts that. 5.4 hasn't been used by our tree in a few years.

vsytch commented 1 year ago

What is the minimum kernel version required for OpenBMC components then? How often can we expect a kernel version bump requirement? We would have to keep these questions in mind before pulling in obmc code.

williamspatrick commented 1 year ago

> What is the minimum kernel version required for OpenBMC components then? How often can we expect a kernel version bump requirement? We would have to keep these questions in mind before pulling in obmc code.

Generally we only test systems together with whatever is in the openbmc/openbmc tree. Are you manually extracting specific openbmc components into your own Linux distribution? I don't think anyone would turn down a contribution that adds support for different configurations, but we're not likely to go out of our way to enable it.

vsytch commented 1 year ago

> Are you manually extracting specific openbmc components into your own Linux distribution?

Essentially yes. The kernel for a specific BMC machine gets semi-frozen (until someone decides to spend the time to upgrade it), but all user space components just pull in whatever is available in OpenBMC.