socketry / nio4r

Cross-platform asynchronous I/O primitives for scalable network clients and servers.

2.5.5 causes Assertion failed in libev. #266

Closed joshuapinter closed 3 years ago

joshuapinter commented 3 years ago

Upgrading nio4r from 2.5.4 to 2.5.5 causes the following error when doing requests:

Assertion failed: (("libev: poll found invalid fd in poll set", 0)), function poll_poll, file ./../libev/ev_poll.c, line 121.

This came out of the blue: we were updating a different gem, and running bundle update also bumped nio4r from 2.5.4 to 2.5.5, after which our development web server stopped working. We went down a rabbit hole assuming the gem we were actually updating was the issue, but the culprit turned out to be this patch update to nio4r.

We're going to pin our version to 2.5.4 for now but let me know what else you need from me to diagnose this.

Thanks.

Versions

MacBook Pro 15" Intel
ruby 2.7.2p137
rails 5.2.4.4
puma 5.0.2
macOS 11.2 Big Sur
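
The pin mentioned above could look like the following Gemfile fragment. This is only a sketch: the surrounding Gemfile is assumed, and the exact version constraint is a choice, not something the report specifies beyond "pin to 2.5.4".

```ruby
# Gemfile (fragment, assumed layout)
# Hold nio4r at the last version that worked until a fixed release is available.
gem "nio4r", "2.5.4"
```

After editing the Gemfile, running bundle update nio4r (or bundle install) should lock the older version in Gemfile.lock.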

ioquatix commented 3 years ago

We updated libev but that should be fairly stable.

It would be helpful to know some more context about the issue - i.e. why did it happen?

joshuapinter commented 3 years ago

Thanks for the quick response, @ioquatix.

This is happening with just a Rails web request hit. When bumping the version of nio4r from 2.5.4 to 2.5.5 (and nothing else), I see the above error message in the Rails server logs and the page fails to load.

Let me know what else I can do to help you diagnose.

Thanks.

jcmfernandes commented 3 years ago

@joshuapinter can you please tell me the output of running

NIO::Selector.backends
NIO::Selector.new.backend

with both nio4r 2.5.4 and 2.5.5? Thanks!
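
For anyone who wants to collect the requested values in one go, here is a minimal sketch. The helper name selector_report is hypothetical (it is not part of nio4r); it only wraps NIO::VERSION, NIO::Selector.backends and NIO::Selector#backend, and falls back to an error hash if the gem cannot be loaded.

```ruby
# Hypothetical helper (not part of nio4r): gathers the values asked for above.
def selector_report
  require "nio" # provided by the nio4r gem

  selector = NIO::Selector.new
  begin
    {
      version: NIO::VERSION,               # installed nio4r version
      available: NIO::Selector.backends,   # backends compiled in on this platform
      default: selector.backend            # backend picked when none is requested
    }
  ensure
    selector.close # release the underlying event loop
  end
rescue LoadError
  # nio4r is not installed in this environment
  { error: "nio4r is not installed" }
end

puts selector_report.inspect
```

Running it once under each nio4r version (for example via bundle exec ruby) gives two lines that can be pasted into the issue.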

smridge commented 3 years ago

This update consistently broke a build for me as well. I don't know enough about this gem and how it is used to further debug, other than it is a dependency of rails ActionCable for how I'm using it.

compiling nio4r_ext.c
In file included from nio4r_ext.c:6:
../libev/ev.c:479:11: fatal error: linux/version.h: No such file or directory
  479 | # include <linux/version.h>
      |           ^~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:243: nio4r_ext.o] Error 1

jcmfernandes commented 3 years ago

Can you please tell me more about your system @smridge? OS, kernel version, architecture, C compiler... Thank you!

jcmfernandes commented 3 years ago

@smridge I believe I know the problem, but it has nothing to do with what was originally reported in this thread. Please install the headers for your kernel and try to compile the native extensions again.

smridge commented 3 years ago

Ahh, you're right! Thank you @jcmfernandes !

ghost commented 3 years ago

I am seeing this as well with my rails app. Looking forward to the next release!

jcmfernandes commented 3 years ago

@alan-pie by "this" do you mean the original report? If so, can you please tell me more about your system and provide me with the output of running

NIO::Selector.backends
NIO::Selector.new.backend

Thanks!

ghost commented 3 years ago

Yeah, the original report and the problem mentioned in the linked PR.

macOS 10.14.6 running a Ruby on Rails test

On 2.5.4:

(byebug) NIO::Selector.backends
[:poll, :kqueue, :select]
(byebug) NIO::Selector.new.backend
:kqueue

On 2.5.5

(byebug) NIO::Selector.backends
[:poll, :kqueue, :select]
(byebug) NIO::Selector.new.backend
:poll

jcmfernandes commented 3 years ago

Thanks @alan-pie! And you're running into the

Assertion failed: (("libev: poll found invalid fd in poll set", 0)), function poll_poll, file ./../libev/ev_poll.c, line 121.

running a rails test? Can you give me some insight into that test?

tarcieri commented 3 years ago

This was already fixed in #268. It seems that @ioquatix just needs to cut a 2.5.6 release.

In the meantime you can try pulling in this repo from git and testing HEAD to confirm it's fixed.
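
Testing HEAD as suggested could be done with a Gemfile fragment along these lines. This is a sketch, assuming the repository URL matches the socketry / nio4r project this issue belongs to and that no branch needs to be named.

```ruby
# Gemfile (fragment, assumed layout)
# Temporarily build nio4r from the repository's default branch instead of a released gem.
gem "nio4r", git: "https://github.com/socketry/nio4r.git"
```

Once a fixed release is published, the git: option can be dropped again in favour of a normal version constraint.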

ghost commented 3 years ago

Yeah, it's a Capybara test using Selenium Webdriver to drive a headless Chrome instance. The test runs our rails app and opens and clicks around some webpages. It seems to fail on the first call to visit '/users/sign_in' which triggers the load of Puma/the browser. A blind guess would be something around a socket to drive the Chrome instance.

I'll try pulling from HEAD and see if it's fixed

ghost commented 3 years ago

Confirmed, it's fixed for me at HEAD

jcmfernandes commented 3 years ago

That's great, thanks @alan-pie!

@ioquatix knowing this might justify a new release :wink: .

EDIT: and after that we can close this issue.

santib commented 3 years ago

+1 I'm waiting for the 2.5.6 release as well 😄

jcmfernandes commented 3 years ago

Sorry about the pain everyone, my bad. But well, I broke it, I fixed it! :sweat_smile:

jcmfernandes commented 3 years ago

Until 2.5.6 comes out, an easy workaround is setting the environment variable LIBEV_FLAGS to 8, which makes libev choose kqueue.
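
To see why 8 selects kqueue: LIBEV_FLAGS is a bitmask, and libev's ev.h assigns one bit per backend. The sketch below decodes a flag value into backend names. The constant table mirrors the classic EVBACKEND_* values from ev.h; the helper name libev_backends_for is hypothetical and only for illustration.

```ruby
# Backend bits as defined by the EVBACKEND_* constants in libev's ev.h.
LIBEV_BACKENDS = {
  select:  0x01,
  poll:    0x02,
  epoll:   0x04,
  kqueue:  0x08,
  devpoll: 0x10,
  port:    0x20
}.freeze

# Hypothetical helper: lists the backends that a LIBEV_FLAGS value allows.
def libev_backends_for(flags)
  value = Integer(flags) # accepts "8" as well as 8
  LIBEV_BACKENDS.select { |_name, bit| (value & bit) != 0 }.keys
end

p libev_backends_for("8") # => [:kqueue]
p libev_backends_for(2)   # => [:poll]
```

So starting the server with LIBEV_FLAGS=8 in its environment restricts libev to kqueue, while 2 would restrict it to poll, the backend that 2.5.5 was picking by default in the reports above.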

Another alternative for those using puma: starting with version 5.2.0 you can specify the IO selector (https://github.com/puma/puma/pull/2522), so adding

io_selector_backend :kqueue

to your puma configuration will also do the trick :wink: .
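
If the same puma configuration is shared with machines where kqueue does not exist (Linux, for instance), the setting could be guarded. This is a sketch under assumptions: the file path config/puma.rb is the usual Rails location rather than something stated here, and the guard relies on NIO::Selector.backends reporting what the installed nio4r supports.

```ruby
# config/puma.rb (fragment, assumed location; requires puma >= 5.2.0)
require "nio" # provided by the nio4r gem

# Only request kqueue where nio4r reports it as available on this platform.
io_selector_backend :kqueue if NIO::Selector.backends.include?(:kqueue)
```

On platforms without kqueue the line is skipped and puma keeps its default selector.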

ioquatix commented 3 years ago

I'll make a release today.

ioquatix commented 3 years ago

Okay, v2.5.6 is released. Please open a new issue if there are ongoing related issues.

ghost commented 3 years ago

Amazing turn-around! Thanks!