socketry / nio4r

Cross-platform asynchronous I/O primitives for scalable network clients and servers.
Other
965 stars 86 forks source link

fix for #241 #244

Closed boazsegev closed 4 years ago

boazsegev commented 4 years ago

This should fix the dangling pointer issue caused by GC.compact (#241) moving the monitor object's memory address to a new location.

ioquatix commented 4 years ago

@tenderlove can you please comment on this? Thanks.

tenderlove commented 4 years ago

This will indeed prevent the pointer at self from moving. I'm not sure why it needs to hold a reference to itself (since self should be passed in with all method calls) but I don't know this code base at all 😅

tarcieri commented 4 years ago

@tenderlove when monitoring for IO readiness (e.g. poll/epoll/kqueue), it registers a pointer to a C struct NIO_Monitor with the kernel. The kernel returns this pointer back to the C struct, and the self pointer allows it to get back to the Ruby object associated with the C struct.

ioquatix commented 4 years ago

Thanks everyone for your feedback, it looks like this should solve the problems we are facing. Thanks for @boazsegev for figuring it out, for this specific problem I didn't even know where to start.

Willianvdv commented 3 years ago

Hey folks! It seems that this PR introduced a regression on Ruby 2.6.6. When upgrading from 2.5.2 to 2.5.3, we're experiencing segfaults on our CIs.

Here's one of the backtraces we got:

/bundle/ruby/2.6.0/gems/vcr-3.0.3/lib/vcr/structs.rb:123: [BUG] Segmentation fault at 0x0000000000000080
ruby 2.6.6p146 (2020-03-31 revision 67876) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0031 p:0078 s:0176 e:000171 BLOCK  /bundle/ruby/2.6.0/gems/vcr-3.0.3/lib/vcr/structs.rb:123 [FINISH]
c:0030 p:---- s:0166 e:000165 CFUNC  :each
c:0029 p:0033 s:0162 E:000d08 METHOD /bundle/ruby/2.6.0/gems/vcr-3.0.3/lib/vcr/structs.rb:115
c:0028 p:0014 s:0157 e:000156 METHOD /bundle/ruby/2.6.0/gems/vcr-3.0.3/lib/vcr/structs.rb:106
c:0027 p:0009 s:0152 e:000151 METHOD /bundle/ruby/2.6.0/gems/vcr-3.0.3/lib/vcr/structs.rb:62
c:0026 p:0035 s:0147 e:000146 METHOD /bundle/ruby/2.6.0/gems/vcr-3.0.3/lib/vcr/structs.rb:206 [FINISH]
c:0025 p:---- s:0141 e:000140 CFUNC  :new
c:0024 p:0036 s:0133 e:000132 METHOD /bundle/ruby/2.6.0/gems/vcr-3.0.3/lib/vcr/library_hooks/webmock.rb:46
c:0023 p:0022 s:0128 e:000127 METHOD /bundle/ruby/2.6.0/gems/vcr-3.0.3/lib/vcr/library_hooks/webmock.rb:119
c:0022 p:0047 s:0124 e:000122 METHOD /bundle/ruby/2.6.0/gems/vcr-3.0.3/lib/vcr/request_handler.rb:107
c:0021 p:0007 s:0117 e:000114 METHOD /bundle/ruby/2.6.0/gems/vcr-3.0.3/lib/vcr/request_handler.rb:7
c:0020 p:0026 s:0110 e:000109 BLOCK  /bundle/ruby/2.6.0/gems/vcr-3.0.3/lib/vcr/library_hooks/webmock.rb:144
c:0019 p:0013 s:0106 e:000102 BLOCK  /bundle/ruby/2.6.0/gems/webmock-3.8.3/lib/webmock/stub_registry.rb:28
c:0018 p:0140 s:0099 e:000098 METHOD /bundle/ruby/2.6.0/gems/webmock-3.8.3/lib/webmock/request_pattern.rb:40
c:0017 p:0011 s:0093 e:000092 BLOCK  /bundle/ruby/2.6.0/gems/webmock-3.8.3/lib/webmock/stub_registry.rb:58 [FINISH]
c:0016 p:---- s:0089 e:000088 IFUNC 
c:0015 p:---- s:0086 e:000085 CFUNC  :each
c:0014 p:---- s:0083 e:000082 CFUNC  :detect
c:0013 p:0015 s:0079 e:000078 METHOD /bundle/ruby/2.6.0/gems/webmock-3.8.3/lib/webmock/stub_registry.rb:57
c:0012 p:0006 s:0074 e:000073 METHOD /bundle/ruby/2.6.0/gems/webmock-3.8.3/lib/webmock/stub_registry.rb:50
c:0011 p:0026 s:0068 e:000067 METHOD /bundle/ruby/2.6.0/gems/webmock-3.8.3/lib/webmock/http_lib_adapters/em_http_request_adapter.rb:140
c:0010 p:0012 s:0064 e:000063 METHOD /bundle/ruby/2.6.0/gems/webmock-3.8.3/lib/webmock/http_lib_adapters/em_http_request_adapter.rb:58
c:0009 p:0056 s:0057 e:000056 METHOD /bundle/ruby/2.6.0/gems/em-http-request-1.1.5/lib/em-http/http_connection.rb:92
c:0008 p:0028 s:0050 e:000049 METHOD (eval):5
c:0007 p:0193 s:0042 e:000041 METHOD /bundle/ruby/2.6.0/bundler/gems/puffing-billy-a37a97da9c18/lib/billy/handlers/proxy_handler.rb:28
c:0006 p:0021 s:0029 e:000028 BLOCK  /bundle/ruby/2.6.0/bundler/gems/puffing-billy-a37a97da9c18/lib/billy/handlers/request_handler.rb:22 [FINISH]
c:0005 p:---- s:0024 e:000023 CFUNC  :each
c:0004 p:0023 s:0020 e:000019 METHOD /bundle/ruby/2.6.0/bundler/gems/puffing-billy-a37a97da9c18/lib/billy/handlers/request_handler.rb:21
c:0003 p:0023 s:0009 e:000008 BLOCK  /bundle/ruby/2.6.0/bundler/gems/puffing-billy-a37a97da9c18/lib/billy/proxy_connection.rb:61
c:0002 p:0003 s:0006 e:000005 BLOCK  /bundle/ruby/2.6.0/gems/em-synchrony-1.0.6/lib/em-synchrony.rb:30 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]

Another one in the same isolated nio4r upgrade :

/bundle/ruby/2.6.0/gems/em-synchrony-1.0.6/lib/em-synchrony.rb:30: [BUG] try to mark T_NONE object
ruby 2.6.6p146 (2020-03-31 revision 67876) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0012 p:---- s:0054 e:000053 CFUNC  :initialize
c:0011 p:---- s:0051 e:000050 CFUNC  :new
c:0010 p:0029 s:0047 E:002460 METHOD /bundle/ruby/2.6.0/gems/em-synchrony-1.0.6/lib/em-synchrony.rb:30
c:0009 p:0011 s:0041 E:0023f0 METHOD /bundle/ruby/2.6.0/bundler/gems/puffing-billy-a37a97da9c18/lib/billy/proxy_connection.rb:60
c:0008 p:0121 s:0037 e:000036 METHOD /bundle/ruby/2.6.0/bundler/gems/puffing-billy-a37a97da9c18/lib/billy/proxy_connection.rb:46 [FINISH]
c:0007 p:---- s:0032 e:000031 CFUNC  :<<
c:0006 p:0008 s:0027 e:000026 METHOD /bundle/ruby/2.6.0/bundler/gems/puffing-billy-a37a97da9c18/lib/billy/proxy_connection.rb:20 [FINISH]
c:0005 p:---- s:0022 e:000021 CFUNC  :run_machine
c:0004 p:0270 s:0018 e:000017 METHOD /bundle/ruby/2.6.0/gems/eventmachine-1.0.9.1/lib/eventmachine.rb:193
c:0003 p:0011 s:0010 E:0009d0 METHOD /bundle/ruby/2.6.0/bundler/gems/puffing-billy-a37a97da9c18/lib/billy/proxy.rb:70
c:0002 p:0005 s:0006 e:000005 BLOCK  /bundle/ruby/2.6.0/bundler/gems/puffing-billy-a37a97da9c18/lib/billy/proxy.rb:19 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]

My knowledge on C libraries is limited and I don't know if the above backtraces is useful at all.

tarcieri commented 3 years ago

That's a NULL pointer exception, although the location it's occurring isn't helpful:

https://github.com/vcr/vcr/blob/v3.0.3/lib/vcr/structs.rb#L123

It may be a sign of memory corruption.

Willianvdv commented 3 years ago

@tarcieri thanks for looking into my comment, much appreciated! I was also thinking about memory corruption. The reason I ended up in this PR is that I can reproduce segfaults (like the one I posted above to this MR). The failure rate with this PR merged on our CI increased with 8% and all issues are related to segfaults.

Our stack is huge so it's hard to pinpoint exactly what's going on. I'll try to create a dummy project to get this in a reproducible state but I can't promise any success.

Willianvdv commented 3 years ago

251 fixed the issue, thanks!