Open HoneyryderChuck opened 7 years ago
I'm narrowing down which call in the C extension exactly triggers the "pointer being freed was not allocated" error, and it's in selector.c:
ev_loop(selector->ev_loop, EVLOOP_ONESHOT);
Unfortunately that doesn't really narrow it down very much, as that's where the bulk of libev's functionality is.
There is definitely work needed on signal handling (see #134)
@tarcieri had to pause the investigation, but yes, long story short, it is signal handling.
This one may be hard to reproduce in the current test suite, as it is using rspec, and I don't know if there's a "hell" mode like in minitest. However, after patching the trap call, I've come to this conclusion: trap is called 2 times for every test for INFO, with the following passed as arguments:
trapping: ["INFO"]
trapping: ["INFO", "SYSTEM_DEFAULT"]
If traps are being set all over the place in a GIL-parallel way, this might have side effects for nio. Maybe one could simulate this by trapping INFO multiple times?
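A minimal sketch of how the trap calls could be logged and the repeated trapping simulated. `TrapLogger` and `TRAP_SIGNAL` are hypothetical names, not part of nio4r or minitest. The sketch only intercepts `Signal.trap` (not `Kernel#trap`) and falls back to USR1 on platforms that have no INFO signal.

```ruby
# Hypothetical helper: records the arguments of every Signal.trap call.
module TrapLogger
  CALLS = []

  def trap(*args, &block)
    # Store the signal name plus any command argument (e.g. "DEFAULT").
    CALLS << args.map(&:to_s)
    super
  end
end

# Signal.trap is a singleton method, so the logger is prepended there.
Signal.singleton_class.prepend(TrapLogger)

# Assumption: INFO only exists on BSD/macOS, so use USR1 elsewhere.
TRAP_SIGNAL = Signal.list.key?("INFO") ? "INFO" : "USR1"

# Trap the same signal several times, restoring the default after each one.
3.times do
  Signal.trap(TRAP_SIGNAL) { }
  Signal.trap(TRAP_SIGNAL, "DEFAULT")
end

TrapLogger::CALLS.each { |args| puts "trapping: #{args.inspect}" }
```

Running this next to an open selector would be one way to check whether repeated traps alone are enough to trigger the crash.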
Just want to add, that in my experience, minitest does some weird shit with processes. It may not be causing the issue here, but I wouldn't be surprised if it was. I'd suggest making a test case working entirely independent of minitest. If you can supply that, I'll take a look.
I've now resorted to not handling traps in tests (for now), and am just closing descriptors/selector for every test. I was seeing quite a few crashes until recently however, which led me to believe that traps might have been just a red herring.
I was doing something similar to this:
def close
  @selector.close
  @wpipe.close
  @server.close
end
and after a few tests and reactor open/close scenarios, one of them would eventually crash the VM. This even happened in JRuby.
I managed to fix it though, by closing the selector last:
 def close
-  @selector.close
   @wpipe.close
   @server.close
+  @selector.close
 end
which was quite interesting in itself.
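For illustration, the "selector last" ordering can be enforced with an `ensure` clause. This is only a sketch under assumptions: `OrderedCloser` and `RecordingSelector` are hypothetical names, and the recording object stands in for a real selector so the ordering can be observed without nio4r loaded.

```ruby
# Hypothetical wrapper: closes the descriptors first and the selector last.
class OrderedCloser
  def initialize(selector, *ios)
    @selector = selector
    @ios = ios
  end

  def close
    @ios.each { |io| io.close unless io.closed? }
  ensure
    # Runs even if closing one of the descriptors raised.
    @selector.close
  end
end

# Stand-in for a selector (illustration only): on close, it records
# whether every descriptor had already been closed.
class RecordingSelector
  attr_reader :ios_closed_first

  def initialize(ios)
    @ios = ios
  end

  def close
    @ios_closed_first = @ios.all?(&:closed?)
  end
end

rpipe, wpipe = IO.pipe
selector = RecordingSelector.new([rpipe, wpipe])
OrderedCloser.new(selector, wpipe, rpipe).close
```

Why this ordering avoided the crash is not established in this thread; the sketch only pins down the order that was reported to work.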
I'll see about getting a reproducible script (I don't know if any other variables in my tests cause this, I just know what the fix was).
@HoneyryderChuck if you have time to revisit this and confirm whether it's still an issue that would be super helpful.
I'm currently experiencing this issue, which can't be consistently reproduced, but consistently happens in the same code path.
I'm using a pipe to control the lifecycle of the process/loop. This is the simplified version of the trigger:
The reader in the main thread deals with the TERM signal and writes to another pipe, whose reader is registered in a NIO loop. The registered handler should evaluate the message and stop the loop. This is the intended behaviour, and it does happen most of the time.
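The pattern described above can be approximated with the standard library alone. This is a hedged sketch, not the original trigger code: `IO.select` stands in for the NIO loop, and every variable name here is hypothetical.

```ruby
# Two pipes: one written from the TERM handler, one read by the loop thread.
signal_r, signal_w = IO.pipe
control_r, control_w = IO.pipe

# Keep the handler minimal: it only writes a single byte.
previous = Signal.trap("TERM") { signal_w.write_nonblock("T") }

stopped = false
loop_thread = Thread.new do
  until stopped
    readable, = IO.select([control_r], nil, nil, 1)
    next unless readable
    # The handler evaluates the message and stops the loop.
    stopped = true if control_r.read_nonblock(16, exception: false) == "stop"
  end
end

Process.kill("TERM", Process.pid)

# Main thread: wait for the signal byte, then forward a stop message.
IO.select([signal_r], nil, nil, 5)
signal_r.read_nonblock(1, exception: false)
control_w.write("stop")

loop_thread.join(5)

# Close the pipes only after the loop thread has finished with them.
[signal_r, signal_w, control_r, control_w].each(&:close)
Signal.trap("TERM", previous || "DEFAULT")
```

In this sketch the writer is only closed after the loop thread has been joined, so no descriptor is closed while another thread may still be using it.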
Two types of errors happen from time to time, however:
IOError: stream closed on write (although the reader successfully received and handled the message; simply rescuing the exception "patches" this behaviour, but doesn't fix it).
This usually happens under heavy load, like when I run tests which start/stop many loop instances. When running them sequentially, this happens rarely. Now I've added minitest/hell, and I'm seeing it way more often. This leads me to believe that there is a race condition somewhere, and I would greatly appreciate some input on how to debug this.
This is the relevant information I can gather:
select syscall
Here's the relevant coredump (I'll ignore the non-relevant ruby-platform threads until someone asks otherwise):