Closed ianks closed 1 year ago
Since ruby_current_vm_ptr is not a public symbol, I had to declare it as an extern. I think the symbol name has changed across Ruby versions, so it would be best to use a stable API for this check.
Is there any other public API that I can use instead to ensure the VM is running? Otherwise, I’ll plunge forward with detecting the various names of this symbol across Ruby versions.
Perhaps adding an at_exit { StackProf.stop } would be a possible solution? I've run into this in #157 as well.
On Ruby 2.4 and before, it should be called ruby_current_vm rather than ruby_current_vm_ptr. But at any rate, we shouldn't rely on ruby_current_vm_ptr being available for extensions because it isn't marked as public (it isn't wrapped within a RUBY_SYMBOL_EXPORT_BEGIN/RUBY_SYMBOL_EXPORT_END region). AFAIK there isn't a public API to determine if the Ruby VM is available.
Perhaps adding an at_exit { StackProf.stop } would be a possible solution? I've run into this in #157 as well.
That may help, and is probably a good change in its own right. I would still like to add this guard, though, since it improves the async-signal safety of the handler, which is tricky to reason about.
Perhaps adding an at_exit { StackProf.stop } would be a possible solution? I've run into this in #157 as well.
I think stackprof should probably do this by default. Either that or its callees should check for existence of the VM.
We could check for the VM in stackprof, but that seems kind of fraught as the global is private which means it can be renamed at any time and the compiler is free to do whatever it wants with the symbol name.
We could check for the VM in stackprof, but that seems kind of fraught as the global is private which means it can be renamed at any time and the compiler is free to do whatever it wants with the symbol name.
Yeah, I have similar feelings, and I'm very open to other ways of solving this. Maybe a null check in rb_during_gc would be better?
Maybe a null check in rb_during_gc would be better?
No, I don't think that will solve the underlying problem, since the code in stackprof that is crashing will then instead crash a few lines down in places like rb_postponed_job_register_one if the VM doesn't exist.
Maybe a null check in rb_during_gc would be better?
No, I don't think that will solve the underlying problem, since the code in stackprof that is crashing will then instead crash a few lines down in places like rb_postponed_job_register_one if the VM doesn't exist.
I think you are right.
A good stopgap may be to register a new at_exit proc with our own variable (bool ruby_vm_exited) that we can test against, rather than using the VM*.
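For anyone following along, here is a rough standalone sketch of that flag idea. It is not the stackprof patch: it uses plain C atexit and sigaction in place of Ruby's at_exit, and every name in it (vm_exited, samples_taken, mark_vm_exited, profile_signal_handler, install_profiler) is hypothetical. It only shows the shape of the guard: an exit hook flips a flag, and the signal handler checks the flag before touching anything else.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag modelled on the `ruby_vm_exited` idea above.
 * volatile sig_atomic_t is the type the C standard guarantees can be
 * read and written safely from inside a signal handler. */
static volatile sig_atomic_t vm_exited = 0;

/* Stand-in for real profiling work: counts samples actually taken. */
static volatile sig_atomic_t samples_taken = 0;

/* Exit hook: records that the runtime is going away.
 * Assumption: a real extension would register this through Ruby's own
 * exit mechanism instead of plain atexit(). */
static void mark_vm_exited(void)
{
    vm_exited = 1;
}

/* Signal handler: does nothing once the runtime has been marked as exited. */
static void profile_signal_handler(int signum)
{
    (void)signum;
    if (vm_exited) {
        return; /* VM is gone, so skip sampling entirely */
    }
    samples_taken++;
}

/* Registers the exit hook and the SIGPROF handler. Returns 0 on success. */
static int install_profiler(void)
{
    struct sigaction sa;

    if (atexit(mark_vm_exited) != 0) {
        return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = profile_signal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPROF, &sa, NULL);
}
```

Calling install_profiler() once at load time and then delivering SIGPROF bumps samples_taken only while vm_exited is still 0. The point of owning the flag is that it does not depend on a private VM symbol whose name can change between Ruby versions.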
@tenderlove @peterzhu2118 Thoughts on my updates?
@ianks lgtm! Thanks for the patch!
FWIW, I think the failing test may be flaky? I noticed it failing on master as well locally (on 3.2).
FWIW, I think the failing test may be flaky? I noticed it failing on master as well locally (on 3.2).
Ya, it's flaky. I'll merge this and ship it. I think we're trying to measure some GC stuff in the tests and it's not predictable.
As of a few weeks ago, there's been an increase of SIGABRT in prod for us. After hunting the issue down a bit with @peterzhu2118, we homed in on a case where stackprof is signaled and ruby_current_vm_ptr is null. This seems to be happening after SIGQUIT'ing a unicorn worker during rb_during_gc.
Since signal safety is hard to reason about, let's add a sanity check and not assume the Ruby VM is active just because the handler is called.