matsadler / magnus

Ruby bindings for Rust. Write Ruby extension gems in Rust, or call Ruby from Rust.
https://docs.rs/magnus/latest/magnus/
MIT License

Sidekiq not handling signals? #104

Closed by choubacha 6 months ago

choubacha commented 8 months ago

We're trying to debug an issue where our Sidekiq workers (running code built with Magnus) are not shutting down properly. Based on the stack, it looks like they never reacted to a TERM signal.

I noticed this comment and function, which lead me to believe that we may be preventing Ruby from handling interrupts during the call: https://github.com/matsadler/magnus/blob/a1e98dd6aa85ab96d5d93a980feb64e856a56c21/src/thread.rs#L377-L396

However, when we tried to call the above function, it was private! I'm not sure what we should do in this case. Is this a known issue?

matsadler commented 8 months ago

That function isn't in a released version of Magnus. It should be available with the 0.7 release, which is pretty much ready to go; I just want to do a little more testing and need to find the time for that.

I've put together an example of some problems you could have, and workarounds, in a repo here: https://github.com/matsadler/ruby_native_gem_interrupt_example
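For readers who want the general shape of the problem: a native call that runs for a long time without returning control to Ruby gives the VM no opportunity to run signal handlers, so a TERM can appear to be ignored. One common pattern is to split the work into chunks and check for a stop request between them. The sketch below is not taken from Magnus or from the linked repo. It uses only the Rust standard library, and the function name `sum_with_cancellation` and the `AtomicBool` flag are hypothetical stand-ins for whatever interrupt check a real extension would use.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical long-running job: sums the integers in `0..n`.
///
/// The work is split into chunks of `chunk` items, and the `cancel` flag is
/// checked before each chunk. In a real Ruby extension this check would be
/// the point where control is handed back so pending signals can be handled;
/// here a plain flag stands in for that (an assumption for illustration).
///
/// Returns `None` if cancellation was requested before the work finished.
fn sum_with_cancellation(n: u64, chunk: u64, cancel: &AtomicBool) -> Option<u64> {
    assert!(chunk > 0, "chunk size must be non-zero");
    let mut total: u64 = 0;
    let mut start: u64 = 0;
    while start < n {
        // Cooperative check: bail out early if a stop was requested.
        if cancel.load(Ordering::SeqCst) {
            return None;
        }
        // Process one bounded chunk of work before checking again.
        let end = start.saturating_add(chunk).min(n);
        for i in start..end {
            total = total.wrapping_add(i);
        }
        start = end;
    }
    Some(total)
}
```

With the flag unset, `sum_with_cancellation(1_000, 100, &flag)` runs to completion; once another thread or a signal handler sets the flag, the next chunk boundary returns `None` instead of carrying on. The chunk size trades responsiveness against the overhead of checking.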

choubacha commented 8 months ago

@matsadler Thanks! We'll take a look at the example and see if we can make it work for us!

Vagab commented 7 months ago

@choubacha were you able to make it work? Could you share some code for that? I don't think I have the same problem, but I'm planning to use Magnus code in Sidekiq workers.

choubacha commented 7 months ago

@Vagab We have not gotten it to work yet :(

Vagab commented 7 months ago

@choubacha could you maybe provide a minimal reproducible example? I might try some things from @matsadler's answer

choubacha commented 7 months ago

@Vagab We can try, but I don't have a minimal reproducible example at the moment. We'll update when we do.

choubacha commented 6 months ago

So far I haven't been able to reproduce this outside of our running service. I'm actually more convinced that this is some deadlock in our code, but I haven't pinpointed it. I'm going to close this issue for now until I can isolate it.