nats-io / nats-pure.rb

Ruby client for NATS, the cloud native messaging system.
https://nats.io
Apache License 2.0

Fork detection and automatic reconnect in child process #114

Closed Envek closed 1 year ago

Envek commented 1 year ago

The problem: people sometimes create “global” NATS clients in their app initializers while running the application under a forking server such as Unicorn or clustered Puma. In that case the client silently stops working in the worker processes. This happens because after fork the child process has only the thread that called fork (normally the main thread); the client's other threads aren't copied (see fork docs).
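The thread loss described above can be reproduced without NATS at all. The following is a minimal sketch, assuming CRuby on a platform that supports `fork`; the helper name `threads_seen_by_child` and the sleeping worker thread are hypothetical stand-ins for a client's background threads, not part of nats-pure.rb.

```ruby
# Returns what a forked child observes, as "<live thread count> <worker alive?>".
def threads_seen_by_child
  worker = Thread.new { sleep } # stands in for a client's reader/flusher thread
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    # Only the thread that called fork keeps running in the child.
    writer.puts "#{Thread.list.size} #{worker.alive?}"
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end

  writer.close
  report = reader.read.strip
  Process.wait(pid)
  report
ensure
  worker&.kill
end

puts threads_seen_by_child
```

On CRuby the child is expected to report a single live thread and a dead worker, which is why a client created before the fork can no longer read from or flush to its socket.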

Solution: use the ability to hook into forking via Process._fork (available since Ruby 3.1) to re-initialize the client (re-creating all auxiliary threads) and establish a new connection to the NATS server/cluster. This avoids the problem of multiple processes using the same connection through the same file descriptor and getting corrupted data as a result.
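The hook mechanism can be sketched as follows. This is only an illustration of the `Process._fork` technique on Ruby 3.1+, not the code from this pull request: `ToyClient`, `ForkTracker`, `reinitialize_after_fork` and `child_state_after_fork` are hypothetical names, and the "connection" is reduced to a queue plus one worker thread.

```ruby
# Hypothetical stand-in for a client that owns a background thread.
class ToyClient
  attr_reader :owner_pid

  def initialize
    start
  end

  def worker_alive?
    @worker.alive?
  end

  # Called in the child: drop inherited state and rebuild it from scratch.
  def reinitialize_after_fork
    start
  end

  private

  def start
    @owner_pid = Process.pid
    @queue = Queue.new
    @worker = Thread.new { loop { @queue.pop } }
  end
end

# Prepended to Process's singleton class so that every fork path
# (Kernel#fork, Process.fork) passes through this method on Ruby 3.1+.
module ForkTracker
  CLIENTS = [] # clients to re-initialize in forked children

  def _fork
    pid = super
    # super returns 0 in the child and the child's pid in the parent.
    CLIENTS.each(&:reinitialize_after_fork) if pid == 0
    pid
  end
end

Process.singleton_class.prepend(ForkTracker) if Process.respond_to?(:_fork)

# Forks and reports "<worker alive?> <state owned by this pid?>" from the child.
def child_state_after_fork(client)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.puts "#{client.worker_alive?} #{client.owner_pid == Process.pid}"
    writer.close
    exit!(0)
  end
  writer.close
  report = reader.read.strip
  Process.wait(pid)
  report
end
```

A client registered with `ForkTracker::CLIENTS << client` before forking should then have a live worker thread of its own in the child. A real client would additionally have to close the inherited socket and reconnect, which this sketch leaves out.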

Implementation notes:

In this pull request:

wallyqs commented 1 year ago

Seems that there is some extra shared state that we need to reset after the fork happens:

Client - Fork detection
  should be able to publish messages from child process after forking (FAILED - 1)
  should be able to make requests messages from child process after forking (FAILED - 2)
/home/travis/build/nats-io/nats-pure.rb/lib/nats/io/client.rb:1056:in `send_command': undefined method `<<' for nil:NilClass (NoMethodError)
      @pending_queue << command

Envek commented 1 year ago

@wallyqs, I finally managed to fix all the flaky specs. I also found that Ractors don't work after fork (a forked Ruby process crashes when you try to interact with Ractors), so I had to disable fork handling for non-main Ractors.