Fork detection and automatic reconnect in child process

Envek commented 1 year ago

The problem: applications sometimes create a “global” NATS client in an initializer and then run under a forking application server such as Unicorn or clustered Puma. In that case the client silently stops working in the worker processes. This happens because after fork the child process contains only the thread that called fork; the client's other threads aren't copied (see fork docs).
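To make the thread loss concrete, here is a minimal sketch in plain Ruby. No NATS client is involved; the `worker` thread is only a stand-in for a client's background threads, and the pipe is used just to report what the child observes.

```ruby
# A background thread standing in for a client's reader/flusher thread.
worker = Thread.new { sleep }

reader, writer = IO.pipe
pid = fork do
  reader.close
  # Only the thread that called fork keeps running in the child.
  writer.puts(worker.alive?)
  writer.puts(Thread.list.size)
  writer.close
end
writer.close
Process.wait(pid)

# Read back what the child saw, as strings.
child_worker_alive, child_thread_count = reader.read.split
reader.close

# The same Thread object is still alive in the parent.
parent_worker_alive = worker.alive?
worker.kill
```

The `Thread` object still exists in the child, but it is dead there, so anything queued for it is never processed.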

Solution: use the fork hook, Process._fork (available since Ruby 3.1), to re-initialize the client in the child process (re-creating all auxiliary threads) and establish a new connection to the NATS server/cluster. This avoids multiple processes using the same connection through the same file descriptor and receiving corrupted data as a result.
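A rough sketch of how such a hook can be wired up, assuming Ruby 3.1 or newer. `DemoClient`, `ForkTracker`, and `after_fork_in_child` are hypothetical names used for illustration only; they are not taken from this pull request or from the nats-pure API.

```ruby
# Hypothetical stand-in for a client; not the nats-pure API.
class DemoClient
  attr_reader :owner_pid, :reconnects

  def initialize
    @owner_pid = Process.pid
    @reconnects = 0
  end

  # Runs in the child: a real client would re-create its threads
  # and open a fresh connection here.
  def after_fork_in_child
    @owner_pid = Process.pid
    @reconnects += 1
  end
end

# Hypothetical hook module prepended in front of Process._fork.
module ForkTracker
  CLIENTS = []

  def _fork
    pid = super
    # Process._fork returns 0 in the child and the child's pid in the parent.
    CLIENTS.each(&:after_fork_in_child) if pid.zero?
    pid
  end
end

Process.singleton_class.prepend(ForkTracker)

client = DemoClient.new
ForkTracker::CLIENTS << client

reader, writer = IO.pipe
pid = fork do
  reader.close
  writer.puts(client.reconnects)
  writer.puts(client.owner_pid == Process.pid)
  writer.close
end
writer.close
Process.wait(pid)
child_report = reader.read.split
reader.close
```

Because `Kernel#fork` and `Process.fork` both go through `Process._fork`, the application does not need to call anything itself; the parent's client is left untouched.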

Implementation notes:

All existing subscriptions continue to work only in the parent process. The child process doesn't have any subscriptions (and probably shouldn't).

Note: currently subscriptions aren't tracked by the client after creation anyway (that is left to the calling application); only a thread is created to handle incoming messages.

In this pull request:

[x] Fork detection and client re-initialization and re-connection
[x] Specs for subscriptions
[x] Specs for requests and responses
[x] Specs for JetStream

wallyqs commented 1 year ago

It seems that there is some extra shared state that we need to reset after the fork happens:

Client - Fork detection
  should be able to publish messages from child process after forking (FAILED - 1)
  should be able to make requests messages from child process after forking (FAILED - 2)
/home/travis/build/nats-io/nats-pure.rb/lib/nats/io/client.rb:1056:in `send_command': undefined method `<<' for nil:NilClass (NoMethodError)
      @pending_queue << command
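The error above shows `@pending_queue` being `nil` when `send_command` runs in the child. As an illustration only, the sketch below puts all per-process state behind a single reset method that the child calls after forking. `SketchClient`, `reset_state!`, and `drain` are hypothetical names and do not reflect the actual client internals.

```ruby
# Hypothetical client holding per-process state; not the nats-pure implementation.
class SketchClient
  def initialize
    reset_state!
  end

  # Rebuild everything that did not survive the fork or must not be shared.
  def reset_state!
    @pending_queue = Queue.new
    @sent = []
    @flusher = Thread.new do
      # Queue#pop returns nil once the queue is closed and empty.
      while (command = @pending_queue.pop)
        @sent << command
      end
    end
  end

  def send_command(command)
    @pending_queue << command
  end

  # Stop the flusher thread and return the commands it processed.
  def drain
    @pending_queue.close
    @flusher.join
    @sent
  end
end

client = SketchClient.new

reader, writer = IO.pipe
pid = fork do
  reader.close
  client.reset_state!             # fresh queue and flusher thread in the child
  client.send_command("PING\r\n")
  writer.puts(client.drain.size)
  writer.close
end
writer.close
Process.wait(pid)
child_sent = reader.read.strip
reader.close

parent_sent = client.drain
```

Without the `reset_state!` call in the child, the inherited flusher thread is dead and the queued command is never processed.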

Envek commented 1 year ago

@wallyqs, I finally managed to fix all the flaky specs. I also found that Ractors don't work after fork (the forked Ruby process crashes when you try to interact with Ractors), so I had to disable fork handling for non-main Ractors.

nats-io / nats-pure.rb

Fork detection and automatic reconnect in child process #114