After fork hook should (possibly) run in parent process too

toregeschliman commented 1 year ago

Today in the after_fork hook, we are checking whether we're in the child process and if so, we drain all of the pools and (if it's implemented) call .close on the connections. However, the parent process doesn't drain the pools, so if you are using something like puma's fork_worker mode (or implementing your own custom forking, or otherwise fork from a process that is expected to still do work afterwards) you end up with a parent process that has connections in the pool that may have been broken in the child process, depending on how the connection class in question implements .close (e.g. Net::Http::Persistent, on forking, will close every connection in the child process, leaving the parent open to a bunch of errors whenever it tries to use one again). I think there are three alternatives that are less likely to surface strange bugs than the current implementation:

1) Don't call .close on the connections, just drain the pools in the child process: I think the only risk here is if someone has kept a persistent reference to a connection outside the pool, which is already against the design of this library and can hardly be blamed on it if something goes wrong. Additionally, many connection libraries implement their own fork-safety already anyway so this call will become increasingly redundant. 2) Drain the pools in the parent process as well: This leaves the parent process in a safe state to do work but adds the overhead of re-creating all the connections it held before. The only change from current state is fewer surprise errors. I think this route should also include calling .close in the parent rather than the child, since they were the parent's connections to begin with, but that's just semantics as long as they both drain their pools. 3) Change the default of auto_reload_after_fork to false: This gem is used by a bunch of other gems that don't necessarily implement this new option to pass through, and it seems (to me at least) less disruptive for people to add their own fork hook than to monkey-patch the one implemented here as a workaround. This is likely the least-safe option since end users would need to be aware of the problem in the first place and could determine whether their pooled connections were fork-safe or not, but it has the advantage of not changing the expected behavior at all for anyone who has actively engaged with it already (and since it's relatively new, passive engagement of seeing the default and saying "cool, do that" is likely quite minimal).

Happy to put up a PR for any of the three options (or others people can think of), but looking for input on whether anyone's had similar issues and on which solution is preferable.

toregeschliman commented 1 year ago

Went ahead and put up a PR for option 1, since it seems like the most correct choice.

mperham commented 1 year ago

cc @byroot

byroot commented 1 year ago

Yeah, depending on the type of connections you may need to close them before fork and not after.

Some protocol send a packet on close to notify the other side, these need to be closed before fork in the parent.

In some other case, if the socket wasn't flushed, closing it in the child may send a RST packet, which will impact the parent connection.

toregeschliman commented 1 year ago

Client libraries don't all take the same approach here; many like Dalli have their own fork protection (which shares this problem as it calls close when it detects a fork). Others like ActiveRecord's PG adapter (strictly an example of a client with good post-fork behavior, as it doesn't use this ConnectionPool) do something like @connection.socket_io.reopen(IO::NULL) that leaves the connection open in the parent while detaching the child. IMO it should be up to the client to respond to forks as appropriate; if we want to guard the pool against strange issues in individual implementations (or just missing fork safety), we can drain the pool in the parent process before forking as well.

byroot commented 1 year ago

IMO it should be up to the client to respond to forks as appropriate

Ideally yes. The reason I contributed this feature to connection_pool was to have a perhaps not perfect, but baseline handling of this.

I understand that closing in the child may be imperfect (e.g. impacting parent), but in the vast majority of cases it at least avoid the catastrophic bug of continuing to share the connection. So to me it's a net positive.

we can drain the pool in the parent process before forking as well.

So @mperham recently gave me commit bit on this repo, I'll have to check with him what he'd accept or not.

But in the meantime my opinion is that this would make sense as an opt-in option:

The most common case of forking is pre-fork, and generally the parent doesn't use connections, so impacting them is not a problem.
Only some protocols are impacted by this.
In case where the parent do use the connection, closing every time mean it needs to reconnect which isn't ideal either.

So if Mike is OK with it, I'd make this a configuration on the Pool instance.

mperham commented 1 year ago

I've never seen a scenario where the parent and child both use the same pool. preforking usually has a parent process boot a canonical child and then fork that child over and over but the parent process just manages the children, it's not doing much other work. Maybe this is a case where you need to fork connection_pool or rethink how pools are managed in your app.

Keep in mind, I hate hate hate config knobs as they multiply code complexity. I'm against adding another config knob because one person reported an issue. I'd rather solve 90% of the problem with 10% of the code and leave it there.

byroot commented 1 year ago

I've never seen a scenario where the parent and child both use the same pool.

I've seen some cases where people fork from a worker to do some background work.

As mentioned before, I think after_fork is a no-brainer as best case scenario it works fine, worst case scenario it turns of potential for data corruption etc, into a straighter error.

In rare case I agree you need before fork, but it's the exception, so I we don't want to add a config, then I think it's fine to just document that, and advise to close the pool manually.

mperham / connection_pool

After fork hook should (possibly) run in parent process too #180