treasure-data / serverengine

A framework to implement robust multiprocess servers like Unicorn
Apache License 2.0

socket_manager: add feature to share sockets with another server #150

Closed: daipom closed this 2 weeks ago

daipom commented 3 weeks ago

With this feature, another process can take over the UDP/TCP sockets without downtime:

server = ServerEngine::SocketManager::Server.share_sockets_with_another_server(path)

This starts a new server that shares all UDP/TCP sockets with the existing server. After the new process starts, the old process should stop without removing the socket file at path.

This allows us to replace both the server and the workers with new processes without socket downtime. (The existing live restart feature does not support network servers: workers can be restarted without socket downtime, but the network server itself cannot.)
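For illustration only, here is a sketch of the general Unix technique that makes this kind of takeover possible: handing a listening socket's file descriptor to a new owner over a UNIX domain socket with Ruby's standard UNIXSocket#send_io and #recv_io. It does not use serverengine, and it is an assumption, not a description of this PR's implementation, that the feature relies on something similar. For brevity both ends live in one process; in a real takeover the old and new server processes would each hold one end.

```ruby
require "socket"

# The "old server" owns a listening TCP socket on an ephemeral port.
listener = TCPServer.new("127.0.0.1", 0)
port = listener.addr[1]

# A connected pair of UNIX sockets stands in for the channel between the
# old and the new process (the real feature connects via a socket file path).
old_side, new_side = UNIXSocket.pair

# Pass the listening socket's file descriptor (SCM_RIGHTS) to the new owner,
# which wraps the received descriptor in a TCPServer object.
old_side.send_io(listener)
taken_over = new_side.recv_io(TCPServer)

# The old owner closes its copy. The kernel socket stays open because the
# new owner still holds a descriptor for it, so the port never stops listening.
listener.close

# A client connects after the old owner has let go; the new owner accepts it.
client = TCPSocket.new("127.0.0.1", port)
client.write("ping")
client.close_write

conn = taken_over.accept
received = conn.read
puts "new owner received: #{received}"

[conn, client, taken_over, old_side, new_side].each(&:close)
```

Because the descriptor refers to the same kernel socket, pending and new connections are never refused during the handover; only the process that calls accept changes.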

ref: https://github.com/fluent/fluentd/issues/4622

Limitation

TODO

daipom commented 3 weeks ago

We will need to consider exclusive locks.
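The comment does not say which operation needs the lock. Assuming the concern is that two processes must not try to take over the same server at the same time, a minimal guard could look like the sketch below. It uses Ruby's File#flock; the lock file and the helper with_exclusive_lock are hypothetical and not part of serverengine or this PR.

```ruby
require "tmpdir"

# Hypothetical helper (not part of serverengine): run the block only while
# holding an exclusive lock on lock_path. Returns false without running the
# block if another open file description already holds the lock.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    # LOCK_NB: fail immediately instead of waiting for the current holder.
    return false unless f.flock(File::LOCK_EX | File::LOCK_NB)

    yield
    true
  end # the block form of File.open closes the file, releasing the lock
end

results = Dir.mktmpdir do |dir|
  lock_path = File.join(dir, "takeover.lock")
  second = nil
  first = with_exclusive_lock(lock_path) do
    # A second takeover attempt while the first still holds the lock is refused.
    second = with_exclusive_lock(lock_path) { :never_runs }
  end
  # Once the first holder has finished, the lock can be acquired again.
  third = with_exclusive_lock(lock_path) { :runs }
  { first: first, second: second, third: third }
end
p results
```

A non-blocking lock is used so that a second takeover attempt fails fast instead of silently waiting; whether failing or waiting is the right behavior would be a design decision for the actual feature.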

daipom commented 2 weeks ago

https://github.com/treasure-data/serverengine/compare/772d7dfa672038fb0745eee7eb608799ef81b9a1..13b3d398a68d5924e9d5365a302b16c2804a1124

For some reason, the forked process didn't receive SIGTERM, so I stopped using fork in the tests.

kenhys commented 2 weeks ago

It may be better to state the scope of this PR explicitly (live restart of the supervisor is out of scope; this PR focuses on server <=> worker): https://github.com/treasure-data/serverengine?tab=readme-ov-file#live-restart

daipom commented 2 weeks ago

Thanks for your review! This PR allows us to restart network servers without socket downtime. I fixed the description of this PR and the commit message.

daipom commented 2 weeks ago

I'll rebase this.

daipom commented 2 weeks ago

Thanks for your review!