prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16.06k stars 5.38k forks source link

[Native] Support Graceful Shutdown in Prestissimo #23299

Closed ethanyzhang closed 2 days ago

ethanyzhang commented 3 months ago

In Prestissimo, we do not yet support the PUT method in /v1/info/state for transitioning the server to the SHUTING_DOWN state for graceful shutdown

https://prestodb.io/docs/current/release/release-0.128.html#graceful-shutdown

I do see the API exists: https://github.com/prestodb/presto/blob/72e771bb5cf1aa62ea2d708e3501ed42b13ab59d/presto-native-execution/presto_cpp/main/PrestoServer.cpp#L882-L890 but not hooked up with the REST API endpoint. /v1/info/state now only supports GET: https://github.com/prestodb/presto/blob/72e771bb5cf1aa62ea2d708e3501ed42b13ab59d/presto-native-execution/presto_cpp/main/PrestoServer.cpp#L349-L357

kewang1024 commented 3 months ago

Yes, unlike http end point in Java, the signal is received by here: https://github.com/prestodb/presto/blob/39d419e8a13b4ff6ec372e91e8cb82bbc36be926/presto-native-execution/presto_cpp/main/SignalHandler.cpp#L29

ethanyzhang commented 3 months ago

Shakyan Kushwaha from my team is working on this

amitkdutta commented 2 months ago

@ethanyzhang This is intentially not implemented in native worker, because native worker can handle Unix Process signals (e.g. SIGTERM) and trigger shutdown automtaically. This is a common pattern for any UNIX process to handle signal instead of explicit shutdown call (e.g. hitting CTRL + C to any application sends SIGTERM to the application, which is then handled by the application if signal handler is written. In java, such signal handling is likely not possible - hence the API was added to ask worker to shutdown.

ethanyzhang commented 2 months ago

@amitkdutta I believe the motivation here is for bringing the cluster auto-scaling we support in our Java offering to Prestissimo. We need to implement the same Java semantics like documented in https://prestodb.io/docs/current/release/release-0.128.html#graceful-shutdown when we need to scale back the cluster so we can call the API to notify a worker to stop like we can do in Java.

ethanyzhang commented 2 months ago

@aditi-pandit @majetideepak @tdcmeehan can you help follow up on this?

majetideepak commented 2 months ago

@amitkdutta It is not straightforward to send a unix process signal if the control plane/orchestration is on a remote node. Ahana's control plane uses REST API to manage the scaling. We want to use the same implementation to manage both Java and C++ clusters. Do you have any concerns or are there any downsides if we add this endpoint?

amitkdutta commented 2 months ago

@majetideepak Thanks for explaining it. I don't see any downside on adding this endpoint.

aditi-pandit commented 2 months ago

@amitkdutta : Just out of curiosity, does Meta not have a need for shutting down the worker externally ? It would seem strange to me if you are sending a unix process signal.