prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Coordinator driven graceful shutdown of worker #12090

Open dzhi-lyft opened 5 years ago

dzhi-lyft commented 5 years ago

To gracefully shut down a Presto worker before letting EC2 terminate the instance, the current approach is to PUT the SHUTTING_DOWN state to the worker, like below:

curl -v -XPUT --data '"SHUTTING_DOWN"' -H "Content-type: application/json" http://250.0.46.167:8081/v1/info/state

The worker process responds to the request immediately and, if there are no pending queries, waits out two grace periods of 2 minutes each. The coordinator soon reflects the node state as "shutting_down". 4 minutes later, the worker process exits as expected.
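For agents that are not shell scripts, the same PUT can be issued from Python's standard library. This is only a sketch of the request shown above: the function names `build_shutdown_request` and `request_graceful_shutdown` and the timeout value are assumptions made for illustration, while the path, body, and header come from the curl command.

```python
import json
import urllib.request


def build_shutdown_request(worker_base_uri: str) -> urllib.request.Request:
    """Build the PUT /v1/info/state request that asks a worker to shut down gracefully."""
    # The body is a JSON string literal, i.e. the bytes "SHUTTING_DOWN" including the quotes.
    body = json.dumps("SHUTTING_DOWN").encode("utf-8")
    return urllib.request.Request(
        url=worker_base_uri.rstrip("/") + "/v1/info/state",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def request_graceful_shutdown(worker_base_uri: str, timeout_seconds: float = 10.0) -> int:
    """Send the request and return the HTTP status code (timeout is an assumed default)."""
    request = build_shutdown_request(worker_base_uri)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status
```

Calling `request_graceful_shutdown("http://250.0.46.167:8081")` would target the same worker as the curl example; building the request separately makes it possible to inspect it without touching the network.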

So far so good. However, as with any managed daemon, the worker process is immediately restarted, and soon the node is back to "active" in the coordinator. This behavior is undesirable for two reasons: first, the restart undoes the shutdown; second, the auto-scale agent needs to talk to the workers directly.

Presto experts, how do I work around the current issue?

I would propose the following coordinator-driven approach: the shutdown request is sent to the coordinator (instead of the worker). The coordinator tells the worker to shut down and later logically removes it from the active node list. If the worker restarts and registers again, the coordinator ignores it for the next hour. What's your opinion?
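To make the "ignore it for the next hour" part of the proposal concrete, here is a minimal sketch of the bookkeeping the coordinator would need. It is not Presto code: the class `ShutdownQuarantine`, its method names, and the injectable clock are all hypothetical, and only the one-hour window comes from the proposal.

```python
import time
from typing import Callable, Dict


class ShutdownQuarantine:
    """Tracks workers that were asked to shut down and hides them for a fixed window."""

    def __init__(
        self,
        quarantine_seconds: float = 3600.0,  # "the next hour" from the proposal
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._quarantine_seconds = quarantine_seconds
        self._clock = clock
        self._quarantined_until: Dict[str, float] = {}

    def mark_shutdown_requested(self, node_id: str) -> None:
        """Record that the coordinator told this worker to shut down."""
        self._quarantined_until[node_id] = self._clock() + self._quarantine_seconds

    def is_schedulable(self, node_id: str) -> bool:
        """Return False while a re-registered worker is still inside its quarantine window."""
        deadline = self._quarantined_until.get(node_id)
        if deadline is None:
            return True
        if self._clock() >= deadline:
            # Window expired: forget the node so it is treated like any new worker.
            del self._quarantined_until[node_id]
            return True
        return False
```

With this shape, a worker that a supervisor restarts right after shutdown would announce itself again, but the coordinator would keep it out of the active node list until the window expires.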

nezihyigitbasi commented 5 years ago

So far so good, however as the case in any managed daemon. The worker process is immediately restarted, and soon the node is back to "active" in coordinator.

Are you saying that the supervisor process (e.g., daemontools) is restarting the worker you just shut down? If yes, can you solve this problem at the supervisor process level instead of modifying the shutdown protocol? Before you shut down a worker, can you first tell the supervisor not to restart that particular process, and then send the PUT request to that worker to bring it down?

dzhi-lyft commented 5 years ago

That would further require the auto-scale agent not only to talk to the worker but also to run part of its logic on the worker instance, so that it can modify the service script or configuration to NOT restart the Presto service daemon. This is part of the reason I propose the coordinator-driven approach: the external auto-scale agent then only needs to talk to the coordinator.

As useful context, we run the Presto cluster inside an AWS Auto Scaling group (ASG). ASG supports lifecycle hooks, where instance TERMINATING notifications are sent to SQS; the auto-scale agent polls SQS and executes the graceful shutdown before notifying ASG to continue the termination.
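As a rough illustration of the agent's first step, the sketch below parses one SQS message body and extracts what is needed to act on a terminating instance. The JSON field names reflect the Auto Scaling lifecycle-hook notification format as I understand it and should be checked against the AWS documentation; the `TerminationNotice` type and `parse_termination_notice` function are hypothetical names. It also assumes the hook delivers to SQS directly, not through an SNS envelope.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Transition value assumed to mark an instance that is about to be terminated.
TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


@dataclass(frozen=True)
class TerminationNotice:
    instance_id: str
    hook_name: str
    group_name: str
    action_token: str


def parse_termination_notice(message_body: str) -> Optional[TerminationNotice]:
    """Return a notice for TERMINATING messages, or None for anything else."""
    try:
        payload = json.loads(message_body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or payload.get("LifecycleTransition") != TERMINATING:
        return None  # e.g. test notifications or launch events
    try:
        return TerminationNotice(
            instance_id=payload["EC2InstanceId"],
            hook_name=payload["LifecycleHookName"],
            group_name=payload["AutoScalingGroupName"],
            action_token=payload["LifecycleActionToken"],
        )
    except KeyError:
        return None  # malformed notification: leave it for manual inspection
```

After a notice is parsed, the agent would run the graceful shutdown for that instance and only then tell ASG to continue, for example through the Auto Scaling `CompleteLifecycleAction` API using the hook name, group name, and token carried in the notice.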

ggreg commented 5 years ago

How does the coordinator discover the workers in that cluster?

ggreg commented 5 years ago

I'm wondering if there is a way to leverage an external discovery service to define the state of the Presto worker and use it both in the local supervisor process and the auto-scale-agent.

dzhi-lyft commented 5 years ago

Each worker uses the discovery.uri config property to report itself to the coordinator. DiscoveryNodeManager in the coordinator discovers all new workers. With the coordinator-driven graceful shutdown proposal, DiscoveryNodeManager will treat a restarted worker (after SHUTTING_DOWN) as INACTIVE.

taklwu commented 5 years ago

Any update on the PR?

dzhi-lyft commented 4 years ago

Here is the patch I created earlier, based on version 0.210 (I added a .txt suffix so that I could attach it here).

0001-Coordinator-driven-graceful-shutdown-of-worker-nodes.patch.txt