jgslima closed this issue 1 year ago.
Probably the implementation might make `ExecutorConfigurationSupport` implement `SmartLifecycle`.

In my opinion, the implementation of stop does not have to be asynchronous. So, only the `stop()` method would be implemented (leaving `stop(Runnable callback)` with its default, synchronous implementation).

In the `stop()` method, the `ExecutorService` would be asked to shut down (but without calling `awaitTerminationIfNecessary()` here). This would make all thread pools start their shutdown. And, in web applications that use graceful shutdown, this would happen in parallel with the graceful shutdown blocking as well.
Afterwards, `destroy()` would see that the `ExecutorService` is already shut down and would call `executor.awaitTermination()`. In this way, the calls to `executor.awaitTermination()` would not happen in parallel, but that does not really matter, since all thread pools will be finishing their tasks in parallel. The total time taken by all the calls to `destroy()` would be, roughly, the greatest (worst) time taken by any one of the pools.
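The split proposed above can be sketched with plain JDK types. This is a hypothetical illustration, not Spring code: `stop()` only initiates shutdown (non-blocking), and `destroy()` later awaits termination, so multiple pools would wind down in parallel.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposal: a non-blocking stop() that only
// initiates shutdown, and a destroy() that awaits actual termination later.
public class EarlyShutdownSketch {
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    // Would back Lifecycle.stop(): initiate shutdown, do not block.
    public void stop() {
        executor.shutdown();
    }

    // Would back DisposableBean.destroy(): the pool is already winding
    // down, so this call only awaits the tasks still in flight.
    public void destroy() throws InterruptedException {
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        EarlyShutdownSketch bean = new EarlyShutdownSketch();
        bean.executor.submit(() -> { });
        bean.stop();    // non-blocking: pool starts shutting down
        bean.destroy(); // blocking: await actual termination
        System.out.println(bean.executor.isTerminated());
    }
}
```

With several such beans, each `stop()` returns immediately, so all pools shut down concurrently and the total wait is bounded by the slowest pool rather than the sum.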
Hello people. Any thoughts on this? Thanks.
@rstoyanchev, I do not want to add to the vast amount of issues in the Spring project. As it seems, the community creates issues faster than your capacity to analyze them.
Feel free to just close the issue.
Spring team, I am considering closing this issue, unless you want to keep it open. Did you have a chance to analyze this?
Here, the lengthy application shutdown remains.
Thanks.
If I work on the code and submit a PR for this, would it be considered?
In the context of Project CRaC support, we are revisiting our own lifecycle adapters, and we ran into the executor/scheduler implementations as well. It is somewhat more complicated with stop/restart having to be possible, so a shutdown on `stop()` would have to be followed by a recreation of the thread pool on `start()`, or we'd have to use a different trigger to stop the currently running tasks while keeping the thread pool alive.
We'll try to sort this out for our 6.1.
The challenge with the `Lifecycle` contract is that it serves several purposes, and even one more now with Checkpoint Restore. `ThreadPoolTaskExecutor` and in particular `ThreadPoolTaskScheduler` are not really ideal candidates for implementing `Lifecycle` since the underlying `ThreadPoolExecutor` has no stop or restart functionality, just shutdown. So whenever a `Lifecycle.stop()` call comes in but is later followed by `Lifecycle.start()`, we could only really reinitialize the `ThreadPoolExecutor` completely, losing all of its previously scheduled tasks... which we'd have to re-schedule somehow.
Up until this point, we treated `ThreadPoolTaskExecutor` and `ThreadPoolTaskScheduler` as general delegates which do not participate in lifecycle management on their own. Instead, whoever submits tasks to the executor/scheduler is responsible for pausing/cancelling/rescheduling them whenever necessary. For graceful shutdown purposes, such callers could implement `Lifecycle` themselves and stop their tasks early, so that the target executor does not have any actual tasks left whenever it eventually gets disposed. This is the approach that we applied ourselves in `DefaultMessageListenerContainer` and co.
For Project CRaC, it is not actually necessary to stop/restart thread pools since the active threads are a part of the checkpoint image anyway. So we currently only stop `DefaultMessageListenerContainer` and co, winding down their async workers but not the target thread pool that they are using. That said, it would not hurt to stop and restart thread pools themselves as well, which is what we're currently investigating. If such a thread pool restart turns out to be impossible, a regular `Lifecycle` implementation would not be appropriate since it would receive stop calls in restart scenarios as well, not just for graceful shutdown.
@jgslima what kind of tasks are typically left to await termination for in your scenario, and how did those tasks get submitted? Is this about late-submitted one-off tasks during the shutdown process, or rather about periodic tasks kicking in once more?
In terms of an early shutdown signal for executors and schedulers, it does not have to come from `Lifecycle`; it could also come from `ContextClosedEvent`, which arrives even before the lifecycle stop calls (and does not have any restart semantics attached to it). This would effectively run in parallel with `SmartLifecycle` stopping, awaiting actual termination in the destruction step then.
Experimenting with this a bit, we seem to be able to implement the general `Lifecycle` stop/start semantics with a pause/resume approach in `(Scheduled)ThreadPoolExecutor` subclasses (along the lines of the `PausableThreadPoolExecutor` suggested in the `ThreadPoolExecutor` javadoc), which would address our Checkpoint Restore requirements as well.
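For reference, the pause/resume idea mentioned above can be sketched following the `PausableThreadPoolExecutor` example in the `ThreadPoolExecutor` javadoc. The class and method names here are illustrative, not the actual Spring 6.1 implementation: `beforeExecute()` blocks while paused, so the pool keeps its threads and queued tasks alive across a stop/start cycle.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class PauseResumeDemo {

    // Hypothetical pausable pool, along the lines of the ThreadPoolExecutor
    // javadoc example: workers block in beforeExecute() while paused, so no
    // scheduled work is lost on a stop()/start() cycle.
    static class PausableExecutor extends ThreadPoolExecutor {
        private boolean paused;
        private final ReentrantLock pauseLock = new ReentrantLock();
        private final Condition unpaused = pauseLock.newCondition();

        PausableExecutor(int poolSize) {
            super(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        }

        @Override
        protected void beforeExecute(Thread t, Runnable r) {
            super.beforeExecute(t, r);
            pauseLock.lock();
            try {
                while (paused) {
                    unpaused.await();
                }
            } catch (InterruptedException ie) {
                t.interrupt();
            } finally {
                pauseLock.unlock();
            }
        }

        void pause() { // would back Lifecycle.stop()
            pauseLock.lock();
            try { paused = true; } finally { pauseLock.unlock(); }
        }

        void resume() { // would back Lifecycle.start()
            pauseLock.lock();
            try { paused = false; unpaused.signalAll(); } finally { pauseLock.unlock(); }
        }
    }

    public static void main(String[] args) throws Exception {
        PausableExecutor pool = new PausableExecutor(1);
        pool.pause();
        CountDownLatch done = new CountDownLatch(1);
        pool.execute(done::countDown);
        // While paused, the worker blocks in beforeExecute() and the task waits.
        boolean ranWhilePaused = done.await(200, TimeUnit.MILLISECONDS);
        pool.resume();
        boolean ranAfterResume = done.await(5, TimeUnit.SECONDS);
        pool.shutdown();
        System.out.println(ranWhilePaused + " " + ranAfterResume);
    }
}
```

Unlike a shutdown-based `stop()`, this keeps the pool and its queue intact, which is why it suits restart scenarios such as Checkpoint Restore.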
This combines nicely with a `ContextClosedEvent`-driven early shutdown signal, addressing graceful shutdown scenarios then (which are otherwise not covered by such a pause/resume approach for stop/start). For `ThreadPoolTaskExecutor` and `ThreadPoolTaskScheduler`, we should be able to provide an out-of-the-box solution for all those scenarios in 6.1.
Hello.
> It is somewhat more complicated with stop/restart having to be possible

Good point. I had not thought about it.
> it could also come from `ContextClosedEvent` which arrives even before the lifecycle stop calls.

OK, that seems fine. It also solves another point: if we really implemented `SmartLifecycle` attached to the same phase as the web servers, I do not know how we could guarantee the same phase without including a reference to the phase of `WebServerGracefulShutdownLifecycle` (currently `SmartLifecycle.DEFAULT_PHASE - 1024`), which is a Spring Boot class.
> Instead, whoever submits tasks to the executor/scheduler is responsible for pausing/cancelling/rescheduling them whenever necessary.
I understand.
> @jgslima what kind of tasks are typically left to await termination for in your scenario, and how did those tasks get submitted? Is this about late-submitted one-off tasks during the shutdown process, or rather about periodic tasks kicking in once more?
It is not about late-submitted tasks.
Some examples of thread pools used in some applications I work with:

- `KafkaTemplate.send()` callbacks (such callbacks cannot be done in the main Kafka producer thread, to avoid blocking it).
- `@EventListener` (in cases where the event publishing was configured to be always async).
- `@Scheduled`.
- `@Async`.

My little "obsession" with having the most efficient graceful shutdown possible is to avoid leaving database locks behind when some task is aborted abruptly. These create some headaches (MySQL by default only kills inactive sessions after 8 hours).
It is true that really long-running tasks (like a full batch process) should not be run with any of the means above. For instance, in Kubernetes, a long-running task should be submitted as a Job, which is not asked to stop before its conclusion.

However, in some of the cases above, the task may involve doing something potentially slow, like making a remote call. When this is the case, ideally we would like to configure `awaitTerminationMillis` with some actual seconds. But when you have so many executors being stopped sequentially, and only after the web server, it does not really help to configure many seconds for each (say, 6 or 8), because the sum would be greater than the default grace period of Kubernetes.
I would love to help with this change; if you want, please tell me.
Thanks
@jhoeller, since you will change `ThreadPoolTaskExecutor` (or some superclass), could you please make another change? Make `ThreadPoolTaskExecutor` expose the size of the internal `ThreadPoolExecutor`'s queue. Something like `int getTaskQueueSize()` that returns `threadPoolExecutor.getQueue().size()`.
We miss this in order to be able to provide a `Gauge` metric to monitor the queue size and identify undersized pools.
I think it is better to expose just the size and not the queue as a whole, to avoid direct interactions with it.
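The value the requested accessor would return can be shown with plain JDK types. This is a sketch, assuming direct access to the underlying `ThreadPoolExecutor`; the suggested `getTaskQueueSize()` method is hypothetical here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueSizeDemo {
    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch block = new CountDownLatch(1);
        CountDownLatch started = new CountDownLatch(1);
        // Occupy the single worker so that further submissions queue up.
        pool.execute(() -> {
            started.countDown();
            try { block.await(); } catch (InterruptedException ignored) { }
        });
        started.await();
        pool.execute(() -> { });
        pool.execute(() -> { });
        // What a getTaskQueueSize() accessor would return: the task backlog,
        // which a metrics gauge could sample to detect undersized pools.
        System.out.println(pool.getQueue().size());
        block.countDown();
        pool.shutdown();
    }
}
```

A monitoring library would then sample this value periodically; a growing backlog indicates the pool cannot keep up with the submission rate.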
Thanks
@jgslima the queue size has recently been exposed as `getQueueSize()` already: #28583
For #24497, I'm introducing an `initiateShutdown()` method with always-non-blocking semantics, which is what we trigger on `ContextClosedEvent` now but which can also be called publicly.
@jgslima it would be great if you could give the upcoming `6.1.0-M2` release a try! It tries to address a wide range of requirements now and hopefully improves your graceful shutdown scenarios out of the box.
You bet, I will give it a try and get back to you.
@jhoeller, sorry for taking so long.
I tested it, and at first I did not really get it; then I saw the new code and understood.
I have 2 comments:

- The executor stops accepting new tasks as soon as the `ContextClosedEvent` is published, which seems good. We may have a side effect: `RejectedExecutionException`s being thrown to the components that keep submitting tasks to the executors, if these components themselves do not also react to the `ContextClosedEvent`. Ideally these task submitters should also react to the event. Right now I was not able to recap and experiment with some Spring components like the containers (JMS, Kafka, Rabbit, SQS), or the mechanisms used by `@Async` and `@Scheduled`. I am not sure if the change will cause new `ERROR` logs that may confuse people. In fact, I might have thought about this when I opened the issue in the first place, but I only realized it now. What do you think? I may dedicate more time here to analyze and/or test this.
- In the new `initiateShutdown()` method, it is clear that you wanted the scope of the method to be just stopping the acceptance of new tasks. But, if `waitForTasksToCompleteOnShutdown` is `false` (and it should be `false` in the majority of scenarios where we want a graceful shutdown, since the task queue may hold a large number of submitted tasks), at this point I would prefer to call `executor.shutdownNow()`, to halt the processing of waiting tasks right away. After all, we already know at this point that the context is closing... so for a graceful shutdown, I would not want any pending task to start execution after this point. But I can see that `ExecutorConfigurationSupport` attempts to actually cancel the pending tasks returned by `shutdownNow()`. At first, I am not sure how to properly handle this in the event listener.
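The difference between the two shutdown flavors discussed above can be demonstrated with plain JDK types; this sketch is illustrative, not Spring's event listener code. `shutdown()` lets already-queued tasks still run, while `shutdownNow()` interrupts the workers and drains the queue, returning the tasks that never started.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ShutdownNowDemo {
    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch block = new CountDownLatch(1);
        CountDownLatch started = new CountDownLatch(1);
        // Occupy the single worker so two more tasks pile up in the queue.
        pool.execute(() -> {
            started.countDown();
            try { block.await(); } catch (InterruptedException ignored) { }
        });
        started.await();
        pool.execute(() -> { });
        pool.execute(() -> { });
        // shutdown() would still execute both queued tasks; shutdownNow()
        // interrupts the running worker and returns the tasks that were
        // awaiting execution and will never start.
        List<Runnable> neverStarted = pool.shutdownNow();
        System.out.println(neverStarted.size());
    }
}
```

This is why, with `waitForTasksToCompleteOnShutdown=false` and a large backlog, `shutdownNow()` is the call that actually prevents pending tasks from starting after the context begins closing.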
Please tell me if I can help somehow.
My mistake. The containers, like you said, use executors which are not context beans. But this is not the case for `@Scheduled` and `@Async`. I will experiment further.
Applications that need to perform graceful shutdown of tasks submitted to `ThreadPoolTaskExecutor` and/or `ThreadPoolTaskScheduler` may make use of the setting `awaitTerminationMillis` (and possibly `waitForTasksToCompleteOnShutdown`).

However, the application may take a very long time to actually finish if these conditions apply:

- use of both `ThreadPoolTaskExecutor` and `ThreadPoolTaskScheduler` (and possibly multiple `ThreadPoolTaskExecutor`s).
- `SmartLifecycle` beans which implement lengthy stopping.

Real examples are web applications that use such components and make use of the web container "graceful shutdown" feature of Spring Boot. The overall termination sequence of such an application is:

1. `SmartLifecycle` asynchronous stopping triggers the web container graceful shutdown.
2. `SmartLifecycle` asynchronous stopping blocks and waits for the web container shutdown.
3. `DisposableBean`/`@PreDestroy` methods run, so, say:
   - `ThreadPoolTaskExecutor`'s `destroy()` is called, blocking and awaiting the tasks to finish. If the application uses multiple `ThreadPoolTaskExecutor`s, an awaiting occurs for each one of them.
   - `ThreadPoolTaskScheduler`'s `destroy()` is called, blocking and awaiting the tasks to finish.

The proposal here is to create some way to make all the pools finish their tasks in parallel. Ideally, in parallel with the stopping of other `SmartLifecycle` beans.

In Kubernetes Spring web applications that fit the scenario above, the total time taken by a Pod to finish ends up being too high (which is aggravated by the need to configure a preStop hook to give kube-proxy time to notice the Pod deletion). This has real effects: since, rightly, in a rollout Kubernetes does not wait for old Pods to actually finish before creating new ones, applications with a large number of Pods and a long termination time end up having a large number of Pods actually running during the rollout (new Pods plus many still being terminated). We have seen this trigger a cluster autoscale so that the cluster could handle the large number of Pods present during the rollout.