helidon-io / helidon

Java libraries for writing microservices
https://helidon.io
Apache License 2.0

Nima WebServer.stop() is slow / not timely when there are no active requests #5717

Closed rbygrave closed 1 year ago

rbygrave commented 1 year ago

Environment Details


Problem Description

1) WebServer#stop() when there have been NO requests takes ~500ms (the value of ServerListener.EXECUTOR_SHUTDOWN_MILLIS).
2) WebServer#stop() on an idle server that has no active requests takes ~500ms.

Background wrt the reactive Netty server

https://github.com/helidon-io/helidon/issues/2087

Steps to reproduce

PR - https://github.com/helidon-io/helidon/pull/5748 with failing tests for 1) and 2).

Edit: This PR now also includes 2 ways of addressing the issue: via a modified JDK ThreadPerTaskExecutor, and via an application code approach (which I don't think is very nice).

Edit: Old PR was https://github.com/helidon-io/helidon/pull/5718

Motivation

As I see it, fast graceful shutdown is critical to the performance of Kubernetes rollouts. A K8s rollout is generally only as fast as startup time + shutdown time, so if server shutdown is slow then rollout is slow. This is even more impactful when a K8s service has a lot of pods/replicas, which makes fast graceful shutdown very important.

rbygrave commented 1 year ago

Note that the WebServerStopOnlyTest will pass when executor.shutdown(); is added to ServerListener.shutdownExecutor() prior to the call to executor.awaitTermination(...). I expect this to also fix the idle case when client requests are made as HTTP 1.0 or without keep-alive.
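For reference, a minimal sketch of that change (the method shape here is assumed from the description above, not copied from the Helidon source):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.TimeUnit;

    class ShutdownSketch {
        static final long EXECUTOR_SHUTDOWN_MILLIS = 500L;

        static void shutdownExecutor(ExecutorService executor) {
            try {
                // Initiate an orderly shutdown first; an executor with no active
                // tasks can then terminate immediately instead of waiting out the
                // full EXECUTOR_SHUTDOWN_MILLIS timeout in awaitTermination().
                executor.shutdown();
                if (!executor.awaitTermination(EXECUTOR_SHUTDOWN_MILLIS, TimeUnit.MILLISECONDS)) {
                    executor.shutdownNow(); // give up on stragglers
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }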

rbygrave commented 1 year ago

For the WebServerStopIdleTest, as I see it the readerExecutor has a ConnectionHandler, and for this test its connection is an Http1Connection.

At the time of webServer.stop(), the thread running the Http1Connection is in state WAITING inside readPrologue() - it's waiting for the next request to arrive. See: https://github.com/helidon-io/helidon/blob/main/nima/webserver/webserver/src/main/java/io/helidon/nima/webserver/http1/Http1Connection.java#L115

A sort-of-hack-that-fixes-this is to add a stopIfIdle() method to Http1Connection like below and invoke it before the ServerListener does shutdownExecutor(readerExecutor); at: https://github.com/helidon-io/helidon/blob/main/nima/webserver/webserver/src/main/java/io/helidon/nima/webserver/ServerListener.java#L134

    // the thread currently running this connection's handle() loop
    Thread thread;
    // true only while the thread is parked in readPrologue() waiting for the next request
    volatile boolean currentlyReadingPrologue;

    public void stopIfIdle() {
        if (currentlyReadingPrologue && thread.getState() == Thread.State.WAITING) {
            System.out.println("Http1Connection.stop() - interrupt thread [" + thread + "] while its WAITING readingPrologue " + Instant.now());
            thread.interrupt();
        }
    }
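For the calling side, a hedged sketch of how ServerListener might track connections and invoke stopIfIdle() before shutting down the reader executor. The activeConnections map here is hypothetical (it is the "double accounting" ConcurrentHashMap<ConnectionHandler, Boolean> mentioned in a later comment), and stopConnections() is a made-up method name:

    // Hypothetical bookkeeping in ServerListener (not the actual Helidon field):
    // handlers register themselves on start and deregister when they complete.
    private final Map<ConnectionHandler, Boolean> activeConnections = new ConcurrentHashMap<>();

    private void stopConnections() {
        // interrupt connections that are parked in readPrologue() ...
        for (ConnectionHandler handler : activeConnections.keySet()) {
            if (handler instanceof Http1Connection http1) {
                http1.stopIfIdle();
            }
        }
        // ... then proceed with the normal executor shutdown
        shutdownExecutor(readerExecutor);
    }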

With some extra System.out logging, the test passes with output like:

INFO: Helidon Níma 4.0.0-SNAPSHOT
Dec 19, 2022 5:20:46 PM io.helidon.nima.webserver.LoomServer startIt
INFO: Started all channels in 39 milliseconds. 685 milliseconds since JVM startup. Java 19.0.1+10-21
Http1Connection handle thread VirtualThread[#36,[0x6059d64c 0x05f5aadb] Nima socket]/runnable@ForkJoinPool-1-worker-1
Http1Connection readPrologue 2022-12-19T04:20:47.125184379Z
Http1Connection prologue .. 2022-12-19T04:20:47.128097314Z
Http1Connection readHeaders 2022-12-19T04:20:47.128298096Z
Http1Connection route complete 2022-12-19T04:20:47.149144534Z
Http1Connection readPrologue 2022-12-19T04:20:47.149248333Z
webServer.stop() requested
Http1Connection.stop() - interrupt thread [VirtualThread[#36,[0x6059d64c 0x05f5aadb] Nima socket]/waiting] while its WAITING readingPrologue 2022-12-19T04:20:47.166895566Z
Http1Connection.handle() FINISHED 2022-12-19T04:20:47.167596258Z
Dec 19, 2022 5:20:47 PM io.helidon.nima.webserver.ServerListener listen
INFO: [0x6059d64c] @default socket closed.
Dec 19, 2022 5:20:47 PM io.helidon.nima.webserver.LoomServer stopIt
webServer.stop() completed
INFO: Níma server stopped all channels.

and with this I can bump EXECUTOR_SHUTDOWN_MILLIS from 500L up to 10 seconds (10_000L).

rbygrave commented 1 year ago

I have pushed a solution for Http1Connection.stopIfIdle() so the tests pass. Not convinced it's an elegant solution though, as we now have to "double account" for Http1Connection in both the executorService (ThreadPerTaskExecutor) and additionally in an application-level ConcurrentHashMap<ConnectionHandler, Boolean>.

It would be more elegant if the JDK ThreadPerTaskExecutor exposed some API to get the active threads or Runnables that are in thread state WAITING. Alternatively, instead of Runnable, have an InterruptableRunnable that extends Runnable so we could do something like:

class ConnectionHandler implements InterruptableRunnable {

  ...
  private volatile boolean currentlyReadingPrologue;

  @Override
  public boolean isInterruptableRunnable() {
    return currentlyReadingPrologue; // true before http1prologue.readPrologue() and false after that
  }

  @Override
  public void run() {
    ...
    while (true) {
      // prologue (first line of request)
      currentlyReadingPrologue = true;
      HttpPrologue prologue = http1prologue.readPrologue();
      currentlyReadingPrologue = false;
      ...
    }
  }
}

... and have the internals of ThreadPerTaskExecutor.shutdown() iterate the alive threads in WAITING state whose tasks are instances of InterruptableRunnable, interrupt those for which isInterruptableRunnable() returns true, and then carry on with the rest of the shutdown logic.
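A rough fragment-level sketch of that shutdown step, assuming hypothetical access to the executor's live thread-to-task mapping (the real JDK ThreadPerTaskExecutor keeps its thread set private, which is exactly the limitation being discussed; activeTasks is an assumed name):

    // Hypothetical: inside a modified ThreadPerTaskExecutor.shutdown(),
    // where activeTasks maps each live thread to the Runnable it is running.
    for (Map.Entry<Thread, Runnable> entry : activeTasks.entrySet()) {
        Thread thread = entry.getKey();
        if (entry.getValue() instanceof InterruptableRunnable interruptable
                && interruptable.isInterruptableRunnable()
                && thread.getState() == Thread.State.WAITING) {
            thread.interrupt(); // the task has declared itself idle, safe to interrupt
        }
    }
    // ... then continue with the normal shutdown logic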

jensli commented 1 year ago

Hello @rbygrave,

I saw your message about this on loom-dev. I'll post my answer here too.

I don't think this has anything to do with Loom and virtual threads.

The way to interrupt sockets has always been to close them. Then blocking methods will throw an exception.

See this, for example: https://stackoverflow.com/questions/4425350/how-to-terminate-a-thread-blocking-on-socket-io-operation-instantly
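To illustrate that pattern in isolation (a standalone example, not Nima code):

    import java.io.IOException;
    import java.net.ServerSocket;

    public class CloseToUnblock {
        public static void main(String[] args) throws Exception {
            ServerSocket serverSocket = new ServerSocket(0);
            Thread acceptor = Thread.ofVirtual().start(() -> {
                try {
                    serverSocket.accept(); // blocks waiting for a client
                } catch (IOException e) {
                    // SocketException is thrown as soon as the socket is closed
                    System.out.println("accept() unblocked: " + e);
                }
            });
            Thread.sleep(100);    // give the acceptor time to block
            serverSocket.close(); // releases the blocked accept() promptly
            acceptor.join();
        }
    }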

Nima seems to do that for the server socket, but it FIRST waits for the ExecutorService to shut down:

https://github.com/helidon-io/helidon/blob/a45b04cbea62c149ab3d8efe9c63180bed573b00/nima/webserver/webserver/src/main/java/io/helidon/nima/webserver/ServerListener.java#L133-L137

I think Nima should close the socket FIRST, then shutdown the ExecutorService.

(I have not checked how Nima handles the client sockets. They should probably be treated in the same way.)

rbygrave commented 1 year ago

way to interrupt sockets has always been to close them

You'll see my answer in loom-dev, but yes, we want graceful shutdown allowing for in-flight requests. We can't just close the connections or interrupt the Http1Connection threads unless we know they are not in the middle of processing in-flight requests.

I think Nima should close the socket FIRST, then shutdown the ExecutorService.

I agree we can and should do the serverSocket.close() before shutting down the executor services, but note that this does not fix the issue. Doing the serverSocket.close() will stop any new connections being made, and I agree that should be done first.

The serverSocket.close() doesn't impact the existing connections though. We still need to somehow iterate through the Http1Connection instances that are not in-flight (that are IO-waiting on http1prologue.readPrologue()) and interrupt those threads, and we need to do this before the call to executor.awaitTermination(...).

jensli commented 1 year ago

Aha, I see.

One potential application-specific solution for Nima might be to maintain the state of the client connections in a field. When the time comes to stop the server, all idle client sockets can be closed immediately, and the currently active sockets can be made to complete their current request and then close.

But I understand that you are thinking about whether an alternative solution is possible, where the states of the (virtual) threads can be used to determine which connections are active and which are idle.

rbygrave commented 1 year ago

One potential application specific solution for Nima might be to ...

Yes, this is coded pretty much as you describe in the associated PR - https://github.com/helidon-io/helidon/pull/5718/files

maintain the state of the client connections in a field

The field is at: https://github.com/helidon-io/helidon/pull/5718/files#diff-c90e0012c4ad88754ff6db529fd9db71a10065dc2f102979e35f2e1983735970R315

When the time comes to stop the server all idle client sockets can be closed immediately

Which is done via https://github.com/helidon-io/helidon/pull/5718/files#diff-c90e0012c4ad88754ff6db529fd9db71a10065dc2f102979e35f2e1983735970R142 ... interrupting the idle Http1Connection threads means they close the connection and complete

thinking about whether an alternative solution is possible

Yes, exactly. As I see it, the application code is duplicating the features of ThreadPerTaskExecutor simply because ThreadPerTaskExecutor doesn't expose the active Runnables (or active Threads) via public API [though it does expose the active Threads for internal JDK use]. If ThreadPerTaskExecutor publicly exposed the active Runnables, that would provide a very elegant solution; and if it made the API exposing the active Threads public to application code, I think we could probably make that work OK too.

Another sounds-a-bit-crazy option is for Helidon to use its own version of ThreadPerTaskExecutor so that we don't double up on the holding/adding/removing of Http1Connection runnables, which is what the code does at the moment.

rbygrave commented 1 year ago

FYI I have a discussion at loom-dev that outlines this issue. I have raised it there because I'm thinking that ideally this issue is solved by changes to java.util.concurrent.ThreadPerTaskExecutor, and I think this is going to be a common issue/pattern: virtual threads waiting on IO but application-wise deemed idle (which is our Helidon Http1Connection case).

I've described those desired changes as I see them on loom-dev.

I have locally a modified version of ThreadPerTaskExecutor that solves this issue in, IMO, a much more elegant way than what we can do via application code, and I'm thinking it would be good to push that up so that people can compare the 2 approaches.

rbygrave commented 1 year ago

Pushed a commit to https://github.com/helidon-io/helidon/pull/5748/files that changes the code to use a modified/hacked ThreadPerTaskExecutor.

The main part of this change is at: https://github.com/helidon-io/helidon/pull/5748/files#diff-7e338ad5380cf95ade56f55831012583e5e99387c505ebc56260f530bd3371ccR97-R110

rbygrave commented 1 year ago

From loom-dev Ron:

I am not familiar with Helidon’s internals or with the details of the particular issue, but you can separate the “idle” and “busy” threads by spawning them in different ExecutorServices. For example, threads that wait and accept new requests are spawned in one ES and they, in turn, spawn a “request processing” thread in another. To shut down the server you will then forcefully shut down the first ES, stopping new requests from being accepted, and gracefully shut down the second, giving the requests some time to finish. — Ron

My interpretation of Ron's point here: for Helidon HTTP 1.1 keepalive=true connections, currently the same thread is being used for the http1prologue.readPrologue() "waiting for the next request for this connection" as well as for "doing the processing of the request".

If these were in fact 2 separate threads in 2 separate ExecutorServices, then we could forcefully shut down the threads that are waiting at http1prologue.readPrologue() while gracefully shutting down the threads that are processing in-flight requests (the work after http1prologue.readPrologue()).

Maybe something like:

Split the readerExecutor ExecutorService in ServerListener into 2 ExecutorServices, like:

ExecutorService readerExecutor; // waits on readPrologue() and then hands off work to workerExecutor
ExecutorService workerExecutor; // given the prologue, processes the request

Change the while(true) loop in Http1Connection.handle() into a chain of execution where a workerExecutor thread, after processing a request, submits the connection back to the readerExecutor to wait for the next readPrologue(). After a readerExecutor thread has read the prologue, it submits to the workerExecutor to process the request. A sketch follows.
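A hedged fragment-level sketch of that chain (readerExecutor/workerExecutor as above; readPrologue()/processRequest() are stand-ins for the real Http1Connection internals, and waitForNextRequest()/stop() are made-up names):

    // Sketch only: a reader side that can be shut down forcefully, and a
    // worker side that can be drained gracefully, because they are two
    // separate ExecutorServices.
    final ExecutorService readerExecutor = Executors.newVirtualThreadPerTaskExecutor();
    final ExecutorService workerExecutor = Executors.newVirtualThreadPerTaskExecutor();

    void waitForNextRequest(Http1Connection connection) {
        readerExecutor.submit(() -> {
            HttpPrologue prologue = connection.readPrologue(); // may park for a long time
            workerExecutor.submit(() -> {
                connection.processRequest(prologue); // the in-flight work
                waitForNextRequest(connection);      // chain back to the reader side
            });
        });
    }

    void stop() throws InterruptedException {
        readerExecutor.shutdownNow();  // interrupt threads parked in readPrologue()
        workerExecutor.shutdown();     // accept no new work, keep in-flight requests
        workerExecutor.awaitTermination(10, TimeUnit.SECONDS); // graceful drain
    }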

spericas commented 1 year ago

Pushed a commit to https://github.com/helidon-io/helidon/pull/5748/files that changes the code to use a modified/hacked ThreadPerTaskExecutor.

The main part of this change is at: https://github.com/helidon-io/helidon/pull/5748/files#diff-7e338ad5380cf95ade56f55831012583e5e99387c505ebc56260f530bd3371ccR97-R110

Thanks for the detailed description and analysis of the shutdown issue. I'm only just going over this issue and the proposed PR. A modified executor solution isn't ideal IMO, especially if it duplicates a lot of the JDK library code. Have you explored a solution where the ThreadFactory passed to the executor is wrapped/extended to help interrupt threads in that special state? Not sure if this works, just a thought.

rbygrave commented 1 year ago

Have you explored a solution where the ThreadFactory passed to the executor is wrapped/extended to help interrupt threads

I have not. See my thinking below ... TLDR: I don't think it helps.

A modified executor solution isn't ideal IMO, especially if it duplicates a lot of the JDK library code.

Agreed, not ideal. Still not crazy IF it helps to prompt some discussion and change in JDK library code to assist our case here ... and maybe not crazy IF it's the difference between beating Netty or not and IF we only need to maintain it for a year or so.

shutdown issue

FWIW, as I see it, shutdown is a one-off task, and in improving it we are looking at options that incur extra per-request cost. As such, Nima is likely especially sensitive to any extra per-request cost, as we often see millions of requests per one-off shutdown task.

Said another way, we want nice elegant solutions but Nima also likely has very high performance goals in play.

Approx costs of options looked at so far:

  1. Modified ThreadPerTaskExecutor. Adds 2 volatile boolean writes per request. Looks like cheapest option by far.
  2. Keep track of "Idle" Http1Connection via a new ConcurrentHashMap. Adds map.put() + map.remove() per request.
  3. Use a separate ExecutorService for http1prologue.readPrologue(). Effectively adds 2 times map.put() + map.remove() plus some extra overhead per request (the underlying data structure is a ConcurrentHashMap). We might also in practice hit extra costs here in the form of CPU cache misses, with the readPrologue() potentially running on a different core to the rest of the processing of the request.

So option 3 is what seems to be suggested by the likes of Ron, but at first glance it looks like the most expensive option for Nima to adopt, with some unknown extra costs that would ideally be measured but would take effort to assess properly.

We are doing all this because the JDK ThreadPerTaskExecutor only exposes its active underlying virtual threads internally to other JDK code (as I see it).

ThreadFactory passed to the executor is wrapped/extended

I don't see how a ThreadFactory helps, in that it is unaware of the removal of threads at task completion. At shutdown we ideally need access to the threads and associated tasks left in ThreadPerTaskExecutor. The ThreadPerTaskExecutor adds and removes Threads, and the ThreadFactory isn't aware of the removals.

Statistics of active requests

An alternative approach is to maintain AtomicLong counter(s) that hold the "number of active requests" (and on shutdown wait until active requests is zero before shutting down the ES), as sketched below. At this stage I don't think this is better than the other 3 options I've looked at.
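A minimal fragment-level sketch of that counting approach (hypothetical helper methods; route() stands in for the real request handling, none of this is from the PR):

    // Hypothetical request gauge: incremented when request processing starts,
    // decremented when it completes.
    private final AtomicLong activeRequests = new AtomicLong();

    void processRequest(HttpPrologue prologue) {
        activeRequests.incrementAndGet();
        try {
            route(prologue); // handle the in-flight request
        } finally {
            activeRequests.decrementAndGet();
        }
    }

    void awaitNoActiveRequests(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (activeRequests.get() > 0 && System.currentTimeMillis() < deadline) {
            Thread.sleep(10); // poll until in-flight requests drain or we time out
        }
    }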

spericas commented 1 year ago

Approx costs of options looked at so far:

  1. Modified ThreadPerTaskExecutor. Adds 2 volatile boolean writes per request. Looks like cheapest option by far.
  2. Keep track of "Idle" Http1Connection via a new ConcurrentHashMap. Adds map.put() + map.remove() per request.
  3. Use a separate ExecutorService for http1prologue.readPrologue(). Effectively adds 2 times map.put() + map.remove() plus some extra overhead per request (the underlying data structure is a ConcurrentHashMap). We might also in practice hit extra costs here in the form of CPU cache misses, with the readPrologue() potentially running on a different core to the rest of the processing of the request.

So option 3 is what seems to be suggested by the likes of Ron, but at first glance it looks like the most expensive option for Nima to adopt, with some unknown extra costs that would ideally be measured but would take effort to assess properly.

Option 3 does seem to be the cleanest. It would be interesting to measure its cost.

ThreadFactory passed to the executor is wrapped/extended

I don't see how a ThreadFactory helps, in that it is unaware of the removal of threads at task completion. At shutdown we ideally need access to the threads and associated tasks left in ThreadPerTaskExecutor. The ThreadPerTaskExecutor adds and removes Threads, and the ThreadFactory isn't aware of the removals.

I was thinking about doing additional wrapping of input runnables/callables and looking for their completion, but this does not work because the ThreadPerTaskExecutor does wrapping of those already, so there's no easy way (or no way) to correlate them with the threads in the factory.

In any case, as you stated, a solution to the problem available from the JDK would be ideal, such as the interface you proposed.

rbygrave commented 1 year ago

Option 3 does seem to be the cleanest

Well, it's all in the eye of the beholder of course, but I had an attempt at doing option 3 and I personally didn't like the result; I much preferred the existing code in terms of readability and understanding. Hopefully someone else does a better job of it :)