javalin / javalin

A simple and modern Java and Kotlin web framework
https://javalin.io

How does using async handlers reduce the total number of threads needed? #1333

Closed mattwelke closed 2 years ago

mattwelke commented 3 years ago

For some background, I'm pretty new to concurrency in Java. I have experience with Node.js and Go, so I have some familiarity with concurrency and async work in those languages, but I'm noticing differences in Java where you have to do more work such as managing thread pools.

In the docs, the async handlers stood out to me (https://javalin.io/documentation#faq). It looks like a nice way to keep that sequential programming model. As long as I wrap my work in a Future and return it, I activate this "async" behavior. In the docs, a new concept was introduced to me - executors. I then read some Java documentation about executors to learn what this was all about.

So my understanding at this point is that in Java, you can't just ask the runtime to process something in the background, the way Node.js does when I call setImmediate(<my_func>) (which instructs the event loop to process it on the next cycle) or Go does when I call go <my_func> (which instructs the scheduler to spawn a new goroutine and run it there). Instead, I have to tell Java where to run it in the background. That could be new Thread(...) if I were willing to manage threads myself, or an executor if I'd like to work at a higher level of abstraction and have a pool of threads managed for me, where threads are reused for efficiency.
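
For illustration, a minimal sketch of the executor version (standard java.util.concurrent; the class name is made up for this example):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class FireAndForgetSketch {
        public static void main(String[] args) {
            // Roughly the Java analogue of `go myFunc()`: instead of asking a
            // runtime scheduler to run the work "somewhere", you hand it to a
            // specific pool of reusable threads.
            ExecutorService executor = Executors.newFixedThreadPool(4);
            executor.submit(() -> System.out.println("running on a pool thread"));
            executor.shutdown(); // finish queued tasks, then release the threads
        }
    }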

But what I'm confused about at this point is why this improves things in the first place. The docs say that the thread pool created for Jetty defaults to 200 threads. So my understanding is that means I can process up to 200 requests at a time. If requests take a long time, or many come in at once, that would cause problems because these 200 threads might eventually be exhausted and subsequent requests would have to wait for a thread to be free. My understanding is that I have two ways to solve this in Javalin:

  1. By raising the number of threads that Jetty uses, for example from 200 to 400. That would double the number of requests I can handle at once, or double the length of time requests could take to complete without subsequent requests having to wait for a thread (see the sketch after this list).
  2. By using async requests. I'd set up an executor to schedule the work on. The executor would have as many threads as are needed to process all of the requests, depending on how long each request takes and how many I expect to come in at once. So maybe I would set up a thread pool with 400 threads in this executor.
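
For option 1, a hedged sketch of what raising the pool size might look like, assuming the Javalin 3/4 config API (config.server) and Jetty's QueuedThreadPool:

    import io.javalin.Javalin;
    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.util.thread.QueuedThreadPool;

    public class BiggerPoolSketch {
        public static void main(String[] args) {
            // Replace Jetty's default 200-thread pool with a 400-thread one
            Javalin app = Javalin.create(config ->
                    config.server(() -> new Server(new QueuedThreadPool(400)))
            ).start(7000);
            app.get("/", ctx -> ctx.result("Hello"));
        }
    }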

So if my requests take a while to process, such that in my example here I could get the work done with 400 threads, wouldn't I need to set up an executor with its own thread pool of 400 threads? Wouldn't I end up with an equal number of threads or a greater number of threads provisioned in total, between the Jetty thread pool and my executor's thread pool? And wouldn't this negate the improvement from using async requests?

After reading as much as I could about Javalin, it seems that my understanding is flawed, because the example at https://github.com/tipsy/javalin-async-example demonstrates that using the async requests approach halves the total amount of time the requests in that example take to process (from 15s to 7.5s). And the executor set up in the example in that repo (and in the example in the docs) is obtained by calling Executors.newSingleThreadScheduledExecutor, so it seems that the executor used has just one thread to do all the work, yet it's faster.

tipsy commented 3 years ago

Let me prefix this by saying that I'm not an expert on Java async either, and that I have more experience with this in Node myself.

> As long as I wrap my work in a Future and return it, I activate this "async" behavior. In the docs, a new concept was introduced to me - executors. I then read some Java documentation about executors to learn what this was all about.

If you just wrap your work in futures, the main benefit you get is resource management (how many of the threads from the main threadpool an endpoint is allowed to hold). There is no performance benefit compared to just increasing the number of threads, as you mention as an alternative.

Similarly, if you have something that is future based, and you call Future#get, you will end up blocking, and there will be no performance benefit.

To get a performance benefit, you need to use futures all the way down (like https://ohadshai.medium.com/reactive-java-all-the-way-to-the-database-with-jasync-sql-and-javalin-c982365d7dd2). Futures have callbacks (similar to JavaScript), and Javalin attaches a callback to write the response once the future has resolved. Attaching callbacks and waiting for futures to resolve does not block threads (thereby reducing the total number of threads needed, and hopefully answering your question).
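
As a concrete sketch of what the docs and the async example describe (assuming Javalin 4's ctx.future; Javalin 3 passed the future to ctx.result instead): the single scheduler thread never blocks, it only arms a timer and later fires the completion callback, which is why one thread can serve many in-flight requests.

    import io.javalin.Javalin;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class AsyncHandlerSketch {
        private static final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        public static void main(String[] args) {
            Javalin app = Javalin.create().start(7000);
            // The Jetty thread is released as soon as the handler returns;
            // Javalin writes the response from a callback when the future resolves.
            app.get("/async", ctx -> ctx.future(slowOperation()));
        }

        // Simulates 1s of non-blocking IO: no thread sleeps or waits here,
        // the scheduler just completes the future one second later.
        private static CompletableFuture<String> slowOperation() {
            CompletableFuture<String> future = new CompletableFuture<>();
            scheduler.schedule(() -> future.complete("done"), 1, TimeUnit.SECONDS);
            return future;
        }
    }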

mattwelke commented 3 years ago

> To get a performance benefit, you need to use futures all the way down (like https://ohadshai.medium.com/reactive-java-all-the-way-to-the-database-with-jasync-sql-and-javalin-c982365d7dd2).

With that example, wouldn't something eventually need to call Future#get to wait for all the async work to finish though? In that example, the handler creates a future that includes work for the SQL library to do asynchronously, and the handler, when invoked by Javalin, returns the future. Wouldn't Jetty call Future#get, at which point the thread pool dedicated to Jetty would have to wait for the work to be completed, and then be limited in terms of how many of these requests can be handled at once by the size of its thread pool?

tipsy commented 3 years ago

> Wouldn't Jetty call Future#get, at which point the thread pool dedicated to Jetty would have to wait for the work to be completed, and then be limited in terms of how many of these requests can be handled at once by the size of its thread pool?

No, Jetty isn't aware of the Future. Jetty implements the servlet-specification, which supports async request handling. To enable this you use HttpServletRequest#startAsync, at which point the request is lifted out of the server's threadpool. What exactly Jetty does at this point, I'm not sure of, but you can see what Javalin does here: https://github.com/tipsy/javalin/blob/master/javalin/src/main/java/io/javalin/http/JavalinServlet.kt#L85-L98

Jetty is never made aware of this Future.
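
A simplified sketch of that servlet-level pattern (hypothetical names; Javalin's real version is in the linked JavalinServlet.kt):

    import java.io.IOException;
    import java.util.concurrent.CompletableFuture;
    import javax.servlet.AsyncContext;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class StartAsyncSketch extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
            AsyncContext async = req.startAsync(); // request leaves Jetty's thread pool here
            fetchSomething().whenComplete((result, error) -> {
                try {
                    resp.getWriter().write(error == null ? result : "error");
                } catch (IOException ignored) {
                }
                async.complete(); // only now is the response committed
            });
            // doGet returns immediately and this Jetty thread serves other
            // requests; Jetty itself never sees the CompletableFuture.
        }

        private CompletableFuture<String> fetchSomething() {
            return CompletableFuture.supplyAsync(() -> "done");
        }
    }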

Here is the Javadoc from the interface:

    /**
     * Puts this request into asynchronous mode, and initializes its
     * {@link AsyncContext} with the original (unwrapped) ServletRequest
     * and ServletResponse objects.
     *
     * <p>Calling this method will cause committal of the associated
     * response to be delayed until {@link AsyncContext#complete} is
     * called on the returned {@link AsyncContext}, or the asynchronous
     * operation has timed out.
     *
     * <p>Calling {@link AsyncContext#hasOriginalRequestAndResponse()} on
     * the returned AsyncContext will return <code>true</code>. Any filters
     * invoked in the <i>outbound</i> direction after this request was put
     * into asynchronous mode may use this as an indication that any request
     * and/or response wrappers that they added during their <i>inbound</i>
     * invocation need not stay around for the duration of the asynchronous
     * operation, and therefore any of their associated resources may be
     * released.
     *
     * <p>This method clears the list of {@link AsyncListener} instances
     * (if any) that were registered with the AsyncContext returned by the
     * previous call to one of the startAsync methods, after calling each
     * AsyncListener at its {@link AsyncListener#onStartAsync onStartAsync}
     * method.
     *
     * <p>Subsequent invocations of this method, or its overloaded 
     * variant, will return the same AsyncContext instance, reinitialized
     * as appropriate.
     *
     * @return the (re)initialized AsyncContext
     * 
     * @throws IllegalStateException if this request is within the scope of
     * a filter or servlet that does not support asynchronous operations
     * (that is, {@link #isAsyncSupported} returns false),
     * or if this method is called again without any asynchronous dispatch
     * (resulting from one of the {@link AsyncContext#dispatch} methods),
     * is called outside the scope of any such dispatch, or is called again
     * within the scope of the same dispatch, or if the response has
     * already been closed
     *
     * @see AsyncContext#dispatch()
     * @since Servlet 3.0
     */
    public AsyncContext startAsync() throws IllegalStateException;

If you want to know how this works in detail, I guess you have to dive into Jetty's source code.

rbygrave commented 3 years ago

FYI:

Some background on Java concurrency, @mattwelke, if it helps (because for Java it is changing). TLDR: Java [with Loom] is going the path of Go and NOT the path of C# and JavaScript (languages that have async/await keywords and hence the coloured function issue). "Loom" is the project that adds lightweight threads / "virtual threads" to Java and changes the JDK internals so they don't block operating system threads (so pretty much goroutines). Slightly "back to the future" for Java, because Java prior to 1.3 used green threads.

Brian Goetz (Java Language Architect) on Loom and reactive programming. You will hear a lot about reactive programming in the Java space. TLDR: Loom makes the reactive style pretty niche. https://www.youtube.com/watch?v=9si7gK94gLo&t=1156

An important article that is referred to a lot when people talk about Java "Loom" (a goroutine-style approach) versus other languages like JavaScript and C#. In short, all languages that have async/await keywords have the coloured function/method issue. https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/
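
For reference, a minimal sketch using the API that eventually shipped as Java 21 virtual threads (at the time of this discussion it was only available in Loom early-access builds):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class VirtualThreadSketch {
        public static void main(String[] args) {
            // One cheap virtual thread per task: blocking parks only the
            // virtual thread, not the underlying OS thread (goroutine-style).
            try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 10_000; i++) {
                    executor.submit(() -> {
                        try {
                            Thread.sleep(1_000); // stands in for blocking IO
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                }
            } // close() waits for the submitted tasks to finish
        }
    }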

mattwelke commented 3 years ago

Thanks for the links guys. I've actually been following Project Loom because virtual threads are my preferred approach to this problem so far. I'm looking forward to them being in Java soon. In the meantime, I'm trying to learn what the status quo is and find something that makes the most sense to me, since we'll be putting some Java stuff into production soon.

I'll do some reading up on what Jetty is doing under the hood. That might help me understand what impact returning futures from my handlers has, whether it reduces the total number of threads my program has to have provisioned amongst all the pools it's using, etc.

tipsy commented 3 years ago

@mattwelke it would be great if you could summarize your findings into a paragraph or two that can be added to the docs :)

mattwelke commented 3 years ago

I'll send a PR with a doc change if I feel like there's something to add that will help people coming from my perspective, where we aren't used to concepts like thread pools. I need some time to digest my thoughts first, though. I was barely able to put my thoughts together enough to create this issue. I'm glad I seem to have gotten my point across though. xD

Thanks again

rbygrave commented 3 years ago

Cool. In short this depends on how much "blocking IO" is being used (how much time a thread spends blocked waiting on an IO response; it's not about Future itself or foreground/background itself). Loom deals with that. The alternative today wrt dealing with blocking IO on REST endpoints I'd say is to either:

  1. Put work that does a lot of slow IO into the background (put it on a queue and process the queue instead; don't do heaps of slow blocking IO on the Jetty threads; see the sketch at the end of this comment), or
  2. Change the endpoint to use the "reactive style" (futures all the way down).

Note that in reality the cost of threads in Java is a gray area. The amount of memory a thread takes varies based on the app.
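
A minimal in-process sketch of option 1 (the endpoint and generateReport are made up for illustration; a production setup would more likely use a durable external queue):

    import io.javalin.Javalin;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BackgroundQueueSketch {
        // The executor's internal task queue is the "background queue";
        // slow IO runs on this single worker thread, not on a Jetty thread.
        private static final ExecutorService worker = Executors.newSingleThreadExecutor();

        public static void main(String[] args) {
            Javalin app = Javalin.create().start(7000);
            app.post("/reports", ctx -> {
                worker.submit(BackgroundQueueSketch::generateReport); // enqueue and return
                ctx.status(202); // accepted: the Jetty thread is freed immediately
            });
        }

        private static void generateReport() {
            // stand-in for heaps of slow blocking IO
        }
    }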

mattwelke commented 3 years ago

> Put work that does a lot of slow IO into the background (put it on a queue and process the queue instead; don't do heaps of slow blocking IO on the Jetty threads)

Is the idea here to avoid needing a high number of threads provisioned at all times by keeping slow IO off the Jetty thread pool and putting it in a queue where it's processed one unit of work at a time? And that this strategy would work for me as long as the added delay for my slow IO work is acceptable? A trade-off where my app operates a bit slower, because only one of those slow IO units of work can be processed at a time, but with fewer total threads needed?

Also, re-opening because I'd like to use this to track my doc change PR if I submit one.

rbygrave commented 3 years ago

> avoid needing a high number of threads provisioned at all times by keeping slow IO off the Jetty thread pool

In short, yes. It might be better to say: take any slow response/work off the Jetty thread pool (off the "real-time API").

I'm sure you realise this, but Loom and "reactive" are not magically adding any CPU; it is just a matter of using the CPU we have more efficiently (meaning, if an OS/platform thread is blocked waiting on IO then it is less efficient; that OS/platform thread could be doing other work). I don't think it is easy to measure how much blocking IO an application has, but maybe we could via benchmarking with a Loom early-access release.

> putting it in a queue where it's processed one unit of work at a time

Take 2 extreme examples: an endpoint with a sub-1ms response time vs an endpoint with a 30-second response time. Regardless of how much blocking IO there is, there is a certain amount of throughput those 2 endpoints can take with a given max thread pool size. If we get a burst of requests for the endpoint with the 30-second response time, we are going to have to think about how to deal with that (put in a request limiter, increase the thread pool, increase the number of pods/instances, move the load into the background via a queue).
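
To put rough numbers on that: with a 200-thread pool fully dedicated to each case, the 30-second endpoint can sustain at most about 200 / 30 ≈ 6.7 requests per second before requests start queuing, while the sub-1ms endpoint can sustain on the order of 200 / 0.001 = 200,000 requests per second.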

That is, if we take slow load and put it into the background (via a queue), we have increased the ability of the application to handle bursts in load (to the real-time API / REST endpoints) and we have smoothed the load on something that is known to be relatively expensive (e.g. smoothed the load that includes expensive database queries). Regardless of whether the slow endpoint has a lot of blocking IO or not, we might do this because of the load-smoothing benefits [akin to "reactive backpressure"].

The question becomes how slow is too slow for real-time APIs, and that can depend on how bursty the load to those slow endpoints is. For example, do we have K8s auto-scaling, or do we bump the max thread pool to 300 and live with it for now, or move this load onto non-real-time processing (queues), or, if we think it's slow because of a lot of blocking IO, change the endpoint to use the "reactive style" [or try a Loom early-access release]?

Notes about JDBC:

The other thing to note is that fundamentally JDBC is a blocking API, so if we are talking to an RDBMS and we don't want threads to block, we have to go to another database client library like R2DBC (newer, less battle-tested, not faster, and with its own quirks). If we are talking to a non-RDBMS, something like Mongo, it's a different story: reactive drivers are the norm there.

For me, I see issues and risks in going to reactive database client libraries like R2DBC and away from JDBC drivers, because JDBC is arguably faster (it benchmarks as faster) and better supported, plus things like ThreadLocal (and therefore @Transactional etc.) don't work once we go to the reactive style.

rbygrave commented 3 years ago

> high number of threads provisioned at all times

Just to say, the thread pool will grow and shrink. It is more a question of the max thread pool size: what do we want to provision the max thread pool size to be (and if we increase the max, we similarly increase the max memory consumed by the app)?

I'm conservative. To me, today the cost-effective / cheap approach is to increase the max thread pool if really necessary, use a bit more memory, and in doing so keep things simple, knowing that when Loom arrives we get that memory back.