typelevel / cats-effect

The pure asynchronous runtime for Scala
https://typelevel.org/cats-effect/
Apache License 2.0

Virtual Threads (Project Loom) in IO.blocking / IO.interruptible #3869

Closed alexandru closed 10 months ago

alexandru commented 11 months ago

Hello,

Right now, Cats-Effect uses a cached thread pool for blocking I/O in operations like IO.blocking, IO.interruptible and IO.interruptibleMany. It would be great if, when Cats-Effect runs on top of Java 21, that thread pool switched to using virtual threads. Personally, I'd prefer for this to just work out of the box: users that upgrade to Java 21 should get the benefits without configuring IOApp.

The biggest win isn't performance, but correctness. Some limitations are lifted for virtual threads, such as network socket operations becoming interruptible. So, for example, when we have a JDBC query executed via IO.interruptible, the interruption will just work. See the details here: https://openjdk.org/jeps/444#Networking

The challenge is in creating an implementation that is able to use virtual threads when available. Currently, Cats-Effect is not compiled for JDK 21, and probably won't be for some time. Therefore, a solution would have to either:

  1. use the reflection API to call Thread.ofVirtual when available (e.g., via MethodHandle), otherwise fall back to a cached thread pool, or;
  2. create a micro-library that builds a multi-release JAR, though that's not a widely used approach; libraries like CE most often choose reflection.

What do you think? I can follow up with a PR if you agree to try it out.

armanbilge commented 11 months ago
  1. use the reflection API to call Thread.ofVirtual when available

Actually, we should just use reflection to call [Executors.newVirtualThreadPerTaskExecutor()](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/Executors.html#newVirtualThreadPerTaskExecutor()) and set that as the blocking pool in IORuntime. That way we don't need to mess about with MethodHandles; it's a one-time setup to get an ExecutionContext backed by VirtualThreads.
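The one-time reflective setup could look something like this sketch (all names here are illustrative, not Cats-Effect API; the only assumption is that Executors.newVirtualThreadPerTaskExecutor exists on JDK 21+):

```scala
import java.util.concurrent.{ExecutorService, Executors}
import scala.concurrent.ExecutionContext

object BlockingPool {
  // Reflectively obtain a virtual-thread-per-task executor on JDK 21+,
  // falling back to a cached thread pool on older JDKs. Names are
  // illustrative only; this is not the Cats-Effect implementation.
  def executor(): ExecutorService =
    try {
      classOf[Executors]
        .getDeclaredMethod("newVirtualThreadPerTaskExecutor")
        .invoke(null) // static method, so the receiver is null
        .asInstanceOf[ExecutorService]
    } catch {
      case _: NoSuchMethodException => Executors.newCachedThreadPool()
    }

  // One-time setup: wrap it as an ExecutionContext for the blocking pool.
  lazy val blockingEc: ExecutionContext =
    ExecutionContext.fromExecutorService(executor())
}
```

On either JDK, the result is a plain ExecutionContext that an IORuntime could be configured with.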

I think we will also need to expose a new configuration option to force IOFiber to use the blocking pool in IORuntime because currently it will always use the WSTP's internal blocking mechanism when possible. That logic is here:

https://github.com/typelevel/cats-effect/blob/f8137b7ee7bbc6b0c510b8da20629edfb6922463/core/shared/src/main/scala/cats/effect/IOFiber.scala#L978-L986


The biggest win isn't performance, but correctness. Some limitations are lifted for virtual threads, such as network socket operations becoming interruptible. So for example, when we have a JDBC query executed via IO.interruptible, the interruption will just work.

Thanks for emphasizing this point, I didn't realize that, it's quite interesting.

The JEP you linked says:

Blocking I/O operations on these types of sockets when obtained from an InterruptibleChannel have always been interruptible, so this change aligns the behavior of these APIs when created with their constructors with their behavior when obtained from a channel.

It sounds to me like interruption is already working for java.nio.channels networking APIs, although I guess it was not working for their older java.net counterparts. Is JDBC really using the older, non-interruptible APIs? Are there any other motivating libraries to consider, besides JDBC? It sounds like the best way to fix this for everyone (Loom or no Loom) would be for those libraries to switch to obtaining the java.net APIs via the java.nio.channels constructors.


There is also the problem that we would need downstream libraries to use IO.interruptible(...) instead of IO.blocking(...) (for example, at a glance, Doobie appears to be using blocking everywhere).

One idea would be to also transform blocking(...) to interruptible(...) in VirtualThread mode, but this would be a semantic change and seems like a bad idea.

So in that case, we genuinely need downstream libraries to switch to interruptible(...), but probably they would only want to do that on JDK 21+ since otherwise it adds pointless overhead over blocking(...). By that point, I wonder if the library should just do something like evalOnExecutor(newVirtualThreadPerTaskExecutor()).
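To illustrate the evalOnExecutor idea, a hedged sketch (JDK 21+ assumed; `runJdbc` and the `query` wrapper are hypothetical names, and the details of how blocking regions interact with evalOn inside IOFiber are glossed over):

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import cats.effect.IO

object LoomBlocking {
  // JDK 21+ assumed: every task submitted here gets its own virtual thread.
  val loomEc: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newVirtualThreadPerTaskExecutor())

  // `runJdbc` is a hypothetical stand-in for a real blocking JDBC call.
  // The call blocks a virtual thread rather than a platform thread;
  // cancelation/interruption wiring is out of scope for this sketch.
  def query[A](runJdbc: () => A): IO[A] =
    IO.delay(runJdbc()).evalOn(loomEc)
}
```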


In the end, I wonder if the performance argument is more compelling. If VirtualThreads can transform a blocking API like JDBC into a non-blocking one, that seems like a win for Doobie.

armanbilge commented 11 months ago

Cross-linking to:

alexandru commented 11 months ago

It sounds to me like interruption is already working for java.nio.channels networking APIs, although I guess it was not working for their older java.net counterparts. Is JDBC really using the older, non-interruptible APIs? Are there any other motivating libraries to consider, besides JDBC? It sounds like the best way to fix this for everyone (Loom or no Loom) would be for those libraries to switch to obtaining the java.net APIs via the java.nio.channels constructors.

On JDBC, definitely yes, I did a recent experiment, here's a script that can be executed directly for testing: https://gist.github.com/alexandru/7d5546ebac0ef8a0cac60f5cedfec8e1

For me, this is huge. We can't take the use of Doobie for granted; for example, JDBI3, a very popular Java library, doesn't expose the Statement, so you can't call Statement.cancel. Integrating such libraries with Cats-Effect is very painful, and the problem is that we often have to integrate bigger modules built in Java that make use of such libraries.

Another example I can think of is javax.mail. It actually blocked our process at work when the SMTP server was experiencing issues. NOTE: I haven't actually tested if javax.mail becomes better behaved, but I have high hopes.

We are forced to use a Java library at $work that makes use of Apache's HTTP Client (I hope I'm remembering correctly), and it's been a painful experience: the requests aren't cancelable, and this is the most problematic piece of I/O that we do. Another one would be java.net.http.HttpClient, but this is an interesting case, as its Flow API is interruptible (although it doesn't back-pressure). Not sure if all of these will become interruptible, but seeing restrictions lifted for virtual threads warms my heart.


On turning IO.blocking into IO.interruptible, I don't think it's a good idea to change semantics, as it can break people's code. Downstream dependencies wishing to make use of interruption only have to switch from blocking to interruptible, which has been there for some time. The issue with Java interruption is that it has to be evaluated on a case-by-case basis for safety.

If Java evolves such that interruption has fewer gotchas (maybe because interrupting virtual threads becomes more common), then this decision can be re-evaluated.


In the end, I wonder if the performance argument is more compelling.

I haven't tested performance. Anecdotally, we don't have performance issues from our use of blocking operations via IO.blocking or IO.interruptible. Depends on use-case, I guess. I am excited about having an easier time integrating with Java libraries.

But sure, the performance angle is probably compelling too.

alexandru commented 11 months ago

@armanbilge thanks for the implementation suggestions. If it's fine with you, I want to do a PR proposal.

armanbilge commented 11 months ago

Thanks for the exposition. Here's what I'm thinking: we actually don't want to replace the blocking pool with VirtualThreads. Rather, we have a new class of operations that need to be handled specially.

So I can't help but feel like we want something like IO.virtual(...) so that a user can specifically designate the tasks that would benefit and should run on a VirtualThread.

There's just the annoying issue of what to do if VirtualThreads are not available. So perhaps it would need to be e.g. IO.virtualOrBlocking(...) that falls back to blocking(...) on older JDKs.
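A rough sketch of what such a fallback could mean (hypothetical names; IO.virtual does not exist, so IO.interruptible stands in for it here, and the capability check is a simple version test rather than a real reflective lookup):

```scala
import cats.effect.IO

object VirtualOrBlocking {
  // Stand-in capability check; a real implementation would probably use a
  // reflective lookup rather than a bare version test.
  val virtualThreadsAvailable: Boolean =
    Runtime.version().feature() >= 21

  // Hypothetical semantics for the proposed combinator: use the
  // interruption-friendly path where virtual threads make it effective,
  // otherwise fall back to plain blocking. IO.interruptible is only a
  // stand-in for the proposed IO.virtual behavior.
  def virtualOrBlocking[A](thunk: => A): IO[A] =
    if (virtualThreadsAvailable) IO.interruptible(thunk)
    else IO.blocking(thunk)
}
```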

Does that make sense? WDYT?

alexandru commented 11 months ago

I don't like the idea of IO.virtual because virtual threads are an implementation detail as far as Cats-Effect is concerned, and because we should strive to use them automatically, as everyone will want it. I like it when libraries do the right thing with minimal configuration. And this is an ecosystem shift that CE shouldn't miss.

WSTP is cool, but for blocking operations you're not saving much by avoiding a thread shift. The bigger problem with WSTP is that pinning system threads in blocking I/O is still super expensive, no matter what tricks it can pull. This is another issue that Virtual Threads solve, and it's trivial to test.

The following sample fails with an OutOfMemoryError. And even if we configure the JVM and the kernel to support 10,000 threads or more, I'm pretty sure that the context switching adds up and there's nothing that WSTP can do about it:

(0 until 10000).toList
  .parTraverse(_ => IO.interruptible(Thread.sleep(1000)))
  .void

With virtual threads, you could easily create 100,000 threads without issues. See runnable script. In fairness, green threads aren't immune to context-switching effects, but their context switches are cheaper, and the JDK team went to great lengths to optimize them (here's a nice presentation about it)
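As a stdlib-only illustration of that claim (JDK 21+ required; this uses Thread.ofVirtual directly rather than going through Cats-Effect):

```scala
object ManyVirtualThreads {
  // Park n virtual threads in Thread.sleep and join them all; the same
  // count of platform threads would exhaust memory long before 100,000.
  def run(n: Int): Int = {
    val threads = (0 until n).map { _ =>
      Thread.ofVirtual().start(() => Thread.sleep(100))
    }
    threads.foreach(_.join())
    threads.size
  }

  def main(args: Array[String]): Unit =
    println(s"joined ${run(10000)} virtual threads")
}
```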

There's obviously a limit to how many things you can do in parallel, and in our $work project we have parallelism limited everywhere we do blocking I/O. Works fine, but I'd rather not have to think about it so much.


If API changes are needed, I think I'd go in the direction of Kotlin:

withContext(Dispatchers.IO) {
  // ...
  // The IO thread-pool dispatcher gets inherited here from the lexical scope
  runInterruptibly { 
     blockingOperation()
  }
}

This isn't unlike the ability I added in Monix for overriding the default thread-pool, inspired by RxJava's executeOn, which became shiftOn in CE 2. But what I like about the Kotlin approach is that it feels declarative, yet allows fine-grained control over how things execute.

What I don't like is that runInterruptibly doesn't run by default on a separate thread-pool, but if you really want the ability to keep things on WSTP, then this would be the way to go.

So the API could look like this:

IO.withContext(BlockingIO) {
  for {
    a <- IO.interruptible(op1)
    b <- IO.interruptible(op2)
    c <- IO.interruptible(op3)
  } yield (a, b, c)
}

But this is a bigger change, and for now I'd really prefer a cheaper integration.

armanbilge commented 11 months ago

I don't like the idea of IO.virtual because virtual threads are an implementation detail as far as Cats-Effect is concerned, and because we should strive to use them automatically, as everyone will want it.

I'm not sure that I agree that it is an "implementation detail".

The bigger problem with WSTP is that pinning system threads in blocking I/O is still super expensive, no matter what tricks it can pull. This is another issue that Virtual Threads solve, and it's trivial to test.

That's the thing: VirtualThreads don't solve the problem of pinning system threads in blocking I/O. Consider Daniel's words:

Additionally, the abstraction that Loom really pushes you to use, Thread, is quite leaky. It makes promises that it can't keep (particularly around InetAddress, URL, and File). Because of operating systems themselves, it is impossible to implement the contracts of those APIs without hard-blocking physical threads, at least in some cases on some platforms. However, Loom makes the claim that you can and encourages you to rely on this assumption.

IMHO switching out our WSTP's specialized blocking mechanism for VirtualThreads is falling into this trap.

With virtual threads, you could easily create 100,000 threads without issues.

Unfortunately, this is simply not true :) Sure, you can create 100,000 virtual threads doing Thread.sleep(...), but if the task involves file I/O, DNS, or even synchronization, it breaks down immediately. For example, try making this change to your example:

- .parTraverse(_ => virtualThread(Thread.sleep(1000)))
+ .parTraverse(_ => virtualThread((new Object).synchronized(Thread.sleep(1000))))

I assume that you chose that example very specifically because of your knowledge of the implementation details, which is fine :)

My point is that because VirtualThread is a leaky abstraction, we do need to be aware of implementation details when using it, the same way that we are already mindful of these details when wrapping side effects in delay, blocking, or interruptible. So-called "virtual" seems to be a new category that some, but not all, ops can benefit from.

alexandru commented 11 months ago

.parTraverse(_ => virtualThread((new Object).synchronized(Thread.sleep(1000))))

This change is unfair because you're targeting a current limitation, but it's a limitation that's not fundamental. You only need to replace that synchronized block with a ReentrantLock, and many libraries have done exactly that, including PostgreSQL's JDBC driver, which I tested above.

Also, it works on top of GraalVM, because GraalVM does not pin platform threads in synchronized blocks. The thing is, whatever limitations virtual threads have are only temporary.

Virtual threads, on the JVM, are the future.

UPDATE: as an update on GraalVM's behavior, I couldn't reproduce it in testing with GraalVM 21, with or without native-image, so my claim above may be premature. I saw it mentioned in a conference presentation; maybe it's something that hasn't shipped yet.

armanbilge commented 11 months ago

Yep, virtual threads are the future, but it seems to me that in the present there are enough limitations that completely replacing the Cats Effect blocking mechanism with virtual threads would be a (performance) regression. I do recognize that virtual threads are already a huge step forward for some APIs, e.g. JDBC, so we should make sure they have first-class support.

Of course, I'm curious to hear others' thoughts on this topic :)

wb14123 commented 11 months ago

JDBC is a big use case for virtual threads in my mind. If virtual threads are not mature enough to be used by default, this option would be great:

I think we will also need to expose a new configuration option to force IOFiber to use the blocking pool in IORuntime

So that users can try out virtual threads by defining a virtual thread pool as the blocking pool and see whether it benefits their use case or not.

It's good to provide IO.virtualOrBlocking, but I think the option above is still needed so that no code (especially code in third-party libraries) needs to be changed in order to use virtual threads.

armanbilge commented 11 months ago

so that no code (especially code in third-party libraries) needs to be changed in order to use virtual threads.

@wb14123 see https://github.com/typelevel/cats-effect/pull/3870, it allows blocking code to be run on virtual threads without changing library code.