> - use the reflection API to call `Thread.ofVirtual` when available

Actually, we should just use reflection to call [`Executors.newVirtualThreadPerTaskExecutor()`](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/Executors.html#newVirtualThreadPerTaskExecutor()) and set that as the blocking pool in `IORuntime`. That way we don't need to mess about with `MethodHandle`s; it's a one-time setup to get an `ExecutionContext` backed by `VirtualThread`s.
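A minimal sketch of that one-time setup (my assumption about the shape, not actual Cats Effect code) could look like this:

```scala
import java.util.concurrent.ExecutorService
import scala.concurrent.ExecutionContext
import scala.util.Try

// Sketch only: reflectively obtain a virtual-thread-per-task executor on JDK 21+.
// On older JDKs the getMethod call throws, and we get None back.
def virtualThreadBlockingEC: Option[ExecutionContext] =
  Try {
    val executor = classOf[java.util.concurrent.Executors]
      .getMethod("newVirtualThreadPerTaskExecutor")
      .invoke(null)
      .asInstanceOf[ExecutorService]
    ExecutionContext.fromExecutorService(executor)
  }.toOption
```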
I think we will also need to expose a new configuration option to force `IOFiber` to use the blocking pool in `IORuntime`, because currently it will always use the WSTP's internal blocking mechanism when possible. That logic is here:
> The biggest win isn't performance, but correctness. Some limitations are lifted for virtual threads, such as network socket operations becoming interruptible. So for example, when we have a JDBC query executed via `IO.interruptible`, the interruption will just work.
Thanks for emphasizing this point, I didn't realize that, it's quite interesting.
The JEP you linked says:

> Blocking I/O operations on these types of sockets when obtained from an InterruptibleChannel have always been interruptible, so this change aligns the behavior of these APIs when created with their constructors with their behavior when obtained from a channel.
It sounds to me like interruption is already working for `java.nio.channels` networking APIs, although I guess it was not working for their older `java.net` counterparts. Is JDBC really using the older, non-interruptible APIs? Are there any other motivating libraries to consider, besides JDBC? It sounds like the best way to fix this for everyone (Loom or no Loom) would be for those libraries to switch to obtaining the `java.net` APIs via the `java.nio.channels` constructors.
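For concreteness, this is the kind of switch being suggested (illustrative only; whether a given driver can actually do this is another question). Per the JEP text quoted above, a `java.net.Socket` obtained from a `SocketChannel` is backed by an `InterruptibleChannel`, so its blocking I/O is interruptible even on older JDKs:

```scala
import java.net.{InetSocketAddress, Socket}
import java.nio.channels.SocketChannel

// Interruptible path: the Socket is a view over an InterruptibleChannel.
val channel: SocketChannel = SocketChannel.open(new InetSocketAddress("example.com", 80))
val interruptibleSocket: Socket = channel.socket()

// Historically non-interruptible path on older JDKs: a Socket built via its constructor.
val plainSocket: Socket = new Socket("example.com", 80)
```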
There is also the problem that we would need downstream libraries to use `IO.interruptible(...)` instead of `IO.blocking(...)` (for example, at a glance Doobie appears to be using `blocking` everywhere).
One idea would be to also transform `blocking(...)` into `interruptible(...)` in `VirtualThread` mode, but this would be a semantic change and seems like a bad idea.
So in that case, we genuinely need downstream libraries to switch to `interruptible(...)`, but they would probably only want to do that on JDK 21+, since otherwise it adds pointless overhead over `blocking(...)`. At that point, I wonder if the library should just do something like `evalOnExecutor(newVirtualThreadPerTaskExecutor())`.
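A rough sketch of that library-side idea (assumptions: `runQuery` is a placeholder for the real JDBC call, and `evalOn` with a wrapped executor is used here in place of `evalOnExecutor`; this only compiles on JDK 21+):

```scala
import cats.effect.IO
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// A dedicated virtual-thread-per-task executor, created once by the library.
val virtualEC = ExecutionContext.fromExecutorService(
  Executors.newVirtualThreadPerTaskExecutor()
)

// Hypothetical helper: run the (blocking) thunk on a fresh virtual thread.
def onVirtualThread[A](runQuery: => A): IO[A] =
  IO(runQuery).evalOn(virtualEC)
```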
In the end, I wonder if the performance argument is more compelling. If `VirtualThread`s can transform a blocking API like JDBC into a non-blocking one, that seems like a win for Doobie.
Cross-linking to:
> It sounds to me like interruption is already working for `java.nio.channels` networking APIs, although I guess it was not working for their older `java.net` counterparts. Is JDBC really using the older, non-interruptible APIs? Are there any other motivating libraries to consider, besides JDBC? It sounds like the best way to fix this for everyone (Loom or no Loom) would be for those libraries to switch to obtaining the `java.net` APIs via the `java.nio.channels` constructors.
On JDBC, definitely yes. I did a recent experiment; here's a script that can be executed directly for testing: https://gist.github.com/alexandru/7d5546ebac0ef8a0cac60f5cedfec8e1
For me, this is huge. We can't take the use of Doobie for granted, and JDBI3, for example, a very popular Java library, doesn't expose the `Statement`, so you can't call `Statement.cancel`. Integrating such libraries with Cats-Effect is very painful, and the problem is that we often have to integrate bigger modules built in Java that make use of such libraries.
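To make the pain concrete, this is the manual cancellation pattern that only works when the library does expose the `Statement` (a sketch, not code from this issue; the JDBC calls are standard, the resource handling is simplified):

```scala
import cats.effect.IO
import java.sql.{Connection, Statement}

// If the fiber is canceled mid-query, cancel the Statement so the driver aborts
// the server-side work. Libraries that hide the Statement make this impossible.
def runCancelable(conn: Connection, sql: String): IO[Boolean] =
  IO.blocking(conn.createStatement()).flatMap { (stmt: Statement) =>
    IO.blocking(stmt.execute(sql))
      .onCancel(IO.blocking(stmt.cancel()).attempt.void)
      .guarantee(IO.blocking(stmt.close()).attempt.void)
  }
```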
Another example I can think of is `javax.mail`. It actually blocked our process at work when the SMTP server was experiencing issues. NOTE: I haven't actually tested if `javax.mail` becomes better behaved, but I have high hopes.
We are forced to use a Java library at $work that makes use of Apache's HTTP Client (I hope I'm remembering correctly), and it's been a painful experience, as the requests aren't cancelable, and this is the most problematic piece of I/O that we are doing. Another one would be `java.net.http.HttpClient`, but this is an interesting case, as its `Flow` API is interruptible (although it doesn't back-pressure). I'm not sure whether all of these will become interruptible, but seeing restrictions lifted for virtual threads warms my heart.
On turning `IO.blocking` into `IO.interruptible`: I don't think it's a good idea to change semantics, as it can break people's code. Downstream dependencies wishing to make use of interruption only have to switch from `blocking` to `interruptible`, which has been there for some time. The issue with Java interruption is that it has to be evaluated on a case-by-case basis for safety.
If Java evolves such that interruption has fewer gotchas (maybe because interrupting virtual threads becomes more common), then this decision can be re-evaluated.
> In the end, I wonder if the performance argument is more compelling.
I haven't tested performance. Anecdotally, we don't have performance issues from our use of blocking operations via `IO.blocking` or `IO.interruptible`. Depends on the use case, I guess. I am excited about having an easier time integrating with Java libraries. But sure, the performance angle is probably compelling too.
@armanbilge thanks for the implementation suggestions. If it's fine with you, I want to do a PR proposal.
Thanks for the exposition. Here's what I'm thinking: we actually don't want to replace the blocking pool with `VirtualThread`s. Rather, we have a new class of operations that need to be handled specially.
- For operations that are non-interruptibly blocking, regardless of the thread type (I think file I/O falls in this category), we should continue to use the WSTP's internal blocking mechanism, because it is optimized to minimize thread shifts. Using an external pool for blocking can result in a lot more shifting.
- For operations that are interruptibly blocking, regardless of the thread type, ideally we would continue to use the WSTP's internal blocking mechanism. It can avoid a thread shift and allows us to have a single thread cache that is shared with other, non-interruptible blocking tasks.
- Finally, there are operations that, when run on JDK 21 with `VirtualThread`s, become non-blocking and interruptible. Those are two compelling reasons to run such a task on a `VirtualThread`, which hopefully outweigh the cost of a second thread pool and shifting to/from it.
So I can't help but feel like we want something like `IO.virtual(...)`, so that a user can specifically designate the tasks that would benefit and should run on a `VirtualThread`.
There's just the annoying issue of what to do if `VirtualThread`s are not available. So perhaps it would need to be e.g. `IO.virtualOrBlocking(...)`, which falls back to `blocking(...)` on older JDKs.
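For the availability check itself, something like this would do (a sketch, not existing Cats Effect code):

```scala
import scala.util.Try

// Thread.ofVirtual() only exists on JDK 21+ (or 19+ with preview features),
// so a reflective method lookup doubles as a feature probe.
val virtualThreadsAvailable: Boolean =
  Try(classOf[Thread].getMethod("ofVirtual")).isSuccess
```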
Does that make sense? WDYT?
I don't like the idea of `IO.virtual`, because virtual threads are an implementation detail as far as Cats-Effect is concerned, and because we should strive to use them automatically, as everyone will want it. I like it when libraries do the right thing with minimal configuration. And this is an ecosystem shift that CE shouldn't miss.
WSTP is cool, but for blocking operations you're not saving much by avoiding a thread shift. The bigger problem with WSTP is that pinning system threads in blocking I/O is still super expensive, no matter what tricks it can pull. This is another issue that Virtual Threads solve, and it's trivial to test.
The following sample fails with an `OutOfMemoryError`. And even if we configure the JVM and the kernel to support 10,000 threads or more, I'm pretty sure that the context switching adds up, and there's nothing that WSTP can do about it:

```scala
import cats.effect.IO
import cats.syntax.all._

(0 until 10000).toList
  .parTraverse(_ => IO.interruptible(Thread.sleep(1000)))
  .void
```
With virtual threads, you could easily create 100,000 threads without issues. See runnable script. In fairness, green threads aren't immune to context-switching effects, but the switching is cheaper, and they did go to great lengths to optimize it (here's a nice presentation about it).
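The linked script is the authoritative version; roughly, the experiment has this shape (a sketch under assumptions: `virtualThread` is a small helper defined here for illustration, and JDK 21 is required for `newVirtualThreadPerTaskExecutor`):

```scala
import cats.effect.IO
import cats.syntax.all._
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// One virtual thread per submitted task; creating 100,000 of these is cheap.
val virtualEC = ExecutionContext.fromExecutorService(
  Executors.newVirtualThreadPerTaskExecutor()
)

// Assumed helper: run the thunk on a fresh virtual thread.
def virtualThread[A](thunk: => A): IO[A] =
  IO(thunk).evalOn(virtualEC)

val program =
  (0 until 100000).toList
    .parTraverse(_ => virtualThread(Thread.sleep(1000)))
    .void
```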
There's obviously a limit to how many things you can do in parallel, and in our $work project we have parallelism limited everywhere we do blocking I/O. Works fine, but I'd rather not have to think about it so much.
If API changes are needed, I think I'd go in the direction of Kotlin:

```kotlin
withContext(Dispatchers.IO) {
    // ...
    // The IO thread-pool dispatcher gets inherited here from the lexical scope
    runInterruptible {
        blockingOperation()
    }
}
```
This isn't unlike the ability I added in Monix for overriding the default thread-pool, inspired by RxJava's `executeOn`, which made it into CE 2 as `shiftOn`. But what I like about it is that this one feels declarative, yet it allows fine-grained control over how things execute.
What I don't like is that `runInterruptible` doesn't run by default on a separate thread-pool, but if you really want the ability to keep things on the WSTP, then this would be the way to go.
So the API could look like this:

```scala
IO.withContext(BlockingIO) {
  for {
    a <- IO.interruptible(op1)
    b <- IO.interruptible(op2)
    c <- IO.interruptible(op3)
  } yield (a, b, c)
}
```
But this is a bigger change, and for now I'd really prefer a cheaper integration.
> I don't like the idea of `IO.virtual`, because virtual threads are an implementation detail as far as Cats-Effect is concerned, and because we should strive to use them automatically, as everyone will want it.
I'm not sure that I agree that it is an "implementation detail".
> The bigger problem with WSTP is that pinning system threads in blocking I/O is still super expensive, no matter what tricks it can pull. This is another issue that Virtual Threads solve, and it's trivial to test.
That's the thing: `VirtualThread`s don't solve the problem of pinning system threads in blocking I/O. Consider Daniel's words:
> Additionally, the abstraction that Loom really pushes you to use, `Thread`, is quite leaky. It makes promises that it can't keep (particularly around `InetAddress`, `URL`, and `File`). Because of operating systems themselves, it is impossible to implement the contracts of those APIs without hard-blocking physical threads, at least in some cases on some platforms. However, Loom makes the claim that you can and encourages you to rely on this assumption.
IMHO, switching out our WSTP's specialized blocking mechanism for `VirtualThread`s is falling into this trap.
> With virtual threads, you could easily create 100,000 threads without issues.
Unfortunately this is simply not true :) Sure, you can create 100,000 virtual threads doing `Thread.sleep(...)`, but if the task involves file I/O or DNS or even synchronization, it breaks down immediately. For example, try making this change to your example:
```diff
- .parTraverse(_ => virtualThread(Thread.sleep(1000)))
+ .parTraverse(_ => virtualThread((new Object).synchronized(Thread.sleep(1000))))
```
I assume that you chose that example very specifically because of your knowledge of the implementation details, which is fine :)
My point is that because `VirtualThread` is a leaky abstraction, we do need to be aware of implementation details when using it, the same way that we are already mindful of these details when wrapping side-effects in `delay`, `blocking`, or `interruptible`. So-called "`virtual`" seems to be a new category that some, but not all, ops can benefit from.
> `.parTraverse(_ => virtualThread((new Object).synchronized(Thread.sleep(1000))))`
This change is unfair because you're targeting a current limitation, but it's a limitation that's not fundamental. You only need to replace that `synchronized` block with a `ReentrantLock`, and many libraries have done that, including PostgreSQL's JDBC driver, which I tested above.
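For illustration, the lock-based variant looks like this (a sketch; on JDK 21, blocking while holding a `java.util.concurrent.locks` lock does not pin the carrier thread the way a `synchronized` block does):

```scala
import java.util.concurrent.locks.ReentrantLock

val lock = new ReentrantLock()

// Same critical section as the diff above, but lock-based instead of
// monitor-based, so the carrier thread can be released while we sleep.
def sleepUnderLock(): Unit = {
  lock.lock()
  try Thread.sleep(1000)
  finally lock.unlock()
}
```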
Also, it works on top of GraalVM, because GraalVM does not pin platform threads in `synchronized` blocks. The thing is, whatever limitations virtual threads have, they are only temporary.
Virtual threads, on the JVM, are the future.
UPDATE: on GraalVM's behavior, I couldn't reproduce this in testing with GraalVM 21, with or without native-image, so my claim above may be premature. I saw it mentioned in a conference presentation; maybe it's something that hasn't shipped yet.
Yep, virtual threads are the future, but it seems to me that in the present there are enough limitations that completely replacing the Cats Effect blocking mechanism with virtual threads would be a (performance) regression. I do recognize that virtual threads are already a huge step forward for some APIs e.g. JDBC so we should make sure they have first-class support.
Of course, I'm curious to hear others' thoughts on this topic :)
JDBC is a big use case for virtual threads in my mind. If virtual threads are not mature enough to use by default, this option would be great:

> I think we will also need to expose a new configuration option to force `IOFiber` to use the blocking pool in `IORuntime`

That way users could try out virtual threads by setting a virtual-thread pool as the blocking pool and see whether it benefits their use case or not.
It's good to provide `IO.virtualOrBlocking`, but I think the option above is still needed, so that no code (especially code in third-party libraries) needs to be changed in order to use virtual threads.
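Something in this spirit is what I mean (a sketch built on assumptions: that `IORuntime.builder()` exposes a `setBlocking` hook as in recent CE 3 releases, that JDK 21 is available at build time, and that `IOFiber` can be made to honor this pool, which is exactly the open question quoted above; the exact API may differ):

```scala
import cats.effect.IO
import cats.effect.unsafe.IORuntime
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// Assumption: setBlocking is the knob for overriding the blocking pool, while
// the compute pool and scheduler keep their defaults.
val virtualExecutor = Executors.newVirtualThreadPerTaskExecutor()

implicit val runtime: IORuntime =
  IORuntime
    .builder()
    .setBlocking(
      ExecutionContext.fromExecutorService(virtualExecutor),
      () => virtualExecutor.shutdown()
    )
    .build()

// IO.blocking / IO.interruptible would then land on virtual threads, subject to
// the IOFiber behavior discussed earlier in this thread.
IO.blocking(println(s"running on ${Thread.currentThread()}")).unsafeRunSync()
```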
> so that no code (especially code in third-party libraries) needs to be changed in order to use virtual threads.
@wb14123 see https://github.com/typelevel/cats-effect/pull/3870, which allows blocking code to be run on virtual threads without changing library code.
Hello,

Right now, Cats-Effect uses a cached thread pool for blocking I/O, used in operations like `IO.blocking`, `IO.interruptible` and `IO.interruptibleMany`. It would be great if, when Cats-Effect runs on top of Java 21, that thread pool would switch to using virtual threads. Personally, I'd prefer for this to just work out of the box. Users that upgrade to Java 21 should just get the benefits without configuring `IOApp`.

The biggest win isn't performance, but correctness. Some limitations are lifted for virtual threads, such as network socket operations becoming interruptible. So for example, when we have a JDBC query executed via `IO.interruptible`, the interruption will just work. Read the details here: https://openjdk.org/jeps/444#Networking

The challenge is in creating an implementation that is able to use virtual threads when available. Currently, Cats-Effect is not compiled for JDK 21, and probably won't be for some time. Therefore, a solution would have to either:

- use the reflection API to call `Thread.ofVirtual` when available (e.g., `MethodHandle`), otherwise fall back on a cached thread-pool, or;

What do you think? I can follow up with a PR if you agree to try it out.