kohlschutter / junixsocket

Unix Domain Sockets in Java 7 and newer (AF_UNIX), AF_TIPC, AF_VSOCK, and more
Apache License 2.0
438 stars 114 forks source link

[Loom] Support VirtualThreads #157

Closed kohlschuetter closed 4 months ago

kohlschuetter commented 6 months ago

VirtualThreads may block.

From https://github.com/kohlschutter/junixsocket/discussions/155, @cenodis writes:

Hi, I have been trying to figure out to what extent junixsocket supports virtual threads which were stabilized in Java 21 (JEP-444). Unfortunately neither the javadoc nor website make any mention of this feature or how, if at all, it behaves together with junixsocket.

Specifically I have two questions:

  1. Will a call to a blocking I/O method on a junix socket pin the underlying carrier thread?
  2. To better support task cancellation with virtual threads the blocking methods on the socket API (streams, connect, accept) have been amended to support interrupts:

    This method is [interruptible](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#interrupt()) in the following circumstances: [...]

    1. The socket uses the system-default socket implementation and a [virtual thread](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#isVirtual()) is accepting a connection. In that case, interrupting the virtual thread will cause it to wakeup and close the socket. This method will then throw SocketException with the interrupt status set.

https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/net/Socket.html

junisocket is, of course, not the default socket implementation. But it does offer "fake" sockets that behave like normal IP sockets except they pass their data through a unix socket instead. Does junixsocket offer similar interrupt functionality in those sockets?

I have done some experimentation with both SocketChannel and AFUnixSocket to try and provoke possible issues with pinning and have written it as a "test" class here: Make sure to run this code with a single carrier thread (-Djdk.virtualThreadScheduler.maxPoolSize=1). Otherwise the pool size is set to the amount of cores on your machine, allowing both virtual threads to make progress by consuming a platform thread each. https://gist.github.com/cenodis/57c5239a879821211ec0b9ca9aa4f863

It does little more than listen on a unix socket in one (virtual) thread and read from said socket in another. But switching between SocketChannel and Junixsocket shows a clear difference in behaviour:

  • SocketChannel allows the virtual thread to be unmounted from the carrier once it blocks on accept(). The carrier can them make progress by running the client thread until it blocks on connect() at which point it jumps back into the server thread to accept the connection and so on. This way both threads can make progress despite only having a single platform thread available.
  • Junixsocket only manages to create the server socket and call accept(). At this point the carrier appears to be pinned and the application locks up entirely. The client never runs until the main thread exits and destroys the process.

As for -Djdk.tracePinnedThreads, I have seen conflicting statements regarding this property with some saying it only works with pinning due to synchronized blocks. Despite the code above being obviously blocked for multiple seconds I have been unable to provoke a trace even with this option set to full. So either I'm doing something wrong or it doesn't work for JNI calls.

Unfortunately I have not yet found a way to make this class into a proper unit test. Outside of the jdk property there doesn't seem to be a way to ensure that both virtual threads are scheduled to the same carrier. All the ways to control either the scheduler or the virtual thread I have found are restricted as JDK interals. Another option might be to exhaust all other carriers on purpose to ensure deterministic assignment. But this also feels a bit too hacky and unstable to write a proper test against.

Regarding possible solutions on the side of junixsocket:

The "proper" fix is probably to do the same thing that Java did with its Socket: Implement the blocking semantics on top of non-blocking OS calls. Socket has been reimplemented to use SocketChannel in Java 16 so there is now only one implementation. I have done some reading on how SocketChannel handles virtual threads and, on a high level, it works like this:

  1. Test if the current thread is virtual and if so "force" the channel into non-blocking mode. (SocketChannel::configureSocketNonBlockingIfVirtualThread)
  2. Perform the, now non-blocking, IO call. Handing the task to whatever async process the runtime uses. (probably polling based but I have seen io-uring being thrown around in some loom docs so this might change)
  3. Use LockSupport::park to block the current Thread, freeing the underlying carrier.
  4. Once the call is completed, unpark the Thread, allowing another carrier to pick it up again and continue execution.

Now I have no experience with JNI so I don't know how much effort this change would be in this project. As a simpler, but less ideal fix one could also use a pool of platform threads to do the blocking:

  1. Create a pool of platform threads.
  2. When performing a blocking call inside a virtual Thread, push the native call onto the pool and use LockSupport::park or Future::get to block and free the carrier.
  3. Once the native call is complete have the worker Thread unpark the virtual Thread and return it to the pool.
  4. The virtual Thread can now be mounted onto another carrier and continue as normal.

This is less ideal because it does not fix the resource issue (you are still consuming a platform Thread for each native call). But it should fix the excessive carrier consumption and avoid interfering with other virtual threads.

kohlschuetter commented 6 months ago

The latest 2.10.0-SNAPSHOT should remedy this situation. Threads should no longer block each other if both the client and server run on VirtualThreads, and the test code provided seems to work now.

What's still missing is an optimized version of poll for systems that support kqueue, epoll, /dev/poll, etc. Right now, we simply move the poll to a separate non-virtual Thread, which is suboptimal, but does indeed work.

@cenodis Please verify with the latest 2.10.0-SNAPSHOT.

cenodis commented 6 months ago

Works great on my (linux) machine. Junixsocket no longer causes the program to lock up. I have also put it through a few random scenarios and can't detect any pinning, although it's not an exhaustive test by any means.

As you said, optimizations are still possible. But junixsocket should no longer negatively affect other virtual threads running on the system. So at this point I would say virtual threads are properly supported and anything else is an internal detail of junixsocket.

That being said there is still a divergence in interrupt behaviour. In my opinion this is not directly related to supporting virtual threads but it's still a fairly important detail, especially with the structured concurrency (preview) feature bringing a bigger emphasis on interruptions. I am going to collect a bit more information on this topic and then open a seperate issue for it.

urbanchef commented 5 months ago

@kohlschuetter Hi, any plans to release 2.10 with the fix soon?

kohlschuetter commented 4 months ago

@urbanchef Should be ready in a few days, sorting out a couple of remaining issues first.

Please try the latest SNAPSHOT version and let me know if you find any problems with it. Thanks!

kohlschuetter commented 4 months ago

junixsocket 2.10.0 has been released. Please verify and re-open if necessary. Thanks again for reporting , @cenodis !