opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.44k stars 1.73k forks source link

[Performance Proposal] Virtual Thread Support for heavy I/O bound ops #10786

Open nknize opened 10 months ago

nknize commented 10 months ago

JDK19 released JEP-425 (Virtual Threads) as an --enable-preview feature which improves thread scalability "...by mapping a large number of virtual threads to a small number of OS threads.". The feature is ideal for heavy I/O bound operations eliminating the need for the thread-sharing asynchronous programming style where a heavy I/O bound thread is returned to the pool and only signals its completion as a callback to the calling thread. This async design is all over the OpenSearch codebase.

This enhancement proposal is to explore performance improvements (or code simplification) by using virtual threads on those heavy I/O bound operations (e.g., segment copies). One option is to initially Feature Flag virtual thread support as an option to ScalingExecutorBuilder that can be enabled by the user as a cluster setting. Once proved out through benchmarks we can turn vThreads on by default for runtimes using JDK 19+ (or just bump the min targetCompatibility and sourceCompatibility to JavaVersion.VERSION_19 to guarantee feature availability).

Note this is NOT suggesting using virtual threads for optimizing Lucene IW / IR workloads (as mentioned here) but to explore using them in the distributed design around the lucene internals (e.g., storage transfers)

andrross commented 10 months ago

...and only signals its completion as a callback to the calling thread. This async design is all over the OpenSearch codebase.

I think this async pattern is one of the main causes of complexity of the OpenSearch code base and is likely a big hurdle for new folks to get over when contributing to the core. Virtual threads are super intriguing to me as a way to simplify the code and make it easier to configure to optimally use resources.

The newly introduced functionality to download segments from the remote store during replication and recovery seems like a good place to experiment with virtual threads. It is pretty much all I/O bound.

ansjcy commented 9 months ago

A side note:

Currently, the Performance Analyzer plugin collects certain JVM performance data in OpenSearch by taking thread dumps and also directly reading from the thread files under procfs (/proc/<opensearch pid>/task/*), we need to involve the PA/RCA repo maintainers in this effort to understand the impact on their plugin as well, otherwise any components depending on PA/RCA metrics would be impacted.

sohami commented 9 months ago

The newly introduced functionality to download segments from the remote store during replication and recovery seems like a good place to experiment with virtual threads. It is pretty much all I/O bound.

Similar to this Snapshot layer can be another area where it can be useful.

andrross commented 8 months ago

I did some very brief experimenting here, and once I got the compilation issues sorted out I ran into an immediate problem with the security manager. I was just doing some simple file I/O where it would work on a platform thread (using a thread from an executor in org.opensearch.threadpool.ThreadPool) but fails on a virtual thread. I haven't dug into this at all, but I don't think the design of virtual threads plays well with the security manager (which is on the path for removal) as it may require setting up the security context on each new virtual thread.

Update: per JEP444 "Virtual threads have no permissions when running with a SecurityManager set" which I believe is what I encountered here

reta commented 8 months ago

@andrross this could be relevant https://github.com/opensearch-project/OpenSearch/issues/1687#issuecomment-1621913881

andrross commented 8 months ago

@andrross this could be relevant #1687 (comment)

@reta Indeed. Virtual threads run with no permissions. It can be made to work by wrapping the code in AccessController.doPrivileged but that may well partially defeat the purpose of virtual threads, which are intended to be very lightweight.

Bukhtawar commented 8 months ago

An important thing to note is, Netty doesn't have a support yet. it would be interesting to see Netty adapt Virtual threads https://github.com/netty/netty/issues/12848 which is used heavily at the transport layer for Network IO.

Some public benchmarking results https://github.com/ebarlas/project-loom-comparison/blob/main/docs/comparison-plots.png is indeed impressive especially at scale

A note of caution: We need to be careful with using V threads when dealing with native dependencies that have a tendency of playing with memory addresses ad V threads would be swapping stacks on platform threads and would break underlying constructs

andrross commented 8 months ago

@Bukhtawar My understanding is that under the hood virtual threads end up doing something very similar to Netty (I've found this doc to be helpful) but it is the programming model that is very different. What exactly does it mean for Netty to offer support? My understanding is that you probably would not want to use Netty's asynchronous API along with virtual threads.

To me, one of the most obvious places in OpenSearch to experiment with virtual threads is anything that uses the repository interface (snaphots, remote store) as that interface is mostly synchronous and we might be able to get big performance improvements without having to rewrite the code to be async. The transport layer is fully asynchronous and is likely a much bigger change/refactoring to use virtual threads.