opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Try out new MMapDirectory implementation in Lucene using Panama (FFM) #10552

Open austintlee opened 12 months ago

austintlee commented 12 months ago

Is your feature request related to a problem? Please describe.

I came across this talk by @uschindler. The latest on the work that Uwe presented is here. I am personally interested in seeing how this would work with the effort in Netty 5 to replace the off-heap direct buffer with Panama. I think Uwe's work can be exploited in OpenSearch in all index read situations, and it might have a particularly significant impact in shard or segment re-allocation and replication scenarios. For those, I am wondering whether it would be possible to bypass the JVM heap entirely and pass references to mmapped regions directly to the network module (Netty) via a foreign method. And if that is in fact doable, how would unmap (which was mentioned in the talk) work when a network thread is still reading from an mmapped region?

andrross commented 6 months ago

[Triage - attendees 1 2 3] @austintlee Thanks for filing this issue. Looking forward to seeing more prototyping here. I will also note that this will require adopting Netty 5 as described in #2619.

uschindler commented 6 months ago

I don't fully understand the issue. If OpenSearch uses MMapDirectory, it automatically uses the new approach. So basically, if you run OpenSearch with Lucene 9.5+ (I think it's that version) and Java 19+, it will automatically use Panama. Starting with Lucene 9.10 it works with ALL Java versions: 19, 20, 21, 22, and later! (final PR: https://github.com/apache/lucene/pull/12706)
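For anyone who wants to check whether a given JVM exposes the foreign memory API at all, a small probe along these lines may help. It is only a sketch: the class name `PanamaProbe` is made up for illustration, and it does not mirror how Lucene actually selects its MMapDirectory implementation. It just reports the Java feature version and whether the `java.lang.foreign.MemorySegment` class can be loaded.

```java
// Hypothetical helper, not part of Lucene or OpenSearch.
class PanamaProbe {

    /** Returns true if the java.lang.foreign API is visible to this JVM. */
    static boolean foreignMemoryApiPresent() {
        try {
            // The class lives in java.base on JDKs that ship the FFM API.
            Class.forName("java.lang.foreign.MemorySegment");
            return true;
        } catch (ClassNotFoundException e) {
            // Older JDKs do not have the java.lang.foreign package.
            return false;
        }
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature();
        System.out.println("Java feature version: " + feature);
        System.out.println("java.lang.foreign present: " + foreignMemoryApiPresent());
    }
}
```

Running it on the same JVM that OpenSearch uses gives a quick indication of whether the Panama-based code path could be available there.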

If there are other off-heap buffers in OpenSearch, those could use a separate MR-JAR in OpenSearch or reuse the Netty stuff.

Lucene works without passing --enable-preview at runtime, because we use some compilation tricks.

P.S.: Lucene 10 (Java 21+) no longer has ByteBufferIndexInput and MMapDirectory is fully backed by Panama (see https://github.com/apache/lucene/pull/13146)
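On the unmap question raised at the top of the thread, the following sketch shows how the plain JDK foreign memory API behaves when a mapped region is accessed after it has been unmapped. It assumes Java 22 or later, where `java.lang.foreign` is final, and it uses the JDK API directly rather than Lucene's MMapDirectory; the class name `UnmapSketch` and the temp file are illustrative only. Closing the shared arena unmaps the file, and a later read fails with an `IllegalStateException`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; requires Java 22+ (assumption).
class UnmapSketch {

    /** Maps a temp file, unmaps it, and reports whether a later read is rejected. */
    static boolean readAfterCloseFails() {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("segment", ".bin");
            Files.write(tmp, new byte[] {1, 2, 3, 4});

            MemorySegment segment;
            // A shared arena allows access from any thread, e.g. a network thread.
            Arena arena = Arena.ofShared();
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                segment = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
            }
            // Reading while the arena is open works as usual.
            byte first = segment.get(ValueLayout.JAVA_BYTE, 0);
            if (first != 1) {
                throw new AssertionError("unexpected content");
            }

            // Closing the arena unmaps the region.
            arena.close();

            try {
                segment.get(ValueLayout.JAVA_BYTE, 0);
                return false; // not expected: the segment is no longer alive
            } catch (IllegalStateException expected) {
                return true;  // access after unmap is rejected with an exception
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temp file
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("read after close rejected: " + readAfterCloseFails());
    }
}
```

Whether a network module such as Netty could hold on to such a segment safely is a separate design question; the sketch only illustrates the lifecycle check the JDK performs.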