mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration
0 stars 0 forks source link

Make NativeUnixDirectory pure java now that direct IO is possible [LUCENE-8982] #979

Closed mikemccand closed 3 years ago

mikemccand commented 4 years ago

NativeUnixDirectory is a Directory implementation that uses direct IO to write newly merged segments.  Direct IO bypasses the kernel's buffer cache and write cache, making merge writes "invisible" to the kernel, though the reads for merging the N segments are still going through the kernel.

But today, NativeUnixDirectory uses a small JNI wrapper to access the O_DIRECT flag to open ... since JDK9 we can now pass that flag in pure java code, so we should now fix NativeUnixDirectory to not use JNI anymore.

We should also run some more realistic benchmarks seeing if this option really helps nodes that are doing concurrent indexing (merging) and searching.


Legacy Jira details

LUCENE-8982 by Michael McCandless (@mikemccand) on Sep 18 2019, resolved Jan 17 2021 Pull requests: https://github.com/apache/lucene-solr/pull/2068

mikemccand commented 4 years ago

We should rename the directory to something more generic. It's also supported on Windows!

In addition, the hardcoded alignment of 512 should be determined on directory creation (reading it from FileStore.getBlockSize().

Code should also create FileChannel directly not via forbidden Java.io.File classes. That was just a workaround to bring the FileDescriptor into the game.

[Legacy Jira: Uwe Schindler (@uschindler) on Sep 18 2019]

mikemccand commented 4 years ago

If you have a direct buffer, you can now also get an aligned slice.

[Legacy Jira: Uwe Schindler (@uschindler) on Sep 18 2019]

mikemccand commented 4 years ago

And it's Java 10 where this was introduced. So it's fine for master.

[Legacy Jira: Uwe Schindler (@uschindler) on Sep 18 2019]

mikemccand commented 4 years ago

Here is the full JDK patch including examples in test (how to align direct buffers and how to open channels): http://hg.openjdk.java.net/jdk10/master/rev/d72d7d55c765

[Legacy Jira: Uwe Schindler (@uschindler) on Sep 18 2019]

mikemccand commented 4 years ago

We should rename the directory to something more generic. It's also supported on Windows!

+1, DirectIODirectory?

In addition, the hardcoded alignment of 512 should be determined on directory creation (reading it from FileStore.getBlockSize().

+1, I think we can do something similar to IOUtils.spins here ... I have an initial cut at this, I'll make a patch soon.

Code should also create FileChannel directly not via forbidden Java.io.File classes. That was just a workaround to bring the FileDescriptor into the game.

+1

[Legacy Jira: Michael McCandless (@mikemccand) on Oct 14 2019]

mikemccand commented 3 years ago

hi @mikemccand, I’m new to contributing to Lucene and have been searching around for small scope issues to take on. After reading the description and discussion of this issue, this seems to be a straightforward one (not sure about the new benchmarking part though). Is it ok that I take this one and open a PR ?

[Legacy Jira: Zach Chen (@zacharymorn) on Oct 29 2020]

mikemccand commented 3 years ago

Yes please!  Feel free to tackle this!  I can help w/ benchmarking.

[Legacy Jira: Michael McCandless (@mikemccand) on Oct 30 2020]

mikemccand commented 3 years ago

Thanks Michael! I've opened an initial PR with the proposed changes, but also have a few follow-up questions:

  1. Are new unit tests (other than benchmarking tests) needed, given these changes may not have direct side-effects to assert on?
  2. Shall I go ahead and remove code that may no longer be used after these changes, such as the related method in NativePosixUtil.cpp?

The PR also needs to include some more updates to comment to reflect the new way of doing direct IO. I can work on that once the changes are very much ready. 

 

 

[Legacy Jira: Zach Chen (@zacharymorn) on Oct 31 2020]

mikemccand commented 3 years ago

Sort of related - the class comments currently say to do ant build-native-unix which is no longer accurate. Has this been ported to Gradle? How do I test this?

[Legacy Jira: Mike Drob (@madrob) on Nov 02 2020]

mikemccand commented 3 years ago

@mdrob  It does seems ant build-native-unix is no longer available, and running ./gradlew helpAnt doesn't help either. I'm not super familiar with cpp development, but to facilitate this I've done some research and after some trial and errors, come up with a separate PR to build the cpp code (as decoupling the two changes might help benchmarking) https://github.com/apache/lucene-solr/pull/2068 . Could you please take a look and let me know if this is a good direction to take?

Please note that in the PR, I created a new native module to host NativefPosixUtil.cpp, as there's compatibility issue between cpp-library and java-library Gradle plugins. For details, please see https://github.com/gradle/gradle-native/issues/352#issuecomment-461724948. 

[Legacy Jira: Zach Chen (@zacharymorn) on Nov 07 2020]

mikemccand commented 3 years ago

Hi Zach. I don't think we've ported this particular build fragment from ant - thank you for contributing it. It can be a separate module but I think it shouldn't be forced upon everyone on default convention tasks (assemble, check) - I didn't check if it's currently the case. Native builds require extra setup and tools that some platforms may lack (Windows, Mac). If it's the case that these tasks run by default is then we'll need to add either an explicit property (-Pbuild.native=true?) or a dedicated task (native) and make the build conditional on whether it is explicitly defined.

I can take a look if you have a problem with this, let me know.

[Legacy Jira: Dawid Weiss (@dweiss) on Nov 07 2020]

mikemccand commented 3 years ago

The new task appeared to build fine on my Mac, although I didn't test the resulting lib.

[Legacy Jira: Mike Drob (@madrob) on Nov 08 2020]

mikemccand commented 3 years ago

@dweiss  Thanks for the suggestion and it makes sense! I've pushed an update to condition the native build / include on "-Pbuild.native=true" , could you please take a look and and let me know if it looks good?

 

 

[Legacy Jira: Zach Chen (@zacharymorn) on Nov 09 2020]

mikemccand commented 3 years ago

Commit ebc87a8a27f3b3bd89ea7c38c8b701d94e50788d in lucene-solr's branch refs/heads/master from zacharymorn https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ebc87a8

LUCENE-8982: Separate out native code to another module to allow cpp build with gradle (#2068)

[Legacy Jira: ASF subversion and git services on Nov 16 2020]

mikemccand commented 3 years ago

Commit ebc87a8a27f3b3bd89ea7c38c8b701d94e50788d in lucene-solr's branch refs/heads/master from zacharymorn https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ebc87a8

LUCENE-8982: Separate out native code to another module to allow cpp build with gradle (#2068)

[Legacy Jira: ASF subversion and git services on Nov 16 2020]

mikemccand commented 3 years ago

I've committed in the extracted native library part. I still think it'd be good to see if we can get the same performance with just plain java (and new direct flags).

[Legacy Jira: Dawid Weiss (@dweiss) on Nov 16 2020]

mikemccand commented 3 years ago

Yeah... like I suspected - CI does complain about missing toolchains. https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/28653/consoleText

* What went wrong:
Execution failed for task ':lucene:misc:native:compileDebugLinuxCpp'.
> No tool chain is available to build C++ for host operating system 'Linux' architecture 'x86-64':
    - Tool chain 'visualCpp' (Visual Studio):
        - Visual Studio is not available on this operating system.
    - Tool chain 'gcc' (GNU GCC):
        - Could not find C++ compiler 'g++' in system path.
    - Tool chain 'clang' (Clang):
        - Could not find C++ compiler 'clang++' in system path.

I think we'll switch native builds to be optionally enabled (rather than disabled)?

[Legacy Jira: Dawid Weiss (@dweiss) on Nov 16 2020]

mikemccand commented 3 years ago

Commit fd3ffd0d38aaeaa8629943f69dca2ff04afcfbfa in lucene-solr's branch refs/heads/master from Dawid Weiss https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fd3ffd0

LUCENE-8982: make native builds disabled by default (CI complains).

[Legacy Jira: ASF subversion and git services on Nov 16 2020]

mikemccand commented 3 years ago

Hi, would it be possible to build the stuff, if a toolchain is there? It looks like Gradle looks for all different types of tool chains and then gives up. My idea: If it finds windows, it builds windows. if it finds gcc it builds linux, macos same.

My biggest problem is still: How to handle releases of binary artifacts? Do we really want to make this dependent of the release manager's local system?

[Legacy Jira: Uwe Schindler (@uschindler) on Nov 16 2020]

mikemccand commented 3 years ago

gradle will pick up the toolchain suitable for the platform if it finds any - the CI just doesn't have it.

My biggest problem is still: How to handle releases of binary artifacts? Do we really want to make this dependent of the release manager's local system?

This is what I actually raised in the issue's comment too. I don't think we should include binaries in releases. We should strive to make it java-only (and if somebody really wants to, they can compile native modules locally, for a given platform).

[Legacy Jira: Dawid Weiss (@dweiss) on Nov 16 2020]

mikemccand commented 3 years ago

gradle will pick up the toolchain suitable for the platform if it finds any - the CI just doesn't have it.

That's fine. My idea was: Can't we build the native dependencies, if the toolchain is there, but don't fail if not an just print a warning?

[Legacy Jira: Uwe Schindler (@uschindler) on Nov 16 2020]

mikemccand commented 3 years ago

In short: change the default to "true" if toolchain is there.

[Legacy Jira: Uwe Schindler (@uschindler) on Nov 16 2020]

mikemccand commented 3 years ago

You can set -Pbuild.native=true manually on those VMs you know the tools are available... I don't think duplicating whatever logic gradle uses to detect those toolschains automatically is worth the effort, to be honest. The logic is probably in gradle's sources somewhere. I don't know if it can be done easier than by copying from their code.

[Legacy Jira: Dawid Weiss (@dweiss) on Nov 16 2020]

mikemccand commented 3 years ago

Sorry for the late response, just got off work and saw this. 

From the discussion it seems the assumption / reality here is that cpp toolchain may or may not be available in the VMs. However, since Lucene does have native code and scheduled build can discover any change that breaks the native-java integration early on (there was actually one commit before this that broke it), should the CI builds in general require cpp toolchain to be there in the VMs (and add them if they are missing) to execute the compilation and tests, but in gradle still have -Pbuild.native=false as default to not break builds for others in development, and have a few CI VMs with cpp toolchain intentionally left out to test for compatibility?

[Legacy Jira: Zach Chen (@zacharymorn) on Nov 17 2020]

mikemccand commented 3 years ago

I am now merging the PR. Thanks Zach!

[Legacy Jira: Uwe Schindler (@uschindler) on Jan 17 2021]

mikemccand commented 3 years ago

Commit a7747b63b4ac649837469ee189b8aef10e4ce867 in lucene-solr's branch refs/heads/master from zacharymorn https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a7747b6

LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory (#2052)

LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

[Legacy Jira: ASF subversion and git services on Jan 17 2021]

mikemccand commented 3 years ago

Commit a7747b63b4ac649837469ee189b8aef10e4ce867 in lucene-solr's branch refs/heads/master from zacharymorn https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a7747b6

LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory (#2052)

LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

[Legacy Jira: ASF subversion and git services on Jan 17 2021]

mikemccand commented 3 years ago

Thanks Zach!

[Legacy Jira: Uwe Schindler (@uschindler) on Jan 17 2021]

mikemccand commented 3 years ago

Commit 4b508aef2469a7d5083a226a61e8977bb67174e7 in lucene-solr's branch refs/heads/master from Uwe Schindler https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4b508ae

LUCENE-8982: Add a note to MIGRATE.md

[Legacy Jira: ASF subversion and git services on Jan 17 2021]

mikemccand commented 3 years ago

No problem! Thanks Uwe,  Michael and Dawid for all the help and guidance there!

[Legacy Jira: Zach Chen (@zacharymorn) on Jan 18 2021]

mikemccand commented 2 years ago

Closing after the 9.0.0 release

[Legacy Jira: Adrien Grand (@jpountz) on Dec 08 2021]