mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration
0 stars 0 forks source link

Allow BKDReader packedIndex to be off heap [LUCENE-8932] #929

Closed mikemccand closed 5 years ago

mikemccand commented 5 years ago

This change modifies BKDReader to read the packedIndex bytes off heap rather than load them all on heap at a single time.

Questions for discussion:

  1. Should BKDReader only support packedIndex off heap?
  2. If not, how should the choice be made?

Using luceneutils IndexAndSearchOpenStreetMaps present the following test results:

with -box -points (patch)

READER MB: 1.1345596313476562

BEST M hits/sec: 73.34277344984474 BEST QPS: 74.63011169783009

with -box -points (original)

READER MB: 1.7249317169189453

BEST M hits/sec: 73.77125157623486 BEST QPS: 75.06611062353801

with -nearest 10 -points (patch)

READER MB: 1.1345596313476562

BEST M hits/sec: 0.013586298373879497 BEST QPS: 1358.6298373879497

with -nearest 10 -points (original)

READER MB: 1.7249317169189453

BEST M hits/sec: 0.01445208197367343 BEST QPS: 1445.208197367343

with -box -geo3d (patch)

READER MB: 1.1345596313476562

BEST M hits/sec: 39.84968715299074 BEST QPS: 40.54914292796736

with -box -geo3d (original)

READER MB: 1.7456226348876953

BEST M hits/sec: 40.45051734329004 BEST QPS: 41.160519101846695


Legacy Jira details

LUCENE-8932 by Jack Conradson (@jdconrad) on Jul 23 2019, resolved Oct 25 2019 Attachments: LUCENE-8932.patch (versions: 3)

mikemccand commented 5 years ago

This is interesting! It doesn't seem to affect performance at all so I believe we could just load the index off-heap all the time? And maybe something like LUCENE-8833 could help make sure it is hot in the page cache.

[Legacy Jira: Adrien Grand (@jpountz) on Jul 24 2019]

mikemccand commented 5 years ago

I was hoping LUCENE-8833 would land and we could build on it but it's taking some more time than I thought yet I don't think we need to wait. I'd suggest the following to move forward:

Then when LUCENE-8833 lands we can look into doing this a bit differently. @jdconrad Does it sound good to you?

The patch looks good to me in general. One minor thing that looks inconsistent to me is that BKDInput#setPosition doesn't throw an IOException. It forces the offheap impl to catch and rethrow, while readBytes throws an IOException so it's up to callers to deal with the exception. In my opinion, we should make setPosition and readBytes consistent: either both throw an IOException, or none of them do?

[Legacy Jira: Adrien Grand (@jpountz) on Sep 05 2019]

mikemccand commented 5 years ago

@jpountz Thanks for the thorough review!  I will incorporate your feedback into a patch next week and try to move forward from there.

[Legacy Jira: Jack Conradson (@jdconrad) on Sep 06 2019]

mikemccand commented 5 years ago

@jpountz Apologies for the delay. I have added a second constructor that takes both the IndexInput and a boolean for on/off-heap. The first constructor will select off-heap iff the IndexInput extends ByteBufferIndexInput. I made all tests uses a random boolean to determine whether to test on-heap or off-heap. Finally, I changed setPosition to throw an IOException as it's the API that's changing directly.

[Legacy Jira: Jack Conradson (@jdconrad) on Oct 15 2019]

mikemccand commented 5 years ago

Thanks Jack, the patch looks good. One minor thing that I'd like to change would be to make BKDOffHeapInput's clone() not re-read the minLeafBlockFP, like you did in BKDOnHeapInput by providing a dedicated constructor for cloning.

[Legacy Jira: Adrien Grand (@jpountz) on Oct 18 2019]

mikemccand commented 5 years ago

@jpountz Thanks for looking again. I have added a second constructor for BKDOffHeapInput to match the clone method in BKDOnHeapInput as requested.

[Legacy Jira: Jack Conradson (@jdconrad) on Oct 24 2019]

mikemccand commented 5 years ago

Commit a939c08dba706c4fe99df5e0621092bd0ec22587 in lucene-solr's branch refs/heads/master from Adrien Grand https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a939c08

LUCENE-8932: Move BKDReader's index off-heap when the input is a ByteBufferIndexInput.

[Legacy Jira: ASF subversion and git services on Oct 25 2019]

mikemccand commented 5 years ago

Commit e648d601efb8f500c10b9524175bbaa85345142d in lucene-solr's branch refs/heads/branch_8x from Adrien Grand https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e648d60

LUCENE-8932: Move BKDReader's index off-heap when the input is a ByteBufferIndexInput.

[Legacy Jira: ASF subversion and git services on Oct 25 2019]

mikemccand commented 5 years ago

Thanks @jdconrad!

[Legacy Jira: Adrien Grand (@jpountz) on Oct 25 2019]

mikemccand commented 4 years ago

Closing after the 8.4.0 release.

[Legacy Jira: Adrien Grand (@jpountz) on Dec 29 2019]