Lazy loading Lucene FST offheap using mmap [LUCENE-8635]

mikemccand commented 5 years ago

Currently, the FST loads all the terms into heap memory when the index is opened. This causes frequent JVM OOM issues if the term data gets big. A better approach would be to lazily load the FST using mmap, which ensures that only the required terms get loaded into memory.

Lucene can expose an API for providing the list of fields whose terms should be loaded off-heap. I'm planning to take the following approach:

- Add a boolean property fstOffHeap in FieldInfo
- Pass the list of off-heap fields to Lucene during index open (ALL can be a special keyword for loading ALL fields off-heap)
- Initialize the fstOffHeap property during Lucene index open
- FieldReader invokes the default FST constructor or the OffHeap constructor based on the fstOffHeap field
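The per-field selection described in the steps above could be sketched roughly as follows. This is only an illustration of the idea: `OffHeapFieldPolicy` and its method are hypothetical names, not part of Lucene or of the attached patches, and the `ALL` keyword is taken from the proposal text.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: decides per field whether its FST should be read off-heap. */
final class OffHeapFieldPolicy {
    /** Special keyword meaning "every field", as suggested in the proposal. */
    static final String ALL = "ALL";

    private final Set<String> offHeapFields;

    OffHeapFieldPolicy(Set<String> offHeapFields) {
        // Defensive copy so later changes by the caller do not affect the policy.
        this.offHeapFields = new HashSet<>(offHeapFields);
    }

    /** Returns true if the FST for the given field should be loaded off-heap. */
    boolean fstOffHeap(String fieldName) {
        return offHeapFields.contains(ALL) || offHeapFields.contains(fieldName);
    }
}
```

A reader could consult such a policy once per field at open time and pick the on-heap or off-heap constructor accordingly.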

I created a patch (that loads all fields offheap), did some benchmarks using es_rally and results look good.

Legacy Jira details

LUCENE-8635 by Ankit Jain on Jan 11 2019, resolved Feb 19 2019

Environment:

I used the setup below for the es_rally tests:

- single node i3.xlarge running ES 6.5
- es_rally was running on another i3.xlarge instance

Attachments: fst-offheap-ra-rev.patch, fst-offheap-rev.patch, offheap.patch, optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx

Linked issues:

- LUCENE-7826
- LUCENE-8671
- LUCENE-8887

mikemccand commented 5 years ago

Wow, this is impressive! Surprising how small the change was – basically opening up the FST BytesStore API a bit so that we could have an impl that wraps an IndexInput (reading backwards) instead of a byte[].
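To illustrate the idea of one reader abstraction with two backing stores, here is a minimal sketch that uses a `java.nio.ByteBuffer` as a stand-in for an mmap-backed input. The interface and class names are hypothetical and much simpler than Lucene's actual `FST.BytesReader`; the point is only that the backward-reading logic does not care whether the bytes live in a heap array or in a buffer.

```java
import java.nio.ByteBuffer;

/** Hypothetical, simplified reader abstraction; not Lucene's FST.BytesReader. */
interface ReverseReader {
    void setPosition(long pos);
    byte readByte();
}

/** Reads backwards from a heap-resident byte[]. */
final class ArrayReverseReader implements ReverseReader {
    private final byte[] bytes;
    private int pos;

    ArrayReverseReader(byte[] bytes) { this.bytes = bytes; }

    public void setPosition(long pos) { this.pos = (int) pos; }

    public byte readByte() { return bytes[pos--]; }
}

/** Reads backwards from a ByteBuffer, which could be a memory-mapped file. */
final class BufferReverseReader implements ReverseReader {
    private final ByteBuffer buffer;
    private int pos;

    BufferReverseReader(ByteBuffer buffer) { this.buffer = buffer; }

    public void setPosition(long pos) { this.pos = (int) pos; }

    // Absolute get: reads at the index without touching the buffer's own cursor.
    public byte readByte() { return buffer.get(pos--); }
}
```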

Can you copy/paste the rally results out of Excel here? I'm curious what search-time impact you're seeing. If it is not too much of an impact maybe we should consider just moving FSTs off-heap in the default codec? We've done similar things recently for Lucene ... e.g. moving norms off heap.

I'll run Lucene's wikipedia benchmarks to measure the impact from our standard benchmarks (the nightly Lucene benchmarks).

[Legacy Jira: Michael McCandless (@mikemccand) on Jan 11 2019]

mikemccand commented 5 years ago

Also, have you confirmed that all tests pass when you switch to off heap FST storage always?

[Legacy Jira: Michael McCandless (@mikemccand) on Jan 11 2019]

mikemccand commented 5 years ago

The Excel sheet is big, so pasting it here might not help. You have a good point about moving FSTs off-heap in the default codec, as we can always preload the mmap file during index open as demonstrated here.

I ran the default Lucene test suite and a couple of tests failed, though they don't seem to have anything to do with my change:

[junit4] Tests with failures [seed: 1D3ADDF6AE377902]:

[junit4] - org.apache.solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest.testInactiveShardCleanup

[junit4] - org.apache.solr.cloud.autoscaling.ScheduledTriggerTest.testTrigger

[junit4] Execution time total: 1 hour 12 minutes 40 seconds

[junit4] Tests summary: 833 suites (7 ignored), 4024 tests, 2 failures, 286 ignored (153 assumptions)

UPDATE: The tests passed after retrying individually.

[Legacy Jira: Ankit Jain on Jan 11 2019 [updated: Jan 12 2019]]

mikemccand commented 5 years ago

Ankit:

The autoscaling tests have been failing intermittently for a while. If you can run those tests independently and have them succeed I wouldn't worry about them.

"run those tests independently" in this case is just executing the "reproduce with" line, just cut/paste. e.g.

ant test -Dtestcase=ScheduledMaintenanceTriggerTest -Dtests.method=testInactiveShardCleanup -Dtests.seed=1D3ADDF6AE377902 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=ha -Dtests.timezone=America/Nome -Dtests.asserts=true -Dtests.file.encoding=US-ASCII

Best,

Erick

[Legacy Jira: Erick Erickson (@ErickErickson) on Jan 12 2019]

mikemccand commented 5 years ago

Thanks for the tip Erick. I ran the failing tests individually and they passed!

[Legacy Jira: Ankit Jain on Jan 12 2019]

mikemccand commented 5 years ago

This looked interesting to me, too, so I ran the benchmarks with the change, but sadly the results were not great, which is surprising given the Rally test results, which looked positive I think? I'm not really sure how to interpret Rally output since I'm not familiar with that tool. Does it test query performance? Maybe there is a use case for this that is different than what is being tested by the benchmarks; here is what I saw after a benchmark run. This run is maybe a little unusual since I have some mods to the benchmark (running w/ 8-thread executor service, enabled indexSort, topN=500) b/c of some other tests I was running. I can re-run with more "normal" settings, but this already looks kind of suspect.

                    Task  QPS before      StdDev   QPS after      StdDev                Pct diff
                PKLookup      163.94      (2.3%)      123.50      (2.0%)  -24.7% ( -28% -  -20%)
              AndHighLow     5096.79      (1.2%)     4860.87      (1.5%)   -4.6% (  -7% -   -2%)
                  Fuzzy1      711.37      (2.3%)      681.03      (2.4%)   -4.3% (  -8% -    0%)
                  Fuzzy2      203.67      (2.6%)      196.77      (2.6%)   -3.4% (  -8% -    1%)
              AndHighMed     3460.06      (2.7%)     3346.84      (3.2%)   -3.3% (  -8% -    2%)
               LowPhrase     3448.68      (2.8%)     3345.41      (2.7%)   -3.0% (  -8% -    2%)
         LowSloppyPhrase     3278.72      (2.9%)     3184.03      (2.8%)   -2.9% (  -8% -    2%)
             LowSpanNear     3123.68      (2.9%)     3040.74      (2.6%)   -2.7% (  -7% -    2%)
                 Respell      716.61      (1.7%)      699.22      (1.8%)   -2.4% (  -5% -    1%)
               MedPhrase     2970.83      (3.2%)     2899.18      (3.0%)   -2.4% (  -8% -    3%)
             AndHighHigh     2626.26      (3.7%)     2563.37      (4.0%)   -2.4% (  -9% -    5%)
         MedSloppyPhrase     2642.66      (3.6%)     2582.02      (3.3%)   -2.3% (  -8% -    4%)
             MedSpanNear     2598.01      (3.5%)     2541.03      (3.2%)   -2.2% (  -8% -    4%)
    BrowseDateTaxoFacets     3467.39      (2.7%)     3399.62      (3.3%)   -2.0% (  -7% -    4%)
                 LowTerm     3896.13      (4.7%)     3824.62      (4.4%)   -1.8% ( -10% -    7%)
            HighSpanNear     1511.97      (4.7%)     1484.42      (4.6%)   -1.8% ( -10% -    7%)
               OrHighMed     1406.84      (5.7%)     1382.52      (5.8%)   -1.7% ( -12% -   10%)
               OrHighLow     1484.58      (6.1%)     1460.06      (6.0%)   -1.7% ( -12% -   11%)
              HighPhrase     1740.06      (4.5%)     1712.12      (4.4%)   -1.6% ( -10% -    7%)
        HighSloppyPhrase     1547.60      (4.7%)     1523.48      (4.6%)   -1.6% ( -10% -    8%)
   BrowseMonthTaxoFacets     9031.31      (2.1%)     8897.26      (2.6%)   -1.5% (  -6% -    3%)
              OrHighHigh     1111.59      (6.3%)     1095.29      (6.5%)   -1.5% ( -13% -   12%)
   HighTermDayOfYearSort     2197.07      (5.9%)     2166.89      (3.9%)   -1.4% ( -10% -    8%)
                 MedTerm     2621.21      (5.3%)     2586.41      (5.0%)   -1.3% ( -11% -    9%)
BrowseDayOfYearTaxoFacets     9011.41      (1.6%)     8907.44      (1.5%)   -1.2% (  -4% -    1%)
       HighTermMonthSort     2449.33      (5.5%)     2421.11      (4.4%)   -1.2% ( -10% -    9%)
                HighTerm     1629.92      (6.5%)     1612.72      (6.4%)   -1.1% ( -13% -   12%)
                  IntNRQ      980.43      (9.1%)      973.72      (8.9%)   -0.7% ( -17% -   19%)
                Wildcard     1779.82      (5.7%)     1771.12      (5.5%)   -0.5% ( -11% -   11%)
                 Prefix3     1790.47      (5.9%)     1781.85      (5.8%)   -0.5% ( -11% -   11%)
BrowseDayOfYearSSDVFacets     2038.63      (3.0%)     2032.32      (2.1%)   -0.3% (  -5% -    4%)
   BrowseMonthSSDVFacets     2295.02      (2.5%)     2303.01      (1.9%)    0.3% (  -4% -    4%)

[Legacy Jira: Michael Sokolov (@msokolov) on Jan 15 2019]

mikemccand commented 5 years ago

Thanks for testing @msokolov – the results make sense: the most terms dictionary intensive queries are impacted the most, with PKLookup being heavily impacted since that's just purely exercising the terms dictionary with no postings visited. Fuzzy queries, and then queries matching few hits (conjunctions with low/medium freq terms) also spend relatively more time in the terms dictionary ...
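As a reading aid for the table, the central "Pct diff" figure appears to be the plain relative change in QPS between the two runs. The small sketch below recomputes it for the PKLookup row; `PctDiff` is a hypothetical helper written for this illustration, and it does not attempt to reproduce the range shown in parentheses.

```java
import java.util.Locale;

/** Hypothetical helper that recomputes the central "Pct diff" value of a benchmark row. */
final class PctDiff {
    /** Relative change from qpsBefore to qpsAfter, in percent. */
    static double pctDiff(double qpsBefore, double qpsAfter) {
        return 100.0 * (qpsAfter - qpsBefore) / qpsBefore;
    }

    public static void main(String[] args) {
        // PKLookup row from the table above: 163.94 QPS before, 123.50 QPS after.
        System.out.println(String.format(Locale.ROOT, "PKLookup: %.1f%%", pctDiff(163.94, 123.50)));
        // prints "PKLookup: -24.7%"
    }
}
```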

So net/net it looks like we should not make this the default, but expose it somehow as an option for those use cases that don't want to dedicate heap memory to storing FSTs?

[Legacy Jira: Michael McCandless (@mikemccand) on Jan 15 2019]

mikemccand commented 5 years ago

+1 looks valuable, especially for cases where you don't necessarily want to always bring the FSTs into memory since it's inherently lazy-load.

The PK lookup doesn't concern me much since such queries would usually already be fast and overall a tiny fraction of a search platform in typical usage.

Ideally this setting could be toggled on a per-field basis.

[Legacy Jira: David Smiley (@dsmiley) on Jan 15 2019]

mikemccand commented 5 years ago

Rally tests run against an underlying Elasticsearch cluster and cover use cases other than search, such as log analytics. I ran 1 iteration for multiple data sets and did not notice significant performance degradation. Rather, I noticed a 6% improvement in indexing throughput for all the data sets. I should leave it running for more iterations, though, to get more conclusive evidence.

Thanks @msokolov for testing the changes. I think the impact is as expected, maybe slightly more for PKLookup. Do the tests use a randomized key for each PKLookup query, or are the keys reused across queries? That will affect the overall throughput, as mmap is inherently lazily loaded.

Though I'm open to exposing a per-field setting in Lucene, I agree with @dsmiley that the 25% reduction in PKLookup throughput affects only a tiny fraction of typical usage. And throughput should be better if the same keys get used for PKLookup queries. Adding a per-field setting might require a code change and would be effective only for data indexed using the new codec. My knowledge of Lucene settings is limited and I might be wrong.

[Legacy Jira: Ankit Jain on Jan 16 2019]

mikemccand commented 5 years ago

This is pretty cool. I'm happily surprised as well by how small the patch is.

Do the tests use randomized key for each PKLookup query or the keys are reused across queries?

It uses random keys: https://github.com/mikemccand/luceneutil/blob/7d3ee97a4349c300d399fd83fb11febdf4607f44/src/main/perf/PKLookupTask.java

Adding per field setting might require code change and will be effective only for data indexed using new codec.

Technically we could make things work for existing segments since your patch doesn't change the file format.

In general I'm supportive of moving as much as we can to disk and relying on the OS cache to load important stuff in memory and keep the rest on disk. The thing that makes me want to be careful here is that access to the terms index is very random, so things might degrade badly if the OS cache doesn't hold the whole terms index in memory. I'm not super familiar with the FST internals, I wonder whether there are changes that we could make to it so that it would be more disk-friendly, eg. by seeking backward as little as possible when looking up a key?

[Legacy Jira: Adrien Grand (@jpountz) on Jan 16 2019]

mikemccand commented 5 years ago

Following a suggestion from @mikemccand I tried a slightly different version of this, making use of randomAccessSlice to avoid some calls to seek(), and this gives better perf in the benchmarks. I also spent some time trying to understand FST's backwards-seeking behavior. Based on my crude understanding, and comment from Mike again, it seems as if with some work it would be possible to make it more naturally forward-seeking, but it's not obvious that in general you would get more local cache-friendly access patterns from that. Still you might; probably needs some experimentation to know for sure. Here are the benchmark #s from the random-access patch:

                    Task  QPS before      StdDev   QPS after      StdDev                Pct diff
                PKLookup      133.62      (2.2%)      123.74      (1.5%)   -7.4% ( -10% -   -3%)
              AndHighLow     3411.49      (3.2%)     3268.04      (3.1%)   -4.2% ( -10% -    2%)
BrowseDayOfYearTaxoFacets    10067.18      (4.3%)     9828.65      (3.5%)   -2.4% (  -9% -    5%)
                 LowTerm     3567.48      (1.2%)     3489.27      (1.7%)   -2.2% (  -5% -    0%)
                  Fuzzy1      147.67      (3.1%)      144.65      (2.4%)   -2.0% (  -7% -    3%)
   BrowseMonthTaxoFacets    10102.27      (4.2%)     9901.49      (4.1%)   -2.0% (  -9% -    6%)
                  Fuzzy2       62.00      (2.8%)       60.87      (2.4%)   -1.8% (  -6% -    3%)
                 MedTerm     2694.87      (2.0%)     2647.08      (2.1%)   -1.8% (  -5% -    2%)
              AndHighMed     1171.52      (2.7%)     1154.25      (2.8%)   -1.5% (  -6% -    4%)
                HighTerm     2061.53      (2.3%)     2032.84      (2.5%)   -1.4% (  -6% -    3%)
         MedSloppyPhrase      266.60      (3.4%)      263.01      (4.2%)   -1.3% (  -8% -    6%)
              OrHighHigh      278.90      (4.0%)      275.35      (4.7%)   -1.3% (  -9% -    7%)
        HighSloppyPhrase      107.68      (5.5%)      106.34      (5.6%)   -1.2% ( -11% -   10%)
                 Respell      118.26      (2.1%)      116.95      (2.2%)   -1.1% (  -5% -    3%)
             AndHighHigh      472.93      (4.4%)      467.78      (3.3%)   -1.1% (  -8% -    6%)
               OrHighMed      755.21      (2.9%)      748.34      (3.3%)   -0.9% (  -6% -    5%)
             MedSpanNear      308.31      (3.3%)      305.59      (3.8%)   -0.9% (  -7% -    6%)
                Wildcard      869.37      (3.5%)      862.74      (1.9%)   -0.8% (  -5% -    4%)
       HighTermMonthSort      871.33      (7.1%)      865.80      (6.1%)   -0.6% ( -12% -   13%)
               MedPhrase      449.39      (3.0%)      446.55      (2.4%)   -0.6% (  -5% -    4%)
             LowSpanNear      391.10      (3.3%)      388.77      (3.8%)   -0.6% (  -7% -    6%)
         LowSloppyPhrase      406.57      (3.8%)      404.23      (3.6%)   -0.6% (  -7% -    7%)
              HighPhrase      239.84      (3.7%)      238.78      (3.3%)   -0.4% (  -7% -    6%)
                 Prefix3     1230.56      (5.0%)     1225.52      (2.9%)   -0.4% (  -7% -    7%)
            HighSpanNear      107.34      (5.2%)      107.20      (5.3%)   -0.1% ( -10% -   10%)
               LowPhrase      438.52      (3.4%)      438.14      (2.5%)   -0.1% (  -5% -    5%)
    BrowseDateTaxoFacets       11.14      (4.0%)       11.16      (7.0%)    0.2% ( -10% -   11%)
   HighTermDayOfYearSort      606.85      (6.7%)      608.65      (5.4%)    0.3% ( -11% -   13%)
                  IntNRQ      987.08     (12.5%)      990.96     (13.5%)    0.4% ( -22% -   30%)
               OrHighLow      553.72      (3.2%)      558.09      (3.5%)    0.8% (  -5% -    7%)
BrowseDayOfYearSSDVFacets       38.23      (3.9%)       38.66      (4.1%)    1.1% (  -6% -    9%)
   BrowseMonthSSDVFacets       42.05      (3.5%)       42.57      (3.7%)    1.2% (  -5% -    8%)

[Legacy Jira: Michael Sokolov (@msokolov) on Jan 16 2019]

mikemccand commented 5 years ago

Thanks @msokolov for updating the patch and doing another run. As per my understanding, a seek operation has very little overhead (it should be on the order of microseconds), as it just sets the buffer to the right position? Maybe the number of seek operations is huge and they add up.

[Legacy Jira: Ankit Jain on Jan 16 2019]

mikemccand commented 5 years ago

Right, it seems crazy that that makes a difference. I guess there is a tiny bit less arithmetic in the RandomAccess version as well. I guess there can be a lot of small reads of the terms dictionary.

[Legacy Jira: Michael Sokolov (@msokolov) on Jan 16 2019]

mikemccand commented 5 years ago

Thanks @msokolov – those numbers look quite a bit better! Though, your QPSs are kinda high overall – how many Wikipedia docs were in your index?

I do wonder if we simply reversed the FST's byte[] when we create it, what impact that'd have on lookup performance. Hmm even if we did that, we'd still have to readBytes one byte at a time since RandomAccessInput does not have a readBytes method? But ... maybe IndexInput would give good performance in that case? We should probably pursue that separately though...

[Legacy Jira: Michael McCandless (@mikemccand) on Jan 16 2019]

mikemccand commented 5 years ago

I used the wikimedium2m data set for the second set of tests (the first test was on a tiny index - 10k docs) – at least I think I did! I am kind of new to the benchmarking game. I ran the benchmarks with python src/python/localrun.py -source wikimedium2m, and I can see that the index dir is 861M.

[Legacy Jira: Michael Sokolov (@msokolov) on Jan 17 2019]

mikemccand commented 5 years ago

OK thanks @msokolov. I'll try to also run bench on wikibig and report back. I think doing a single method call instead of the two (seek + read) via RandomAccessInput must be helping.
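The difference between the two access styles can be pictured with plain `java.nio.ByteBuffer`, used here purely as an analogy for the IndexInput and RandomAccessInput paths. `PositionalRead` is a hypothetical name, and this sketch says nothing about the actual cost of either path inside Lucene.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of "seek + read" versus a single positional read. */
final class PositionalRead {
    /** Two calls: move the buffer's cursor, then read relative to it. */
    static byte seekThenRead(ByteBuffer buf, int pos) {
        buf.position(pos);
        return buf.get();
    }

    /** One call: absolute read that leaves the buffer's cursor untouched. */
    static byte readAt(ByteBuffer buf, int pos) {
        return buf.get(pos);
    }
}
```

Both methods return the same byte; the second simply avoids the separate cursor update.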

The thing that makes me want to be careful here is that access to the terms index is very random, so things might degrade badly if the OS cache doesn't hold the whole terms index in memory.

I think net/net we are already relying on OS to do the right thing here. As things stand today, the OS could also swap out the heap pages that hold the FST's byte[] depending on its swappiness (on Linux).

I'm not super familiar with the FST internals, I wonder whether there are changes that we could make to it so that it would be more disk-friendly, eg. by seeking backward as little as possible when looking up a key?

We used to have a `pack` method in FST that would 1) try to further compress the byte[] size by moving nodes "closer" to the nodes that transitioned to them, and 2) reverse the bytes. But we removed that method because it added complexity and nobody was really using it and sometimes it even made the FST bigger!

Maybe, we could bring the method back, but only part 2) of it, and always call it at the end of building an FST? That should be simpler code (without part 1), and should achieve sequential reads of at least the bytes to decode a single transition; maybe it gives a performance jump independent of this change? But I think we really should explore that independently of this issue ... I think as long as additional performance tests show only these smallish impacts to real queries we should just make the change across the board for terms dictionary index?
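A toy illustration of why reversing the bytes turns backward reads into forward reads: reading `n` bytes backwards starting at position `p` in the original array yields the same sequence as reading `n` bytes forwards starting at position `length - 1 - p` in the reversed array. `ReverseBytes` and its methods are hypothetical names for this sketch and are unrelated to the removed `pack` method.

```java
import java.util.Arrays;

/** Hypothetical sketch: a reversed copy lets backward reads become forward reads. */
final class ReverseBytes {
    /** Returns a new array holding the bytes of src in reverse order. */
    static byte[] reversedCopy(byte[] src) {
        byte[] out = new byte[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = src[src.length - 1 - i];
        }
        return out;
    }

    /** Reads len bytes going backwards from start (inclusive). */
    static byte[] readBackward(byte[] bytes, int start, int len) {
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            out[i] = bytes[start - i];
        }
        return out;
    }

    /** Reads len bytes going forwards from start (inclusive); one contiguous copy. */
    static byte[] readForward(byte[] bytes, int start, int len) {
        return Arrays.copyOfRange(bytes, start, start + len);
    }
}
```

The forward variant is a single contiguous copy, which is the access pattern a bulk `readBytes` call can serve directly.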

[Legacy Jira: Michael McCandless (@mikemccand) on Jan 17 2019]

mikemccand commented 5 years ago

The PK lookup doesn't concern me much since such queries would usually already be fast and overall a tiny fraction of a search platform in typical usage.

For the record, Lucene also performs implicit PK lookups when indexing with updateDocument. So this might have an impact on indexing speed as well.

I think net/net we are already relying on OS to do the right thing here. As things stand today, the OS could also swap out the heap pages that hold the FST's byte[] depending on its swappiness

Most deployments I am aware of tune swappiness to avoid this situation. :)

Don't get me wrong, I'm very much in favor of this change. I agree it's a bit unlikely that the terms index gets paged out, but you can still end up with a cold FS cache eg. when the host restarts?

Furthermore the NIO and Simple FS directories use buffering. I'm wondering how bad things would be if every seek would need to reload the buffer? You mentioned bringing back pack() with 2) only, maybe reordering nodes would still be useful so that we could optimize the likeliness that two connected nodes of the FST would be in the same buffer (or maybe the current way of building FSTs is already good from that perspective?)? Even if that made the FST a bit larger that would still probably be a good trade-off now that we are considering keeping the FST on disk?

[Legacy Jira: Adrien Grand (@jpountz) on Jan 18 2019]

mikemccand commented 5 years ago

you can still end up with a cold FS cache eg. when the host restarts?

For the cold host case, we already have to take measures to warm our service even when we hold the entire index in RAM; not just paging in index files, but also JVM hotspot compilation and other non-Lucene service startup costs. I feel like this is just part of starting a service, although in this case previously Lucene would "warm" the FSTs for you by preloading them into heap memory? With this change, that preloading would have to rely on running warming queries or somehow "touching" the terms.

[ sorry for the duplicate posts – I need to get in the habit of using Jira instead of email! ]

[Legacy Jira: Michael Sokolov (@msokolov) on Jan 18 2019]

mikemccand commented 5 years ago

I opened LUCENE-8653 to explore reversing FSTs; if we can do that, it should simplify the reader we use here and maybe help performance

[Legacy Jira: Michael Sokolov (@msokolov) on Jan 21 2019]

mikemccand commented 5 years ago

Wondering whether avoiding 'array reversal' in the second patch is what helped rather than moving to random access and removing skip? Maybe we should try reading one byte at a time with the original patch. I feel that the reversal while storing, followed by reading the bytes as suggested by @mikemccand, would definitely help.

[Legacy Jira: Murali Krishna P on Jan 22 2019]

mikemccand commented 5 years ago

I uploaded a patch that combines these three things: off-heap FST + random-access reader + reversal of the FST so it is forward-read. Unit tests are passing; I'm running some benchmarks to see what the impact is on performance

[Legacy Jira: Michael Sokolov (@msokolov) on Jan 22 2019]

mikemccand commented 5 years ago

Technically we could make things work for existing segments since your patch doesn't change the file format.

@jpountz - I'm curious about how this can be done. I looked at the code, and it seems that all settings are passed to the segment writer, and the writer should put those settings in the codec for the reader to consume. Do you have any pointers on this?

I agree it's a bit unlikely that the terms index gets paged out, but you can still end up with a cold FS cache eg. when the host restarts?

There can be an option for preloading the terms index during index open. Even though Lucene already provides an option for preloading mapped buffers here, it is applied at the directory level and not the file level. Elasticsearch worked around that to provide a file-level setting.

For the record, Lucene also performs implicit PK lookups when indexing with updateDocument. So this might have an impact on indexing speed as well.

If the customer workload is updateDocument heavy, the impact should be minimal, as the terms index will get loaded into memory after the first fault for every page, and then there should not be any further page faults. If customers are sensitive to latency, they can use the preload option for the terms index.
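The lazy-versus-preload behavior discussed here can be seen with the JDK's own memory-mapping API, independent of Lucene. In this sketch, `MmapPreload` and its methods are hypothetical names; `FileChannel.map` and `MappedByteBuffer.load` are standard JDK calls, and `load()` is only a best-effort request to bring the mapping into physical memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of lazy mapping with an optional preload step. */
final class MmapPreload {
    /** Maps the file read-only; pages are faulted in on first access unless preload is true. */
    static MappedByteBuffer map(Path file, boolean preload) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            if (preload) {
                buf.load(); // best-effort: touch the pages now instead of on first use
            }
            return buf; // the mapping stays valid after the channel is closed
        }
    }

    /** Demo helper: writes content to a temp file, maps it, and reads one byte. */
    static byte readAt(byte[] content, int index, boolean preload) {
        try {
            Path file = Files.createTempFile("terms-index", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, content);
            return map(file, preload).get(index);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the preload, the first read of each page pays a page fault; with it, that cost is moved to open time.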

Wondering whether avoiding 'array reversal' in the second patch is what helped rather than moving to random access and removing skip? May be we should try with reading one byte at a time with original patch.

I overlooked that earlier and attributed the performance gain to the absence of the seek operation. This makes a lot more sense; I will try that by changing readByte and readBytes as below:


    public byte readByte() throws IOException {
        final byte b = this.in.readByte();
        this.skipBytes(2);
        return b;
    }

    public void readBytes(byte[] b, int offset, int len) throws IOException {
        for (int i=offset+len-1; i>=offset; i--) {
            b[i] = this.readByte();
        }
    }

I uploaded a patch that combines these three things: off-heap FST + random-access reader + reversal of the FST so it is forward-read. Unit tests are passing; I'm running some benchmarks to see what the impact is on performance

That's great Mike. If this works, we don't need the reverse reader. We don't even need the random-access reader, as we can simply change readBytes to below:


    public void readBytes(byte[] b, int offset, int len) throws IOException {
        this.in.readBytes(b, offset, len);
    }

[Legacy Jira: Ankit Jain on Jan 22 2019]

mikemccand commented 5 years ago

we can simply change readBytes to below:

@akjain unfortunately RandomAccessInput doesn't offer readBytes. I'm looking into adding it; shouldn't be hard as there aren't that many implementations.

[Legacy Jira: Michael Sokolov (@msokolov) on Jan 23 2019]

mikemccand commented 5 years ago

Ankit Jain unfortunately RandomAccessInput doesn't offer readBytes. I'm looking into adding it; shouldn't be hard as there aren't that many implementations.

You don't need to use RandomAccessInput. You can revert back to original IndexInputReader and get rid of the reversal logic.

    /** Implements forward read for FST from an index input. */
    final class ForwardIndexInputReader extends FST.BytesReader {
        private final IndexInput in;
        private final long startFP;

        public ForwardIndexInputReader(IndexInput in, long startFP) {
            this.in = in;
            this.startFP = startFP;
        }

        @Override
        public byte readByte() throws IOException {
            return this.in.readByte();
        }

        @Override
        public void readBytes(byte[] b, int offset, int len) throws IOException {
            this.in.readBytes(b, offset, len);
        }

        @Override
        public void skipBytes(long count) {
            // Skip by moving the file pointer instead of reading bytes.
            this.setPosition(this.getPosition() + count);
        }

        @Override
        public long getPosition() {
            // Position is relative to the start of the FST within the file.
            return this.in.getFilePointer() - startFP;
        }

        @Override
        public void setPosition(long pos) {
            try {
                this.in.seek(startFP + pos);
            } catch (IOException ex) {
                // The exception is only logged here, not rethrown.
                System.out.println(String.format("Unreported exception in set position at %d - %s", pos, ex.getMessage()));
            }
        }

        @Override
        public boolean reversed() {
            return false;
        }
    }

Furthermore the NIO and Simple FS directories use buffering. I'm wondering how bad things would be if every seek would need to reload the buffer?

This can be a serious concern for NIO and Simple FS systems. Given that most systems today use mmap, can we limit the off-heap FST to systems that support mmap, i.e.:

Constants.JRE_IS_64BIT && MMapDirectory.UNMAP_SUPPORTED

[Legacy Jira: Ankit Jain on Jan 23 2019]

mikemccand commented 5 years ago

I tried that @akjain and stumbled into a trap that got a big drop in performance! I just used a wrapper around IndexInput rather than the random access approach (using randomAccessSlice) and implemented skipBytes in the obvious way: by calling the delegate's skipBytes. But this is bad. The default implementation of that method comes from DataInput and that actually reads bytes into a buffer rather than simply updating a pointer. I'm not sure I understand the rationale for that - it seems to have to do with checksumming? Possibly ByteBuffer(s)IndexInput could (should?) implement this more efficiently, or maybe it's required to do this reading – not sure. At any rate I think in this case we really just want to seek the pointer, so we can have our FST.BytesReader.skipBytes call IndexInput.seek instead of IndexInput.skipBytes.
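The trap described here can be reproduced with a toy input class. `ToyInput` and `SeekingToyInput` are hypothetical stand-ins written for this illustration, not Lucene's `DataInput` or `IndexInput`; the base class skips by reading and discarding, while the subclass skips by moving the pointer.

```java
/** Toy input over a byte[]; a stand-in, not Lucene's DataInput or IndexInput. */
class ToyInput {
    protected final byte[] data;
    protected int pos;
    int bytesRead; // counts bytes actually read, to make the cost visible

    ToyInput(byte[] data) { this.data = data; }

    byte readByte() {
        bytesRead++;
        return data[pos++];
    }

    void seek(int newPos) { pos = newPos; }

    /** Skips by reading and discarding each byte. */
    void skipBytes(int n) {
        for (int i = 0; i < n; i++) {
            readByte();
        }
    }
}

/** Same input, but skipping only moves the pointer. */
final class SeekingToyInput extends ToyInput {
    SeekingToyInput(byte[] data) { super(data); }

    @Override
    void skipBytes(int n) { seek(pos + n); }
}
```

Both variants end at the same position after a skip; only the amount of work differs, which matters when skips are frequent.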

[Legacy Jira: Michael Sokolov (@msokolov) on Jan 27 2019]

mikemccand commented 5 years ago

I also independently tried a performance run after removing the array reversal in readBytes in the original patch, but the results looked similar to the earlier results.

Since we are leaning towards keeping this optional, I created another patch, optional_offheap_ra.patch, based on the reverse random-access reader patch (ra.patch), that adds FST.offheap as a system property to allow toggling between off-heap and on-heap.
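Reading such a toggle from a JVM system property might look like the sketch below. The property name `FST.offheap` comes from the comment above; the class name `FstOffHeapFlag` and the default of `false` when the property is unset are assumptions, since the patch itself is not shown here.

```java
/** Hypothetical reader for the FST.offheap system property. */
final class FstOffHeapFlag {
    static final String PROPERTY = "FST.offheap";

    /** Returns true only if the property is set to "true" (case-insensitive). */
    static boolean offHeapEnabled() {
        // Assumed default: on-heap when the property is not set.
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));
    }
}
```

The flag is set on the command line with `-DFST.offheap=true`.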

The results for wikimedium10k with:

java ...... -DFST.offheap=true

                    Task  QPS baseline      StdDev  QPS candidate      StdDev                Pct diff
                PKLookup      172.88      (3.3%)      153.94      (3.7%)  -11.0% ( -17% -   -4%)
                 LowTerm    12229.10      (3.5%)    11032.10      (3.3%)   -9.8% ( -16% -   -3%)
              AndHighLow     4679.22      (3.2%)     4349.12      (3.3%)   -7.1% ( -13% -    0%)
                 MedTerm    10179.43      (5.4%)     9533.14      (3.4%)   -6.3% ( -14% -    2%)
                HighTerm     5123.89      (3.1%)     4814.09      (4.7%)   -6.0% ( -13% -    1%)
               LowPhrase     3459.57      (5.3%)     3253.20      (7.5%)   -6.0% ( -17% -    7%)
               MedPhrase     2815.82      (5.1%)     2654.13      (5.6%)   -5.7% ( -15% -    5%)
             MedSpanNear     2196.98      (4.4%)     2082.39      (3.9%)   -5.2% ( -12% -    3%)
        HighSloppyPhrase     1680.32      (5.7%)     1592.91      (8.0%)   -5.2% ( -17% -    9%)
         LowSloppyPhrase     3205.99      (4.9%)     3045.94      (4.4%)   -5.0% ( -13% -    4%)
               OrHighMed     1960.52      (4.8%)     1866.03      (6.2%)   -4.8% ( -15% -    6%)
                Wildcard     1388.45      (8.5%)     1324.82      (6.2%)   -4.6% ( -17% -   11%)
              OrHighHigh     1304.03      (7.8%)     1247.72      (5.1%)   -4.3% ( -16% -    9%)
              AndHighMed     2268.22      (2.8%)     2171.27      (2.8%)   -4.3% (  -9% -    1%)
         MedSloppyPhrase     2697.01      (6.1%)     2597.71      (5.0%)   -3.7% ( -13% -    7%)
   HighTermDayOfYearSort     1719.25      (5.3%)     1657.10      (5.8%)   -3.6% ( -13% -    7%)
            HighSpanNear     1624.69      (4.4%)     1567.35      (5.6%)   -3.5% ( -12% -    6%)
             AndHighHigh     1645.28      (3.7%)     1589.76      (2.9%)   -3.4% (  -9% -    3%)
             LowSpanNear     2319.98      (6.0%)     2246.30      (5.5%)   -3.2% ( -13% -    8%)
               OrHighLow     2264.00      (6.0%)     2200.33      (4.3%)   -2.8% ( -12% -    7%)
       HighTermMonthSort     4829.60      (3.9%)     4700.35      (2.5%)   -2.7% (  -8% -    3%)
                  Fuzzy2      172.46      (4.8%)      168.02      (5.4%)   -2.6% ( -12% -    8%)
              HighPhrase     2525.60      (6.3%)     2464.09      (5.3%)   -2.4% ( -13% -    9%)
                  Fuzzy1      585.39      (4.4%)      571.20      (4.1%)   -2.4% ( -10% -    6%)
                 Prefix3     1359.75      (8.2%)     1330.98      (5.8%)   -2.1% ( -14% -   12%)
                 Respell      501.29      (3.2%)      490.92      (4.7%)   -2.1% (  -9% -    5%)
   BrowseMonthTaxoFacets     8450.33      (4.7%)     8354.07      (4.9%)   -1.1% ( -10% -    8%)
BrowseDayOfYearSSDVFacets     2016.73      (3.4%)     2009.96      (4.0%)   -0.3% (  -7% -    7%)
BrowseDayOfYearTaxoFacets     8303.67      (6.4%)     8294.91      (5.6%)   -0.1% ( -11% -   12%)
                  IntNRQ     1380.11      (2.1%)     1380.36      (2.0%)    0.0% (  -3% -    4%)
    BrowseDateTaxoFacets     3564.47      (3.2%)     3575.88      (3.2%)    0.3% (  -5% -    7%)
   BrowseMonthSSDVFacets     2247.87      (5.4%)     2276.28      (3.5%)    1.3% (  -7% -   10%)

java ...... -DFST.offheap=false

                    Task  QPS baseline      StdDev  QPS candidate      StdDev                Pct diff
               LowPhrase     3244.01      (6.3%)     3201.30      (7.0%)   -1.3% ( -13% -   12%)
                PKLookup      171.24      (3.3%)      169.28      (5.3%)   -1.1% (  -9% -    7%)
         MedSloppyPhrase     2867.58      (6.3%)     2848.80      (6.9%)   -0.7% ( -13% -   13%)
   BrowseMonthTaxoFacets     8565.92      (4.9%)     8514.51      (5.3%)   -0.6% ( -10% -   10%)
                 Respell      529.20      (3.6%)      526.69      (3.4%)   -0.5% (  -7% -    6%)
                Wildcard     1252.25      (7.6%)     1249.97      (7.3%)   -0.2% ( -13% -   15%)
                  IntNRQ     1536.74      (1.7%)     1536.53      (2.1%)   -0.0% (  -3% -    3%)
BrowseDayOfYearTaxoFacets     8490.89      (6.3%)     8490.94      (5.5%)    0.0% ( -11% -   12%)
             LowSpanNear     2391.88      (3.0%)     2392.15      (4.9%)    0.0% (  -7% -    8%)
                 LowTerm    12382.95      (4.3%)    12384.63      (3.6%)    0.0% (  -7% -    8%)
       HighTermMonthSort     4906.65      (3.3%)     4910.32      (4.3%)    0.1% (  -7% -    7%)
             AndHighHigh     1652.60      (5.4%)     1660.85      (4.9%)    0.5% (  -9% -   11%)
BrowseDayOfYearSSDVFacets     2006.52      (4.5%)     2017.41      (3.3%)    0.5% (  -6% -    8%)
                  Fuzzy2      176.18      (4.7%)      177.27      (3.9%)    0.6% (  -7% -    9%)
             MedSpanNear     2668.05      (6.7%)     2688.05      (3.9%)    0.7% (  -9% -   12%)
                HighTerm     5556.40      (4.9%)     5611.56      (3.8%)    1.0% (  -7% -   10%)
              AndHighMed     2257.29      (4.7%)     2281.54      (4.0%)    1.1% (  -7% -   10%)
               OrHighMed     1611.93      (4.5%)     1631.79      (4.0%)    1.2% (  -6% -   10%)
    BrowseDateTaxoFacets     3521.57      (4.7%)     3565.96      (4.9%)    1.3% (  -7% -   11%)
                  Fuzzy1      634.59      (3.8%)      642.78      (5.8%)    1.3% (  -7% -   11%)
              AndHighLow     4739.69      (5.0%)     4813.65      (5.7%)    1.6% (  -8% -   12%)
   HighTermDayOfYearSort     1742.58      (5.5%)     1770.22      (5.7%)    1.6% (  -9% -   13%)
   BrowseMonthSSDVFacets     2235.20      (6.4%)     2271.85      (3.4%)    1.6% (  -7% -   12%)
         LowSloppyPhrase     3167.97      (6.6%)     3221.73      (7.1%)    1.7% ( -11% -   16%)
                 MedTerm    10275.01      (4.6%)    10450.43      (4.1%)    1.7% (  -6% -   10%)
                 Prefix3     1522.42      (8.9%)     1551.62      (9.9%)    1.9% ( -15% -   22%)
            HighSpanNear     1680.39      (5.6%)     1714.25      (5.0%)    2.0% (  -8% -   13%)
               MedPhrase     2963.75      (7.1%)     3039.31      (5.5%)    2.5% (  -9% -   16%)
              OrHighHigh     1312.39      (6.2%)     1347.33      (6.1%)    2.7% (  -9% -   16%)
               OrHighLow     1969.23      (5.9%)     2025.16      (4.4%)    2.8% (  -7% -   13%)
        HighSloppyPhrase     1256.32      (5.5%)     1296.12      (6.7%)    3.2% (  -8% -   16%)
              HighPhrase     2202.95      (7.6%)     2311.64      (5.7%)    4.9% (  -7% -   19%)

[Legacy Jira: Ankit Jain on Jan 27 2019]

mikemccand commented 5 years ago

Results for bigger data sets:

                    Task    QPS baseline      StdDev    QPS candidate      StdDev                Pct diff
                PKLookup      117.59      (3.0%)      107.48      (2.3%)   -8.6% ( -13% -   -3%)
            OrHighNotMed     1085.05      (2.1%)     1056.43      (2.2%)   -2.6% (  -6% -    1%)
            OrNotHighLow      976.94      (2.4%)      955.32      (1.8%)   -2.2% (  -6% -    2%)
            OrHighNotLow     1152.58      (2.6%)     1128.25      (2.0%)   -2.1% (  -6% -    2%)
                  Fuzzy1       83.10      (2.6%)       81.54      (2.5%)   -1.9% (  -6% -    3%)
                  IntNRQ       88.53     (16.2%)       86.92     (14.7%)   -1.8% ( -28% -   34%)
           OrNotHighHigh      886.10      (1.7%)      870.26      (1.4%)   -1.8% (  -4% -    1%)
           OrHighNotHigh      838.32      (1.8%)      824.15      (1.9%)   -1.7% (  -5% -    2%)
   BrowseMonthTaxoFacets     8099.58      (2.0%)     7968.65      (1.8%)   -1.6% (  -5% -    2%)
                  Fuzzy2       55.95      (2.7%)       55.08      (2.5%)   -1.6% (  -6% -    3%)
            OrNotHighMed      764.40      (2.3%)      752.56      (1.7%)   -1.5% (  -5% -    2%)
BrowseDayOfYearTaxoFacets     8081.37      (2.1%)     7957.27      (2.7%)   -1.5% (  -6% -    3%)
                 LowTerm     1941.88      (5.2%)     1912.71      (4.0%)   -1.5% ( -10% -    8%)
       HighTermMonthSort       78.12     (10.8%)       76.99     (14.3%)   -1.4% ( -23% -   26%)
                 Respell       61.23      (2.7%)       60.57      (2.7%)   -1.1% (  -6% -    4%)
                HighTerm     1526.16      (3.1%)     1510.23      (1.8%)   -1.0% (  -5% -    4%)
                 MedTerm     1814.44      (3.7%)     1797.69      (2.1%)   -0.9% (  -6% -    5%)
               OrHighLow      443.93      (2.4%)      439.92      (2.5%)   -0.9% (  -5% -    4%)
              AndHighLow      577.60      (2.0%)      573.43      (1.4%)   -0.7% (  -4% -    2%)
                Wildcard       62.79      (5.8%)       62.54      (6.1%)   -0.4% ( -11% -   12%)
BrowseDayOfYearSSDVFacets       11.56      (8.0%)       11.55      (8.2%)   -0.0% ( -15% -   17%)
                 Prefix3      165.76      (8.7%)      165.70      (9.2%)   -0.0% ( -16% -   19%)
             MedSpanNear       51.40      (2.3%)       51.48      (2.5%)    0.2% (  -4% -    5%)
   BrowseMonthSSDVFacets       14.45     (13.6%)       14.47     (13.2%)    0.2% ( -23% -   31%)
   HighTermDayOfYearSort       44.98      (6.8%)       45.05      (5.3%)    0.2% ( -11% -   13%)
               OrHighMed      111.81      (3.0%)      112.01      (2.8%)    0.2% (  -5% -    6%)
             LowSpanNear       47.14      (2.4%)       47.24      (2.5%)    0.2% (  -4% -    5%)
         MedSloppyPhrase       48.25      (1.9%)       48.37      (2.3%)    0.2% (  -3% -    4%)
         LowSloppyPhrase       35.36      (2.2%)       35.46      (2.5%)    0.3% (  -4% -    5%)
              AndHighMed      144.05      (3.6%)      144.53      (2.7%)    0.3% (  -5% -    6%)
            HighSpanNear        6.92      (3.5%)        6.95      (3.5%)    0.5% (  -6% -    7%)
               MedPhrase       25.88      (2.4%)       26.00      (1.4%)    0.5% (  -3% -    4%)
             AndHighHigh       38.77      (4.0%)       38.98      (3.9%)    0.5% (  -7% -    8%)
              OrHighHigh       27.47      (3.2%)       27.63      (3.1%)    0.6% (  -5% -    7%)
               LowPhrase       91.71      (4.3%)       92.56      (3.5%)    0.9% (  -6% -    9%)
        HighSloppyPhrase       18.28      (3.2%)       18.45      (3.6%)    0.9% (  -5% -    8%)
              HighPhrase       20.07      (3.9%)       20.35      (1.3%)    1.4% (  -3% -    6%)
    BrowseDateTaxoFacets        2.37      (0.4%)        2.41      (0.2%)    1.4% (   0% -    2%)

[Legacy Jira: Ankit Jain on Jan 27 2019]

mikemccand commented 5 years ago

OK net/net it looks like there is a small performance impact for some queries, and biggish (-7-8%) impact for PKLookup.
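
For reading these tables: the Pct diff column is consistent with the relative change in mean QPS from baseline to candidate. A minimal sketch of that arithmetic, using the PKLookup row from the bigger data set (the class and method names are hypothetical, and the range in parentheses is not reproduced here):

```java
import java.util.Locale;

// Hypothetical helper for reading the benchmark tables; not part of luceneutil.
final class PctDiff {
    /** Relative change of candidate QPS versus baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // PKLookup: baseline 117.59 QPS, candidate 107.48 QPS
        System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(117.59, 107.48)));
    }
}
```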

But this is a nice option to have for users who are heap constrained by the FSTs, so I wonder how we could add this option off by default? E.g. users might want their id field to store the FST in heap (like today), but all other fields off-heap.

There is no index format change required here, which is nice, but Lucene doesn't make it easy to have read-time codec behavior changes, so maybe the solution is that at write-time we add an option e.g. to BlockTreeTermsWriter and it stores this in the index and then at read-time BlockTreeTermsReader checks that option and loads the FST accordingly? Then users could customize their codecs to achieve this.

Or I suppose we could add a global system property, e.g. our default stored fields writer has a property to turn on/off bulk merge, but I think we are trying not to use Java properties going forward?

Can anyone think of any other approaches to make this option possible?

[Legacy Jira: Michael McCandless (@mikemccand) on Jan 29 2019]

mikemccand commented 5 years ago

Given that the performance hit is mostly on PK lookups, maybe a starting point could be to always put the FST off-heap except when docCount == sumDocFreq, which suggests the field is an ID field.
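
A minimal sketch of that check, assuming the two inputs are the per-field statistics Lucene exposes through `Terms.getDocCount()` and `Terms.getSumDocFreq()`. The class and method names below are hypothetical and are not taken from any patch on this issue:

```java
// Hypothetical helper, not part of Lucene: illustrates the docCount == sumDocFreq heuristic.
final class PrimaryKeyHeuristic {
    /**
     * docCount:   number of documents that have at least one term in the field.
     * sumDocFreq: sum of docFreq over all terms in the field (total number of postings).
     * When the two are equal, every document with the field has exactly one term in it,
     * which is what an ID field looks like.
     */
    static boolean looksLikePrimaryKey(long docCount, long sumDocFreq) {
        return docCount == sumDocFreq;
    }

    /** Proposed default: keep the FST on heap only for fields that look like IDs. */
    static boolean loadFstOffHeap(long docCount, long sumDocFreq) {
        return !looksLikePrimaryKey(docCount, sumDocFreq);
    }

    public static void main(String[] args) {
        System.out.println(loadFstOffHeap(1_000, 1_000));   // ID-like field
        System.out.println(loadFstOffHeap(1_000, 250_000)); // multi-term text field
    }
}
```

Note that the equality only says each document has a single term in the field; it does not prove the terms are unique, so any single-valued field would also be kept on heap under this rule.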

[Legacy Jira: Adrien Grand (@jpountz) on Jan 29 2019]

mikemccand commented 5 years ago

Oooh I like that proposal @jpountz!

[Legacy Jira: Michael McCandless (@mikemccand) on Jan 29 2019]

mikemccand commented 5 years ago

Given that the performance hit is mostly on PK lookups, maybe a starting point could be to always put the FST off-heap except when docCount == sumDocFreq, which suggests the field is an ID field.

@jpountz - Does that exclude autogenerated id fields that are UUIDs, resulting in large FSTs? Elasticsearch, for example, has an _id field, which IMO is better off-heap.

[Legacy Jira: Ankit Jain on Jan 29 2019]

mikemccand commented 5 years ago

I posted my latest patch including off-heap change + FST reversal + reading index forward by wrapping IndexInput directly (no random access, and no bug with using slow skipBytes) – that's fst-offheap-rev.patch

[Legacy Jira: Michael Sokolov (@msokolov) on Jan 29 2019]

mikemccand commented 5 years ago

Does that exclude autogenerated id fields that are UUIDs, resulting in large FSTs? Elasticsearch, for example, has an _id field, which IMO is better off-heap.

No, it doesn't exclude autogenerated ID fields. ID fields are tricky: they are indeed the ones that consume the most heap but also the ones that depend the most on term lookup performance.

[Legacy Jira: Adrien Grand (@jpountz) on Jan 30 2019]

mikemccand commented 5 years ago

I agree that would be a good start. Perhaps as a separate issue we can add finer control of when to use on- vs off-heap (per field, e.g.).

Just to look a little way down that path: It seems that the nearest thing to do this today is get/setPreload() and get/setUseUnmap in MMapDirectory, but here one really wants a mapping by field name, and a Directory should not really be concerned with field names. Better would be an attribute of FieldInfo, where we have put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the appropriate On/OffHeapStore when creating its FST. What do you think?

[Legacy Jira: Michael Sokolov (@msokolov) on Jan 30 2019]

mikemccand commented 5 years ago

Given that reversing the index during write to make it forward reading didn't help the performance (in addition to it not being backward compatible), is the consensus to add exception for PK and directories other than mmap for offheap FST in ra.patch?

[Legacy Jira: Ankit Jain on Jan 31 2019]

mikemccand commented 5 years ago

Better would be an attribute of FieldInfo, where we have put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the appropriate On/OffHeapStore when creating its FST. What do you think?

Hmm that's also an interesting approach to get per-field control. One can set these attributes in a custom FieldType when indexing documents, or maybe in a custom codec at write time (just subclassing e.g. Lucene80Codec), or at read time using a real (named) custom codec. So we would pick a specific string (FST_OFF_HEAP or something) and define that as a string constant which users could then use for setting the attribute?

So ... maybe we have a default behavior w/ Adrien's cool idea, but then also allow the attribute to give per-field control? We should probably also by default (if the field attribute is not present) not do off-heap when the directory is not MMapDirectory? We haven't tested the other directory impls but I suspect they'd be quite a bit slower with off-heap FST?
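
A rough sketch of how those three rules could combine, under stated assumptions: a plain `Map` stands in for the FieldInfo attributes, the `FST_OFF_HEAP` key is the placeholder name floated above, and the class itself is hypothetical. This illustrates the proposal as discussed here, not the code that was committed:

```java
import java.util.Map;

// Hypothetical policy class; illustrates the decision order discussed in this comment.
final class FstLoadPolicy {
    // Placeholder attribute key ("FST_OFF_HEAP or something"); not a real Lucene constant.
    static final String FST_OFF_HEAP = "FST_OFF_HEAP";

    static boolean offHeap(Map<String, String> fieldAttributes,
                           boolean isMMapDirectory,
                           long docCount,
                           long sumDocFreq) {
        // 1. An explicit per-field attribute wins over any default.
        String explicit = fieldAttributes.get(FST_OFF_HEAP);
        if (explicit != null) {
            return Boolean.parseBoolean(explicit);
        }
        // 2. Without mmap, off-heap reads are untested and likely slower: stay on heap.
        if (!isMMapDirectory) {
            return false;
        }
        // 3. Default heuristic: fields that look like IDs stay on heap.
        return docCount != sumDocFreq;
    }
}
```

With this ordering, a user who wants a large `_id`-style field off-heap can opt in through the attribute even though the default heuristic would keep it on heap.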

Given that reversing the index during write to make it forward reading didn't help the performance (in addition to it not being backward compatible), is the consensus to add exception for PK and directories other than mmap for offheap FST in ra.patch?

Yeah +1 to keep the two changes separated.

[Legacy Jira: Michael McCandless (@mikemccand) on Feb 01 2019]

mikemccand commented 5 years ago

Yes, @akjain that approach sounds good to me; we should hold off on the FST-reversal. It didn't help here; the random-access approach worked just as well. Also, maybe opening a pull request will help, if only to distinguish it from all the patches that are cluttering this now (sorry!)

[Legacy Jira: Michael Sokolov (@msokolov) on Feb 01 2019]

mikemccand commented 5 years ago

I have created a pull request with the proposed changes. Surprisingly, though, I still see some impact on PKLookup performance. This does not make sense to me; it might be my perf run setup.

                    Task    QPS baseline      StdDev    QPS candidate      StdDev                Pct diff
                PKLookup      117.45      (2.2%)      108.72      (2.3%)   -7.4% ( -11% -   -3%)
            OrHighNotMed     1094.23      (2.5%)     1057.88      (2.7%)   -3.3% (  -8% -    1%)
            OrHighNotLow     1047.30      (1.7%)     1012.91      (2.5%)   -3.3% (  -7% -    1%)
                  Fuzzy2       44.10      (2.3%)       42.71      (2.7%)   -3.2% (  -7% -    1%)
            OrNotHighLow     1022.67      (2.5%)      992.28      (2.4%)   -3.0% (  -7% -    1%)
BrowseDayOfYearTaxoFacets     7907.19      (2.0%)     7677.99      (2.7%)   -2.9% (  -7% -    1%)
            OrNotHighMed      866.37      (1.9%)      843.10      (2.3%)   -2.7% (  -6% -    1%)
                 LowTerm     2103.58      (3.5%)     2048.98      (3.6%)   -2.6% (  -9% -    4%)
   BrowseMonthTaxoFacets     7883.86      (2.0%)     7692.48      (2.1%)   -2.4% (  -6% -    1%)
                  Fuzzy1       64.44      (1.9%)       62.88      (2.3%)   -2.4% (  -6% -    1%)
           OrNotHighHigh      779.27      (2.0%)      761.04      (2.1%)   -2.3% (  -6% -    1%)
                 Respell       55.60      (2.6%)       54.34      (2.3%)   -2.3% (  -7% -    2%)
           OrHighNotHigh      877.28      (2.2%)      858.10      (2.5%)   -2.2% (  -6% -    2%)
   BrowseMonthSSDVFacets       14.85      (7.9%)       14.57     (10.7%)   -1.9% ( -18% -   18%)
                 MedTerm     1984.26      (3.6%)     1947.76      (2.3%)   -1.8% (  -7% -    4%)
              AndHighLow      718.71      (1.5%)      706.06      (1.6%)   -1.8% (  -4% -    1%)
               OrHighLow      523.40      (2.5%)      515.56      (2.4%)   -1.5% (  -6% -    3%)
                HighTerm     1381.10      (2.9%)     1360.80      (2.7%)   -1.5% (  -6% -    4%)
       HighTermMonthSort      120.45     (12.3%)      119.00     (16.4%)   -1.2% ( -26% -   31%)
BrowseDayOfYearSSDVFacets       11.55      (9.7%)       11.45     (10.0%)   -0.8% ( -18% -   20%)
              AndHighMed      155.15      (2.6%)      154.25      (2.4%)   -0.6% (  -5% -    4%)
               OrHighMed       88.00      (2.5%)       87.85      (2.7%)   -0.2% (  -5% -    5%)
               LowPhrase       80.53      (1.6%)       80.40      (1.4%)   -0.2% (  -3% -    2%)
             AndHighHigh       41.91      (4.2%)       41.86      (2.9%)   -0.1% (  -6% -    7%)
               MedPhrase       46.29      (1.4%)       46.33      (1.5%)    0.1% (  -2% -    3%)
                  IntNRQ      127.54      (0.4%)      127.76      (0.4%)    0.2% (   0% -    1%)
   HighTermDayOfYearSort       48.59      (5.1%)       48.71      (6.0%)    0.2% ( -10% -   12%)
         LowSloppyPhrase       13.04      (4.0%)       13.08      (4.3%)    0.3% (  -7% -    8%)
         MedSloppyPhrase       19.48      (2.3%)       19.54      (2.4%)    0.3% (  -4% -    5%)
              OrHighHigh       23.60      (3.0%)       23.68      (2.9%)    0.3% (  -5% -    6%)
              HighPhrase       20.25      (2.4%)       20.32      (1.8%)    0.3% (  -3% -    4%)
        HighSloppyPhrase        9.29      (3.3%)        9.32      (3.2%)    0.4% (  -5% -    7%)
             LowSpanNear       25.70      (3.8%)       25.89      (3.9%)    0.7% (  -6% -    8%)
             MedSpanNear       30.46      (4.1%)       30.69      (4.3%)    0.7% (  -7% -    9%)
            HighSpanNear       14.41      (4.3%)       14.60      (4.7%)    1.3% (  -7% -   10%)
                Wildcard       70.08     (10.3%)       71.09      (6.1%)    1.4% ( -13% -   19%)
    BrowseDateTaxoFacets        2.37      (0.2%)        2.41      (0.3%)    1.5% (   0% -    1%)
                 Prefix3       86.71     (11.4%)       89.04      (6.8%)    2.7% ( -13% -   23%)

[Legacy Jira: Ankit Jain on Feb 04 2019]

mikemccand commented 5 years ago

@akjain that's strange yeah – this patch was supposed to avoid kicking in for PK fields right?

[Legacy Jira: Michael Sokolov (@msokolov) on Feb 07 2019]

mikemccand commented 5 years ago

Ankit Jain that's strange yeah – this patch was supposed to avoid kicking in for PK fields right?

@msokolov - Yeah, not sure what's going on. It would be great if someone could review the changes, in case I missed something.

[Legacy Jira: Ankit Jain on Feb 09 2019]

mikemccand commented 5 years ago

I added print statements while running the benchmarks, and the classification looks correct:

Initializing field offheap start=55 field=Date.taxonomy
Initializing field offheap start=76 field=DayOfYear.sortedset
Initializing field offheap start=97 field=Month.sortedset
Initializing field offheap start=118 field=body
Initializing field onheap start=267 field=date
Initializing field onheap start=289 field=groupend
Initializing field onheap start=311 field=id
Initializing field onheap start=333 field=title

Though, when I restricted tests to PKLookups only using comp.addTaskPattern('PKLookup') in localrun.py, results look as expected:


                    Task    QPS baseline      StdDev    QPS candidate      StdDev                Pct diff
                PKLookup      163.29      (1.6%)      164.80      (2.1%)    0.9% (  -2% -    4%)

                    Task    QPS baseline      StdDev    QPS candidate      StdDev                Pct diff
                PKLookup      114.29      (1.7%)      114.73      (1.2%)    0.4% (  -2% -    3%)

It seems we are good with this change then.

[Legacy Jira: Ankit Jain on Feb 10 2019]

mikemccand commented 5 years ago

I ran luceneutil on wikimediumall with current trunk vs PR here – net/net looks like noise, which is great – I'll push shortly:

Report after iter 19:

                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff

                 Prefix3       37.05     (11.4%)       36.25     (13.0%)   -2.1% ( -23% -   25%)
   BrowseMonthSSDVFacets        5.01      (6.4%)        4.91     (10.4%)   -1.9% ( -17% -   15%)
   BrowseMonthTaxoFacets        1.24      (2.7%)        1.22      (4.8%)   -1.3% (  -8% -    6%)
                Wildcard      106.53      (8.6%)      105.18      (9.1%)   -1.3% ( -17% -   18%)
   HighTermDayOfYearSort       14.85      (4.2%)       14.70      (4.2%)   -1.0% (  -9% -    7%)
    BrowseDateTaxoFacets        1.11      (3.2%)        1.10      (5.6%)   -0.8% (  -9% -    8%)
BrowseDayOfYearTaxoFacets        1.11      (3.1%)        1.10      (5.6%)   -0.8% (  -9% -    8%)
         MedSloppyPhrase        4.59      (3.4%)        4.56      (2.8%)   -0.5% (  -6% -    5%)
                  Fuzzy2       68.49      (1.0%)       68.12      (1.3%)   -0.5% (  -2% -    1%)
             LowSpanNear       30.34      (1.7%)       30.19      (1.9%)   -0.5% (  -4% -    3%)
                  Fuzzy1       72.43      (0.9%)       72.10      (1.4%)   -0.5% (  -2% -    1%)
               LowPhrase       34.35      (1.1%)       34.22      (2.0%)   -0.4% (  -3% -    2%)
                 Respell       47.66      (1.4%)       47.48      (1.7%)   -0.4% (  -3% -    2%)
         LowSloppyPhrase       10.59      (4.9%)       10.56      (3.6%)   -0.3% (  -8% -    8%)
                HighTerm     1290.39      (1.8%)     1286.15      (1.4%)   -0.3% (  -3% -    2%)
                 MedTerm     1419.25      (2.0%)     1415.23      (1.5%)   -0.3% (  -3% -    3%)
                  IntNRQ       27.03     (11.0%)       26.96     (10.9%)   -0.3% ( -19% -   24%)
        HighSloppyPhrase        6.73      (4.9%)        6.71      (3.4%)   -0.3% (  -8% -    8%)
           OrNotHighHigh      825.79      (1.9%)      823.77      (1.4%)   -0.2% (  -3% -    3%)
            OrNotHighMed      912.80      (1.3%)      910.96      (1.3%)   -0.2% (  -2% -    2%)
               MedPhrase       29.52      (1.1%)       29.46      (1.9%)   -0.2% (  -3% -    2%)
            OrHighNotLow     1184.54      (3.1%)     1182.86      (1.8%)   -0.1% (  -4% -    4%)
                 LowTerm      974.30      (1.5%)      973.33      (1.4%)   -0.1% (  -2% -    2%)
               OrHighLow      328.39      (1.0%)      328.13      (1.0%)   -0.1% (  -2% -    1%)
             AndHighHigh       21.04      (2.8%)       21.03      (2.6%)   -0.1% (  -5% -    5%)
           OrHighNotHigh      907.78      (1.8%)      907.93      (1.4%)    0.0% (  -3% -    3%)
            OrHighNotMed     1019.49      (2.0%)     1019.67      (1.4%)    0.0% (  -3% -    3%)
              AndHighMed       64.27      (1.1%)       64.33      (1.1%)    0.1% (  -2% -    2%)
            OrNotHighLow      414.78      (1.2%)      415.43      (1.0%)    0.2% (  -2% -    2%)
BrowseDayOfYearSSDVFacets        4.14      (6.9%)        4.15      (8.9%)    0.2% ( -14% -   17%)
              AndHighLow      371.09      (1.7%)      371.84      (1.7%)    0.2% (  -3% -    3%)
               OrHighMed       65.31      (1.8%)       65.45      (1.8%)    0.2% (  -3% -    3%)
                PKLookup      141.21      (1.6%)      141.63      (1.9%)    0.3% (  -3% -    3%)
            HighSpanNear       25.84      (2.8%)       25.94      (2.6%)    0.4% (  -4% -    5%)
             MedSpanNear       26.39      (2.9%)       26.50      (2.8%)    0.4% (  -5% -    6%)
              HighPhrase       11.72      (2.1%)       11.77      (1.9%)    0.4% (  -3% -    4%)
              OrHighHigh       14.60      (2.2%)       14.69      (1.8%)    0.6% (  -3% -    4%)
       HighTermMonthSort       31.51      (6.0%)       31.90      (6.0%)    1.2% ( -10% -   14%)

[Legacy Jira: Michael McCandless (@mikemccand) on Feb 19 2019]

mikemccand commented 5 years ago

Commit ec801b4c54194dc0d4893d227e2f2c9580c04ec6 in lucene-solr's branch refs/heads/master from Michael McCandless https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ec801b4

LUCENE-8635: add option to move FSTs off-heap, and do so for the FST terms index in the default codec for non-primary-key fields if MMapDirectory is being used

[Legacy Jira: ASF subversion and git services on Feb 19 2019]

mikemccand commented 5 years ago

Commit 10d5e935e22256670940f33b96229cdb8da9f6a8 in lucene-solr's branch refs/heads/branch_8x from Michael McCandless https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=10d5e93

LUCENE-8635: add option to move FSTs off-heap, and do so for the FST terms index in the default codec for non-primary-key fields if MMapDirectory is being used

[Legacy Jira: ASF subversion and git services on Feb 19 2019]

mikemccand commented 5 years ago

Commit 7b93dd5aa5016e4e4365b97439f406bc86cab451 in lucene-solr's branch refs/heads/branch_8_0 from Michael McCandless https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7b93dd5

LUCENE-8635: add option to move FSTs off-heap, and do so for the FST terms index in the default codec for non-primary-key fields if MMapDirectory is being used

[Legacy Jira: ASF subversion and git services on Feb 19 2019]

mikemccand commented 5 years ago

Thanks @akjain!

[Legacy Jira: Michael McCandless (@mikemccand) on Feb 19 2019]
