mikemccand / luceneutil

Various utility scripts for running Lucene performance tests
Apache License 2.0

Wrap directory reader by ExitableDirectoryReader when 'exitable' parameter is passed. #172

Closed mocobeta closed 2 years ago

mocobeta commented 2 years ago

168

Instead of a timeout parameter (#171), I added a boolean exitable option this time. When exitable=True, the directory reader is wrapped in an ExitableDirectoryReader with a timeout of -1 (the QueryTimeoutImpl constructor interprets this as Long.MAX_VALUE).
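
To illustrate the timeout convention described above, here is a minimal Python model of it. This is only a sketch and not Lucene code: `effective_timeout_ms` and `ToyQueryTimeout` are hypothetical names, and the only behavior taken from the description is that a negative timeout is treated as Long.MAX_VALUE.

```python
import time

JAVA_LONG_MAX = 2**63 - 1  # numeric value of Java's Long.MAX_VALUE


def effective_timeout_ms(time_allowed_ms: int) -> int:
    """Hypothetical helper: a negative timeout means 'effectively no limit'."""
    return JAVA_LONG_MAX if time_allowed_ms < 0 else time_allowed_ms


class ToyQueryTimeout:
    """Toy stand-in for a query timeout; not the real QueryTimeoutImpl."""

    def __init__(self, time_allowed_ms: int, clock=time.monotonic):
        self._clock = clock
        # Deadline in seconds on the injected clock.
        self._deadline = clock() + effective_timeout_ms(time_allowed_ms) / 1000.0

    def should_exit(self) -> bool:
        # The check still runs on every call, even if it can never fire.
        return self._clock() > self._deadline
```

With a timeout of -1 the deadline is never reached, so a benchmark run with exitable=True would measure only the cost of the wrapper's checks and not the effect of queries being cut short.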

Usage (localrun.py)

  # baseline
  comp.competitor('baseline', 'lucene_baseline',
                  index = index, concurrentSearches = concurrentSearches)
  # with exitable=True
  comp.competitor('exitable_directory_reader', 'lucene_candidate',
                  index = index, exitable = True, concurrentSearches = concurrentSearches)

A sample result (with -source wikimedium1m)

                            Task    QPS baseline      StdDev    QPS exitable_directory_reader      StdDev                Pct diff p-value
           BrowseMonthTaxoFacets      166.54     (13.2%)       80.06     (17.7%)  -51.9% ( -73% -  -24%) 0.000
           BrowseMonthSSDVFacets      212.71     (12.3%)      106.19      (6.2%)  -50.1% ( -61% -  -35%) 0.000
       BrowseDayOfYearTaxoFacets      161.60      (6.9%)       87.30     (23.4%)  -46.0% ( -71% -  -16%) 0.000
            BrowseDateTaxoFacets      161.47      (7.5%)       87.40     (23.3%)  -45.9% ( -71% -  -16%) 0.000
     BrowseRandomLabelTaxoFacets      149.75      (7.6%)       82.14     (22.3%)  -45.2% ( -69% -  -16%) 0.000
       BrowseDayOfYearSSDVFacets      176.01      (7.6%)      101.19      (8.9%)  -42.5% ( -54% -  -28%) 0.000
     BrowseRandomLabelSSDVFacets      128.67      (7.9%)       79.72      (6.5%)  -38.0% ( -48% -  -25%) 0.000
            BrowseDateSSDVFacets       27.56     (20.2%)       23.89     (14.2%)  -13.3% ( -39% -   26%) 0.016
                          IntNRQ      762.70      (9.1%)      671.27     (13.3%)  -12.0% ( -31% -   11%) 0.001
                         MedTerm     3773.98      (5.6%)     3657.57      (5.6%)   -3.1% ( -13% -    8%) 0.082
                        HighTerm     2348.81      (3.8%)     2315.86      (5.9%)   -1.4% ( -10% -    8%) 0.373
                      AndHighLow     2417.38      (4.7%)     2399.10      (6.2%)   -0.8% ( -11% -   10%) 0.664
                HighSloppyPhrase      151.03      (3.4%)      150.34      (2.7%)   -0.5% (  -6% -    5%) 0.644
                 MedSloppyPhrase      367.04      (2.6%)      365.40      (2.1%)   -0.4% (  -4% -    4%) 0.543
                      AndHighMed      710.88      (4.8%)      707.91      (3.3%)   -0.4% (  -8% -    7%) 0.746
                        Wildcard      470.68      (5.0%)      469.48      (4.4%)   -0.3% (  -9% -    9%) 0.864
                 LowSloppyPhrase       96.98      (4.1%)       97.04      (3.0%)    0.1% (  -6% -    7%) 0.951
                     MedSpanNear      280.59      (3.9%)      281.21      (2.6%)    0.2% (  -5% -    6%) 0.830
             LowIntervalsOrdered      533.57      (4.4%)      534.99      (5.2%)    0.3% (  -8% -   10%) 0.862
                      OrHighHigh      290.28      (5.2%)      291.10      (5.7%)    0.3% ( -10% -   11%) 0.869
                          Fuzzy1      190.59      (2.7%)      191.32      (2.9%)    0.4% (  -5% -    6%) 0.665
                    HighSpanNear      269.70      (4.2%)      270.84      (4.4%)    0.4% (  -7% -    9%) 0.757
                     AndHighHigh      315.91      (5.3%)      317.67      (4.4%)    0.6% (  -8% -   10%) 0.715
             MedIntervalsOrdered      275.63      (4.3%)      277.20      (4.4%)    0.6% (  -7% -    9%) 0.678
                         LowTerm     3325.93      (6.2%)     3350.55      (5.1%)    0.7% (  -9% -   12%) 0.681
                       OrHighLow      707.49      (5.8%)      713.94      (5.8%)    0.9% ( -10% -   13%) 0.620
                          Fuzzy2       96.13      (2.9%)       97.05      (2.9%)    1.0% (  -4% -    6%) 0.295
                       MedPhrase      182.56      (3.2%)      185.06      (3.4%)    1.4% (  -5% -    8%) 0.186
                       LowPhrase      536.70      (3.2%)      544.63      (2.7%)    1.5% (  -4% -    7%) 0.113
                         Respell      188.49      (2.9%)      191.27      (3.1%)    1.5% (  -4% -    7%) 0.121
                        PKLookup      263.16      (7.7%)      267.12      (6.4%)    1.5% ( -11% -   16%) 0.503
                     LowSpanNear      676.21      (5.0%)      686.90      (4.5%)    1.6% (  -7% -   11%) 0.294
                       OrHighMed      626.87      (3.2%)      636.92      (4.6%)    1.6% (  -6% -    9%) 0.202
                      HighPhrase      415.16      (2.8%)      422.35      (3.6%)    1.7% (  -4% -    8%) 0.090
            HighIntervalsOrdered       39.17      (8.4%)       40.03      (9.2%)    2.2% ( -14% -   21%) 0.433
               HighTermMonthSort      565.52     (20.0%)      581.13     (23.9%)    2.8% ( -34% -   58%) 0.692
                         Prefix3     1371.42      (5.4%)     1411.78      (5.3%)    2.9% (  -7% -   14%) 0.082
           HighTermDayOfYearSort     1200.07     (11.6%)     1250.16     (16.7%)    4.2% ( -21% -   36%) 0.358
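
As a quick sanity check on the Pct diff column, the short sketch below recomputes it from the mean QPS values for a few rows copied from the table. The function name `pct_diff` is made up for illustration, and the sketch assumes the column is simply the relative change of the candidate mean over the baseline mean; it does not try to reproduce the range in parentheses or the p-value.

```python
# (task, baseline QPS, exitable_directory_reader QPS), copied from the table
rows = [
    ("BrowseMonthTaxoFacets", 166.54, 80.06),
    ("BrowseMonthSSDVFacets", 212.71, 106.19),
    ("IntNRQ", 762.70, 671.27),
    ("HighTermDayOfYearSort", 1200.07, 1250.16),
]


def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative change of the candidate QPS versus the baseline, in percent."""
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps


for task, base, cand in rows:
    print(f"{task:>28} {pct_diff(base, cand):6.1f}%")
```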

jpountz commented 2 years ago

exitable makes sense to me, as from a benchmark perspective there isn't much we can do when queries hit a timeout.

mikemccand commented 2 years ago

Task    QPS baseline    StdDev    QPS exitable_directory_reader    StdDev    Pct diff    p-value
BrowseMonthTaxoFacets    166.54    (13.2%)    80.06    (17.7%)    -51.9% ( -73% - -24%)    0.000

Egads! Maybe open a follow-on issue in Lucene Jira (or GitHub issues soon maybe!) to improve performance of ExitableDirectoryReader for these pure-browse cases? That's far more impact than I expected!

mikemccand commented 2 years ago

and I'll add you as a committer on this project too.

Aha! You are already :) Thanks!

mocobeta commented 2 years ago

Thanks for merging, I'll look at the javascript part - this will be much harder for me.

mikemccand commented 2 years ago

I'll look at the javascript part - this will be much harder for me.

Hmm what is the javascript part?

mocobeta commented 2 years ago

I'll look at the javascript part - this will be much harder for me.

Hmm what is the javascript part?

I guess we need a graph in https://home.apache.org/~mikemccand/lucenebench/ to make it vital and some javascript is needed for that? But maybe I'm missing something...

mikemccand commented 2 years ago

Oh! Yes that would be wonderful, and is indeed trickier. I suggest opening a new issue to "Measure ExitableDirectoryReader overhead in nightly benchmarks"?