mikemccand / luceneutil

Various utility scripts for running Lucene performance tests
Apache License 2.0

Allow passing Baseline and Candidate paths via script args #231

Closed vigyasharma closed 1 year ago

vigyasharma commented 1 year ago

In its current form, luceneutil expects a fixed directory layout, with the baseline repo in $BENCH_DIR/lucene_baseline and the candidate repo in $BENCH_DIR/lucene_candidate.

While many existing users already work with this setup, it adds extra steps in some scenarios, such as when your candidate is already checked out in a different location, or when the candidate is a different upstream branch of the baseline repo (e.g. fetching and benchmarking a PR locally).

This change leverages the argparse support added in #230 and adds arguments for the baseline and candidate repo paths to the benchmark script. Both arguments are optional. If omitted, the benchmark script picks up code from $BENCH_DIR/lucene_baseline and $BENCH_DIR/lucene_candidate respectively.

It also adds a -r / --reindex flag that recreates the candidate index when passed, which is useful when benchmarking an indexing-side change.

The existing behavior is retained by default.
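
One way the fallback to $BENCH_DIR could be implemented is sketched below. It is an illustration only, not the code in this PR: the helper name resolve_checkout is hypothetical, and the BENCH_DIR value is simply the one that appears in the test logs further down. It relies on os.path.join dropping earlier components when a later one is absolute, so a relative default such as lucene_baseline lands under the bench directory while an explicit absolute path is used as given.

```python
import os

# Assumption: example value copied from the logs in this PR; in practice the
# bench directory comes from the user's local luceneutil configuration.
BENCH_DIR = "/Users/vigyas/lucenebench"


def resolve_checkout(path_arg, bench_dir=BENCH_DIR):
    """Return the checkout directory for a baseline/candidate argument.

    Hypothetical helper. Relative names resolve under bench_dir; absolute
    paths come back unchanged, because os.path.join discards all earlier
    components once it meets an absolute one.
    """
    return os.path.join(bench_dir, os.path.expanduser(path_arg))


if __name__ == "__main__":
    # Default (relative) value: resolved under the bench directory.
    print(resolve_checkout("lucene_baseline"))
    # Explicit absolute path: used as-is.
    print(resolve_checkout("/Users/vigyas/forks/lucene"))
```
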

usage: Local Benchmark Run [-h] [-s SOURCE] [-concurrentSearches] [-b BASELINE] [-c CANDIDATE] [-r]

Run a local benchmark on provided source dataset.

options:
  -h, --help            show this help message and exit
  -s SOURCE, -source SOURCE, --source SOURCE
                        Data source to run the benchmark on.
  -concurrentSearches, --concurrentSearches
                        Run concurrent searches
  -b BASELINE, --baseline BASELINE
                        Path to lucene repo to be used for baseline
  -c CANDIDATE, --candidate CANDIDATE
                        Path to lucene repo to be used for candidate
  -r, --reindex         Reindex data for candidate run
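
For reference, a parser along the following lines would produce help text of this shape. This is a minimal sketch reconstructed from the usage output, not the actual contents of example.py: the function name build_parser is hypothetical, the defaults for --baseline and --candidate are taken from the Namespace printed in the default-params run below, and the default for --source is not shown anywhere in this PR, so it is left unset here.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the help text above.

    Hypothetical reconstruction for illustration only.
    """
    parser = argparse.ArgumentParser(
        prog="Local Benchmark Run",
        description="Run a local benchmark on provided source dataset.",
    )
    # Assumption: no default is given for the data source in this sketch.
    parser.add_argument("-s", "-source", "--source",
                        help="Data source to run the benchmark on.")
    parser.add_argument("-concurrentSearches", "--concurrentSearches",
                        action="store_true",
                        help="Run concurrent searches")
    # Relative defaults; the script is expected to resolve them under $BENCH_DIR.
    parser.add_argument("-b", "--baseline", default="lucene_baseline",
                        help="Path to lucene repo to be used for baseline")
    parser.add_argument("-c", "--candidate", default="lucene_candidate",
                        help="Path to lucene repo to be used for candidate")
    parser.add_argument("-r", "--reindex", action="store_true",
                        help="Reindex data for candidate run")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("Running benchmarks with the following args:", args)
```

Because both path arguments carry defaults and -r is a store_true flag, invoking the script with only -source leaves the previous behavior unchanged.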

Testing

With new params

python3 src/python/example.py -source wikimedium10k -b /Users/vigyas/repos/lucene -c /Users/vigyas/forks/lucene -r
WARNING: Gnuplot module not present; will not make charts
Running benchmarks with the following args: Namespace(source='wikimedium10k', concurrentSearches=False, baseline='/Users/vigyas/repos/lucene', candidate='/Users/vigyas/forks/lucene', reindex=True)
Using checkout:[/Users/vigyas/repos/lucene] for competitor:[baseline]
Using checkout:[/Users/vigyas/forks/lucene] for competitor:[my_modified_version]
...

  iter 0
    my_modified_version:
    # -classpath /Users/vigyas/forks/lucene/lucene/core/build/libs/lucene-core-10.0.0-SNAPSHOT.jar
    # -indexPath /Users/vigyas/lucenebench/indices/wikimedium10k.lucene.candidate.facets.taxonomy:Date.taxonomy:Month.taxonomy:DayOfYear.sortedset:Date.sortedset:Month.sortedset:DayOfYear.taxonomy:RandomLabel.sortedset:RandomLabel.Lucene90.Lucene90.dvfields.nd0.01M
      log: /Users/vigyas/lucenebench/logs/baseline_vs_patch.my_modified_version.0 + stdout
      run: java -server -Xms2g -Xmx2g --add-modules jdk.incubator.vector -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParallelGC -XX:StartFlightRecording=dumponexit=true,maxsize=250M,settings=/Users/vigyas/lucenebench/util/src/python/profiling.jfc,filename=/Users/vigyas/lucenebench/logs/bench-search-baseline_vs_patch-my_modified_version-0.jfr -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -classpath /Users/vigyas/forks/lucene/lucene/core/build/libs/lucene-core-10.0.0-SNAPSHOT.jar:/Users/vigyas/forks/lucene/lucene/sandbox/build/classes/java/main:/Users/vigyas/forks/lucene/lucene/misc/build/classes/java/main:/Users/vigyas/forks/lucene/lucene/facet/build/classes/java/main:/Users/vigyas/forks/lucene/lucene/analysis/common/build/classes/java/main:/Users/vigyas/forks/lucene/lucene/analysis/icu/build/classes/java/main:/Users/vigyas/forks/lucene/lucene/queryparser/build/classes/java/main:/Users/vigyas/forks/lucene/lucene/grouping/build/classes/java/main:/Users/vigyas/forks/lucene/lucene/suggest/build/classes/java/main:/Users/vigyas/forks/lucene/lucene/highlighter/build/classes/java/main:/Users/vigyas/forks/lucene/lucene/codecs/build/classes/java/main:/Users/vigyas/forks/lucene/lucene/queries/build/classes/java/main:/Users/vigyas/.gradle/caches/modules-2/files-2.1/com.carrotsearch/hppc/0.9.1/4bf4c51e06aec600894d841c4c004566b20dd357/hppc-0.9.1.jar:/Users/vigyas/lucenebench/util/lib/HdrHistogram.jar:/Users/vigyas/lucenebench/util/build perf.SearchPerfTest -dirImpl MMapDirectory -indexPath /Users/vigyas/lucenebench/indices/wikimedium10k.lucene.candidate.facets.taxonomy:Date.taxonomy:Month.taxonomy:DayOfYear.sortedset:Date.sortedset:Month.sortedset:DayOfYear.taxonomy:RandomLabel.sortedset:RandomLabel.Lucene90.Lucene90.dvfields.nd0.01M -facets taxonomy:Date;Date -facets taxonomy:Month;Month -facets taxonomy:DayOfYear;DayOfYear -facets sortedset:Date;Date -facets sortedset:Month;Month -facets sortedset:DayOfYear;DayOfYear -facets taxonomy:RandomLabel;RandomLabel -facets 
sortedset:RandomLabel;RandomLabel -analyzer StandardAnalyzer -taskSource /Users/vigyas/lucenebench/util/tasks/wikimedium.1M.nostopwords.tasks -searchThreadCount 2 -taskRepeatCount 20 -field body -tasksPerCat 1 -staticSeed -3517517 -seed -8188330 -similarity BM25Similarity -commit multi -hiliteImpl FastVectorHighlighter -log /Users/vigyas/lucenebench/logs/baseline_vs_patch.my_modified_version.0 -topN 100 -pk
      2.4 s
    baseline:
    # -classpath /Users/vigyas/repos/lucene/lucene/core/build/libs/lucene-core-10.0.0-SNAPSHOT.jar
    # -indexPath /Users/vigyas/lucenebench/indices/wikimedium10k.lucene.facets.taxonomy:Date.taxonomy:Month.taxonomy:DayOfYear.sortedset:Date.sortedset:Month.sortedset:DayOfYear.taxonomy:RandomLabel.sortedset:RandomLabel.Lucene90.Lucene90.dvfields.nd0.01M
      log: /Users/vigyas/lucenebench/logs/baseline_vs_patch.baseline.0 + stdout
      run: java -server -Xms2g -Xmx2g --add-modules jdk.incubator.vector -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParallelGC -XX:StartFlightRecording=dumponexit=true,maxsize=250M,settings=/Users/vigyas/lucenebench/util/src/python/profiling.jfc,filename=/Users/vigyas/lucenebench/logs/bench-search-baseline_vs_patch-baseline-0.jfr -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -classpath /Users/vigyas/repos/lucene/lucene/core/build/libs/lucene-core-10.0.0-SNAPSHOT.jar:/Users/vigyas/repos/lucene/lucene/sandbox/build/classes/java/main:/Users/vigyas/repos/lucene/lucene/misc/build/classes/java/main:/Users/vigyas/repos/lucene/lucene/facet/build/classes/java/main:/Users/vigyas/repos/lucene/lucene/analysis/common/build/classes/java/main:/Users/vigyas/repos/lucene/lucene/analysis/icu/build/classes/java/main:/Users/vigyas/repos/lucene/lucene/queryparser/build/classes/java/main:/Users/vigyas/repos/lucene/lucene/grouping/build/classes/java/main:/Users/vigyas/repos/lucene/lucene/suggest/build/classes/java/main:/Users/vigyas/repos/lucene/lucene/highlighter/build/classes/java/main:/Users/vigyas/repos/lucene/lucene/codecs/build/classes/java/main:/Users/vigyas/repos/lucene/lucene/queries/build/classes/java/main:/Users/vigyas/.gradle/caches/modules-2/files-2.1/com.carrotsearch/hppc/0.9.1/4bf4c51e06aec600894d841c4c004566b20dd357/hppc-0.9.1.jar:/Users/vigyas/lucenebench/util/lib/HdrHistogram.jar:/Users/vigyas/lucenebench/util/build perf.SearchPerfTest -dirImpl MMapDirectory -indexPath /Users/vigyas/lucenebench/indices/wikimedium10k.lucene.facets.taxonomy:Date.taxonomy:Month.taxonomy:DayOfYear.sortedset:Date.sortedset:Month.sortedset:DayOfYear.taxonomy:RandomLabel.sortedset:RandomLabel.Lucene90.Lucene90.dvfields.nd0.01M -facets taxonomy:Date;Date -facets taxonomy:Month;Month -facets taxonomy:DayOfYear;DayOfYear -facets sortedset:Date;Date -facets sortedset:Month;Month -facets sortedset:DayOfYear;DayOfYear -facets taxonomy:RandomLabel;RandomLabel -facets 
sortedset:RandomLabel;RandomLabel -analyzer StandardAnalyzer -taskSource /Users/vigyas/lucenebench/util/tasks/wikimedium.1M.nostopwords.tasks -searchThreadCount 2 -taskRepeatCount 20 -field body -tasksPerCat 1 -staticSeed -3517517 -seed -8188330 -similarity BM25Similarity -commit multi -hiliteImpl FastVectorHighlighter -log /Users/vigyas/lucenebench/logs/baseline_vs_patch.baseline.0 -topN 100 -pk
      2.5 s

% ls ../indices 
wikimedium10k.lucene.candidate.facets.taxonomy:Date.taxonomy:Month.taxonomy:DayOfYear.sortedset:Date.sortedset:Month.sortedset:DayOfYear.taxonomy:RandomLabel.sortedset:RandomLabel.Lucene90.Lucene90.dvfields.nd0.01M/
wikimedium10k.lucene.facets.taxonomy:Date.taxonomy:Month.taxonomy:DayOfYear.sortedset:Date.sortedset:Month.sortedset:DayOfYear.taxonomy:RandomLabel.sortedset:RandomLabel.Lucene90.Lucene90.dvfields.nd0.01M/

% ls ../indices | wc -l
       2

With default params

% ls ../indices 
% python3 src/python/example.py -source wikimedium10k
WARNING: Gnuplot module not present; will not make charts
Running benchmarks with the following args: Namespace(source='wikimedium10k', concurrentSearches=False, baseline='lucene_baseline', candidate='lucene_candidate', reindex=False)
Using checkout:[/Users/vigyas/lucenebench/lucene_baseline] for competitor:[baseline]
Using checkout:[/Users/vigyas/lucenebench/lucene_candidate] for competitor:[my_modified_version]

  iter 19
    my_modified_version:
    # -classpath /Users/vigyas/lucenebench/lucene_candidate/lucene/core/build/libs/lucene-core-10.0.0-SNAPSHOT.jar
      log: /Users/vigyas/lucenebench/logs/baseline_vs_patch.my_modified_version.19 + stdout
      run: java -server -Xms2g -Xmx2g --add-modules jdk.incubator.vector -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParallelGC -XX:StartFlightRecording=dumponexit=true,maxsize=250M,settings=/Users/vigyas/lucenebench/util/src/python/profiling.jfc,filename=/Users/vigyas/lucenebench/logs/bench-search-baseline_vs_patch-my_modified_version-19.jfr -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -classpath /Users/vigyas/lucenebench/lucene_candidate/lucene/core/build/libs/lucene-core-10.0.0-SNAPSHOT.jar:/Users/vigyas/lucenebench/lucene_candidate/lucene/sandbox/build/classes/java/main:/Users/vigyas/lucenebench/lucene_candidate/lucene/misc/build/classes/java/main:/Users/vigyas/lucenebench/lucene_candidate/lucene/facet/build/classes/java/main:/Users/vigyas/lucenebench/lucene_candidate/lucene/analysis/common/build/classes/java/main:/Users/vigyas/lucenebench/lucene_candidate/lucene/analysis/icu/build/classes/java/main:/Users/vigyas/lucenebench/lucene_candidate/lucene/queryparser/build/classes/java/main:/Users/vigyas/lucenebench/lucene_candidate/lucene/grouping/build/classes/java/main:/Users/vigyas/lucenebench/lucene_candidate/lucene/suggest/build/classes/java/main:/Users/vigyas/lucenebench/lucene_candidate/lucene/highlighter/build/classes/java/main:/Users/vigyas/lucenebench/lucene_candidate/lucene/codecs/build/classes/java/main:/Users/vigyas/lucenebench/lucene_candidate/lucene/queries/build/classes/java/main:/Users/vigyas/.gradle/caches/modules-2/files-2.1/com.carrotsearch/hppc/0.9.1/4bf4c51e06aec600894d841c4c004566b20dd357/hppc-0.9.1.jar:/Users/vigyas/lucenebench/util/lib/HdrHistogram.jar:/Users/vigyas/lucenebench/util/build perf.SearchPerfTest -dirImpl MMapDirectory -indexPath /Users/vigyas/lucenebench/indices/wikimedium10k.lucene_baseline.facets.taxonomy:Date.taxonomy:Month.taxonomy:DayOfYear.sortedset:Date.sortedset:Month.sortedset:DayOfYear.taxonomy:RandomLabel.sortedset:RandomLabel.Lucene90.Lucene90.dvfields.nd0.01M -facets taxonomy:Date;Date -facets 
taxonomy:Month;Month -facets taxonomy:DayOfYear;DayOfYear -facets sortedset:Date;Date -facets sortedset:Month;Month -facets sortedset:DayOfYear;DayOfYear -facets taxonomy:RandomLabel;RandomLabel -facets sortedset:RandomLabel;RandomLabel -analyzer StandardAnalyzer -taskSource /Users/vigyas/lucenebench/util/tasks/wikimedium.1M.nostopwords.tasks -searchThreadCount 2 -taskRepeatCount 20 -field body -tasksPerCat 1 -staticSeed -6218524 -seed -171433 -similarity BM25Similarity -commit multi -hiliteImpl FastVectorHighlighter -log /Users/vigyas/lucenebench/logs/baseline_vs_patch.my_modified_version.19 -topN 100 -pk
      2.2 s
    baseline:
    # -classpath /Users/vigyas/lucenebench/lucene_baseline/lucene/core/build/libs/lucene-core-10.0.0-SNAPSHOT.jar
      log: /Users/vigyas/lucenebench/logs/baseline_vs_patch.baseline.19 + stdout
      run: java -server -Xms2g -Xmx2g --add-modules jdk.incubator.vector -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParallelGC -XX:StartFlightRecording=dumponexit=true,maxsize=250M,settings=/Users/vigyas/lucenebench/util/src/python/profiling.jfc,filename=/Users/vigyas/lucenebench/logs/bench-search-baseline_vs_patch-baseline-19.jfr -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -classpath /Users/vigyas/lucenebench/lucene_baseline/lucene/core/build/libs/lucene-core-10.0.0-SNAPSHOT.jar:/Users/vigyas/lucenebench/lucene_baseline/lucene/sandbox/build/classes/java/main:/Users/vigyas/lucenebench/lucene_baseline/lucene/misc/build/classes/java/main:/Users/vigyas/lucenebench/lucene_baseline/lucene/facet/build/classes/java/main:/Users/vigyas/lucenebench/lucene_baseline/lucene/analysis/common/build/classes/java/main:/Users/vigyas/lucenebench/lucene_baseline/lucene/analysis/icu/build/classes/java/main:/Users/vigyas/lucenebench/lucene_baseline/lucene/queryparser/build/classes/java/main:/Users/vigyas/lucenebench/lucene_baseline/lucene/grouping/build/classes/java/main:/Users/vigyas/lucenebench/lucene_baseline/lucene/suggest/build/classes/java/main:/Users/vigyas/lucenebench/lucene_baseline/lucene/highlighter/build/classes/java/main:/Users/vigyas/lucenebench/lucene_baseline/lucene/codecs/build/classes/java/main:/Users/vigyas/lucenebench/lucene_baseline/lucene/queries/build/classes/java/main:/Users/vigyas/.gradle/caches/modules-2/files-2.1/com.carrotsearch/hppc/0.9.1/4bf4c51e06aec600894d841c4c004566b20dd357/hppc-0.9.1.jar:/Users/vigyas/lucenebench/util/lib/HdrHistogram.jar:/Users/vigyas/lucenebench/util/build perf.SearchPerfTest -dirImpl MMapDirectory -indexPath /Users/vigyas/lucenebench/indices/wikimedium10k.lucene_baseline.facets.taxonomy:Date.taxonomy:Month.taxonomy:DayOfYear.sortedset:Date.sortedset:Month.sortedset:DayOfYear.taxonomy:RandomLabel.sortedset:RandomLabel.Lucene90.Lucene90.dvfields.nd0.01M -facets taxonomy:Date;Date -facets taxonomy:Month;Month -facets 
taxonomy:DayOfYear;DayOfYear -facets sortedset:Date;Date -facets sortedset:Month;Month -facets sortedset:DayOfYear;DayOfYear -facets taxonomy:RandomLabel;RandomLabel -facets sortedset:RandomLabel;RandomLabel -analyzer StandardAnalyzer -taskSource /Users/vigyas/lucenebench/util/tasks/wikimedium.1M.nostopwords.tasks -searchThreadCount 2 -taskRepeatCount 20 -field body -tasksPerCat 1 -staticSeed -6218524 -seed -171433 -similarity BM25Similarity -commit multi -hiliteImpl FastVectorHighlighter -log /Users/vigyas/lucenebench/logs/baseline_vs_patch.baseline.19 -topN 100 -pk
      2.1 s

% ls ../indices 
wikimedium10k.lucene_baseline.facets.taxonomy:Date.taxonomy:Month.taxonomy:DayOfYear.sortedset:Date.sortedset:Month.sortedset:DayOfYear.taxonomy:RandomLabel.sortedset:RandomLabel.Lucene90.Lucene90.dvfields.nd0.01M/

% ls ../indices | wc -l
       1
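
The two runs above differ in how many indices end up on disk: two with -r, one without. The sketch below models that choice in isolation. It is an assumption-laden illustration, not luceneutil's logic: plan_indices is a hypothetical function, and the index labels are simplified stand-ins for the much longer index directory names shown in the logs.

```python
def plan_indices(source, reindex):
    """Map each competitor to the index it would search.

    Hypothetical model: without reindex both competitors share the baseline
    index; with reindex the candidate gets a separate index of its own.
    """
    baseline_index = f"{source}.baseline"  # simplified label, not the real name
    candidate_index = f"{source}.candidate" if reindex else baseline_index
    return {
        "baseline": baseline_index,
        "my_modified_version": candidate_index,
    }


if __name__ == "__main__":
    for reindex in (True, False):
        plan = plan_indices("wikimedium10k", reindex)
        # The number of distinct indices corresponds to `ls ../indices | wc -l`.
        print(reindex, len(set(plan.values())), plan)
```
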