mikemccand / luceneutil

Various utility scripts for running Lucene performance tests
Apache License 2.0

Report recall for vector search in nightly benchmarks #278

Open tteofili opened 1 week ago

tteofili commented 1 week ago

Since we already measure recall as well as QPS as part of knnPerfTest.py, I think it would be beneficial for Lucene to add recall@k to the nightly benchmark results (for Wikipedia and/or Cohere embeddings), so that we could discover potential accuracy regressions earlier (see the example regression reported here).

Related to #155.
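For readers unfamiliar with the metric, here is a minimal sketch of how recall@k is commonly computed: the fraction of the exact top-k neighbors that the approximate search also returned, averaged over queries. The function names and the list-of-doc-ids input format are assumptions made for illustration; this is not the code in knnPerfTest.py.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbors found in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


def mean_recall_at_k(approx_results, exact_results, k):
    """Average recall@k over all queries (one id list per query)."""
    if len(approx_results) != len(exact_results):
        raise ValueError("need one exact result list per approximate result list")
    if not approx_results:
        return 0.0
    total = sum(recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results))
    return total / len(approx_results)
```

For example, if the approximate search returns ids `[1, 2, 3, 4]` and the exact top 4 is `[1, 2, 5, 6]`, recall@4 is 0.5.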

mikemccand commented 1 week ago

+1, that's a great idea. The nightly benchy currently does not run knnPerfTest.py; it runs the VectorSearch tasks (KnnFloatVectorQuery) instead. So we could either try to add recall to SearchTask.java, where it verifies that hits are the same across the base / comp runs, or (probably easier?) invoke knnPerfTest.py as part of the nightly script. Though we'd then need to add some simple charting to plot the recall metric over time.
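Once recall is recorded per nightly run, spotting a regression could be as simple as comparing consecutive values, independent of how the chart is drawn. The sketch below assumes a hypothetical history of `(date, recall)` pairs and an arbitrary tolerance; neither is part of the existing nightly script, and the values in the usage note are made up.

```python
def find_recall_regressions(history, tolerance=0.01):
    """Return (date, previous_recall, current_recall) for each nightly drop
    larger than `tolerance`.

    `history` is a chronologically ordered list of (date_string, recall) pairs.
    Both the input format and the default tolerance are illustrative assumptions.
    """
    regressions = []
    for (_, prev_recall), (date, cur_recall) in zip(history, history[1:]):
        if prev_recall - cur_recall > tolerance:
            regressions.append((date, prev_recall, cur_recall))
    return regressions
```

With the made-up history `[("day1", 0.95), ("day2", 0.951), ("day3", 0.90)]`, only `day3` is flagged, since recall dropped by more than 0.01 relative to the previous night.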