Open tteofili opened 5 months ago
+1, that's a great idea. The nightly benchy currently does not run `knnPerfTest.py` but rather the `VectorSearch` tasks (`KnnFloatVectorQuery`). So we could either try to add recall to `SearchTask.java`, where it verifies hits are the same across the base / comp runs, or (probably easier?) we could invoke `knnPerfTest.py` as part of the nightly script. Though we'd then need to add some simple charting to plot the recall metric over time.
Since we already measure recall as well as QPS as part of `knnPerfTest.py`, I am thinking that it would be beneficial for Lucene to add recall@k to the results of nightly benchmarks (for Wikipedia and/or Cohere embeddings) so that we could discover potential regressions in accuracy earlier (see example regression being reported here). Related to #155.
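For reference, here is a minimal sketch of what recall@k means in this context: the fraction of the exact top-k neighbors that the approximate search also returned, averaged over all queries. This is not taken from `knnPerfTest.py`; the function name `recall_at_k` and the list-of-doc-id-lists input format are assumptions made purely for illustration.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Average recall@k over a set of queries.

    approx_ids: one list of doc ids per query, from the approximate (HNSW) search
    exact_ids:  one list of doc ids per query, from an exact brute-force search
    Both layouts are hypothetical, chosen for this sketch only.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(approx_ids) != len(exact_ids):
        raise ValueError("need the same number of queries in both result sets")
    if not exact_ids:
        raise ValueError("need at least one query")

    matched = 0
    for approx, exact in zip(approx_ids, exact_ids):
        # Count how many of the true top-k ids appear in the approximate top-k.
        truth = set(exact[:k])
        matched += len(truth.intersection(approx[:k]))

    # Normalize by the total number of true neighbors across all queries.
    return matched / (k * len(exact_ids))


if __name__ == "__main__":
    # Made-up doc ids for two queries, just to exercise the function.
    approx = [[1, 2, 3], [4, 5, 9]]
    exact = [[1, 2, 7], [4, 5, 6]]
    print(f"recall@3 = {recall_at_k(approx, exact, 3):.3f}")
```

A single number like this per nightly run (per dataset and per k) would be enough to chart over time and spot a drop in accuracy.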