Open · jmazanec15 opened this issue 2 years ago
This is going to require flexibility in how the results metrics are defined, computed, processed and reported. It will take some consideration.
Right, I guess there are a few other applications I can think of that may require similar functionality: Anomaly Detection, Learning to Rank. For these, recall/accuracy are KPIs.
+1 on this. Extensible metrics would help Anomaly Detection as well: we are starting to define how we benchmark AD in various ways, such as our own execution time to produce an anomaly result, recall/precision, and other KPIs on our own specific workloads while a detector is running. I also want to add that this would greatly benefit ML-Commons as well.
@IanHoang as discussed offline, taking a look at this issue
Added a new issue https://github.com/opensearch-project/opensearch-benchmark/issues/435 that would allow the user to specify the percentiles they want to see; that would be a subset of this issue.
Is your feature request related to a problem? Please describe.
For the k-NN plugin, I am working on adding a custom runner that will execute queries from a numeric data set and calculate the recall. The k-NN plugin offers an assortment of Approximate Nearest Neighbor algorithms. Generally, users need to make tradeoffs between the accuracy of their approximate results and latency/throughput, so they need to see both kinds of metrics when benchmarking.
In the custom query runner, I return the recall alongside the latency, but this only gets stored as request metadata, not reported as a result.
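To make the limitation concrete, here is a minimal sketch of such a custom runner. The parameter names (`params["neighbors"]` as the ground-truth IDs, `params["body"]` as the query body) are illustrative assumptions, not the plugin's actual workload schema:

```python
# Hypothetical custom OpenSearch Benchmark runner that computes
# recall@k for an approximate k-NN query. Parameter names here
# (e.g. "neighbors" for ground-truth IDs) are assumptions.

def recall_at_k(retrieved_ids, true_ids, k):
    """Fraction of the true top-k neighbors present in the retrieved top-k."""
    true_top_k = set(true_ids[:k])
    return len(true_top_k & set(retrieved_ids[:k])) / k

async def knn_query_with_recall(opensearch, params):
    k = params["k"]
    response = await opensearch.search(index=params["index"], body=params["body"])
    retrieved_ids = [hit["_id"] for hit in response["hits"]["hits"]]
    recall = recall_at_k(retrieved_ids, params["neighbors"], k)
    # Extra keys in the returned dict end up as request metadata --
    # which is exactly the problem: "recall" never surfaces in the
    # final results report.
    return {"weight": 1, "unit": "ops", "recall": recall}
```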
Describe the solution you'd like
I would like the ability to specify that "recall" should be output as a metric in the results, and to define its aggregation as the mean.
More generally, I would like the ability to define custom metrics for runners, define their aggregations, and have them show up in the results.
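A minimal sketch of the kind of aggregation step being requested. It assumes metric records shaped as dicts with the runner's extra keys under a `"meta"` field; that shape is an assumption for illustration, not the actual metrics-store schema:

```python
from statistics import mean

def aggregate_custom_metric(metric_records, metric_name, agg=mean):
    """Pull a custom metric (e.g. "recall") out of each request's
    metadata and reduce it with the requested aggregation function.

    `metric_records` is assumed to be a list of dicts carrying the
    runner's extra keys under "meta" -- an illustrative shape only.
    """
    values = [
        record["meta"][metric_name]
        for record in metric_records
        if metric_name in record.get("meta", {})
    ]
    return agg(values) if values else None

# Example: mean recall across three simulated requests, one of which
# did not report recall.
records = [
    {"meta": {"recall": 0.9}},
    {"meta": {"recall": 0.8}},
    {"meta": {}},
]
```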
Describe alternatives you've considered
Additional context