opensearch-project / opensearch-benchmark

OpenSearch Benchmark - a community driven, open source project to run performance tests for OpenSearch
https://opensearch.org/docs/latest/benchmark/
Apache License 2.0

Support Big-ANN Ground truth data as param in vector search #530

Open VijayanB opened 2 months ago

VijayanB commented 2 months ago

Is your feature request related to a problem? Please describe.

BIGANN is one of the popular datasets for measuring vector search performance. However, it uses different formats for base data, query data, and ground truth. Currently, the Vector Search neighbors param does not support the ground truth format, which differs from the base data format. Hence, BigANNVectorDataSet should be extended to read and parse files with the "bin" extension and convert them into neighbors, as is already done for hdf5.

Describe the solution you'd like

Extend the BigANN dataset to support a new "bin" extension that parses ground truth files so they can be used as input for the neighbors data set.
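As a rough illustration of what parsing such a file could involve, here is a minimal sketch. It assumes the ground-truth layout used by the big-ann-benchmarks files: a little-endian int32 query count `n`, an int32 `k`, then `n * k` int32 neighbor ids, then `n * k` float32 distances. The function names `parse_bigann_ground_truth` and `read_bigann_ground_truth` are hypothetical and are not part of OSB's `BigANNVectorDataSet`. A real implementation would likely stream rows instead of loading the whole file into memory.

```python
import struct
from typing import List, Tuple

# Header: number of queries and k (neighbors per query), both little-endian int32.
# This layout is an assumption based on big-ann-benchmarks ground-truth files.
HEADER = struct.Struct("<ii")


def parse_bigann_ground_truth(data: bytes) -> Tuple[List[List[int]], List[List[float]]]:
    """Split a BIGANN-style ground-truth buffer into per-query ids and distances.

    Assumed layout: int32 num_queries, int32 k, then num_queries*k int32
    neighbor ids, then num_queries*k float32 distances (all little-endian).
    """
    if len(data) < HEADER.size:
        raise ValueError("buffer too short for ground-truth header")
    num_queries, k = HEADER.unpack_from(data, 0)
    if num_queries < 0 or k < 0:
        raise ValueError("negative dimensions in ground-truth header")
    count = num_queries * k
    # Header + int32 ids + float32 distances, 4 bytes per value.
    expected = HEADER.size + count * 4 + count * 4
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    ids = struct.unpack_from(f"<{count}i", data, HEADER.size)
    dists = struct.unpack_from(f"<{count}f", data, HEADER.size + count * 4)
    # Reshape the flat arrays into one row of k entries per query.
    id_rows = [list(ids[q * k:(q + 1) * k]) for q in range(num_queries)]
    dist_rows = [list(dists[q * k:(q + 1) * k]) for q in range(num_queries)]
    return id_rows, dist_rows


def read_bigann_ground_truth(path: str) -> Tuple[List[List[int]], List[List[float]]]:
    """Read a ground-truth .bin file from disk and parse it (hypothetical helper)."""
    with open(path, "rb") as f:
        return parse_bigann_ground_truth(f.read())
```

Each row of the returned id list corresponds to one query and could serve as the neighbors entry for that query when computing recall. The distances are returned as well in case a workload needs them.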

Describe alternatives you've considered

Manually convert the ground truth into hdf5 or a previously supported format such as fbin/u8bin.

Additional context

N/A

VijayanB commented 2 months ago

This was previously supported only in perf-tool. It was not added to OSB because recall was not supported. Now that recall is supported, adding this format will help users gradually move away from perf-tool and use OSB for all use cases.

VijayanB commented 2 months ago

Can we add the 1.6 release tag to this issue? Thank you.