opensearch-project / opensearch-benchmark-workloads

Official workloads used by OpenSearch Benchmark (OSB)
https://opensearch.org/docs/latest/benchmark/

[FEATURE] Add Train Model KNN Workload #332

Open finnroblin opened 2 weeks ago

finnroblin commented 2 weeks ago

Is your feature request related to a problem?

Customers may want to benchmark approximate k-NN search algorithms that require a training step. For example, the k-NN plugin with the FAISS engine and IVF method requires a training step to cluster the database vectors. A search can then compare the query against a small number of cluster centroids and scan only the vectors in the nearest clusters, instead of scanning the entire database.
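To illustrate why IVF needs training, here is a minimal pure-Python sketch of the idea: cluster the vectors, bucket them by nearest centroid, then search only the closest buckets. This is a toy illustration, not how FAISS or the k-NN plugin implements it; the function names, the first-`k` seeding, and the sample vectors are all assumptions made for the example.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)), key=lambda i: l2(vector, centroids[i]))


def train_centroids(vectors, k, iterations=10):
    """Toy k-means (Lloyd's algorithm), seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for i, members in enumerate(buckets):
            if members:  # keep the old centroid if a bucket is empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def build_inverted_lists(vectors, centroids):
    """Map each centroid index to the ids of the vectors assigned to it."""
    lists = {i: [] for i in range(len(centroids))}
    for idx, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(idx)
    return lists


def search(query, vectors, centroids, lists, k=1, nprobes=1):
    """Scan only the nprobes buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))[:nprobes]
    candidates = [idx for c in probe for idx in lists[c]]
    return sorted(candidates, key=lambda idx: l2(query, vectors[idx]))[:k]


# Hypothetical sample data: two well-separated groups of 2-D vectors.
vectors = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.2], [5.2, 4.9], [0.2, 0.1], [4.8, 5.1]]
centroids = train_centroids(vectors, k=2)
lists = build_inverted_lists(vectors, centroids)
result = search([5.0, 5.1], vectors, centroids, lists, k=2, nprobes=1)
print(result)
```

With `nprobes=1`, only three of the six vectors are compared against the query. That is the trade-off a training workload would measure: training cost up front, then cheaper but approximate searches.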

There is no existing workload that supports this use case, nor is there an OSB operation type that calls the k-NN training API.

What solution would you like?

Add a workload that benchmarks both training a model (such as FAISS IVF) and searching against it. This workload would require code additions in the OpenSearch Benchmark repo to support the initial training operation.
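For reference, this is roughly the request a training operation would need to send. It is a sketch assuming the k-NN plugin's train API (`POST /_plugins/_knn/models/{model_id}/_train`); the helper name `build_train_request`, the model, index, and field names, and the `nlist`/`nprobes` values are hypothetical placeholders. It does not show how the OSB operation type itself would be implemented.

```python
import json


def build_train_request(model_id, training_index, training_field, dimension,
                        nlist=128, nprobes=8, space_type="l2"):
    """Return (HTTP method, path, body) for a k-NN model training request.

    Assumption: the body follows the k-NN plugin train API for a FAISS IVF model.
    """
    path = f"/_plugins/_knn/models/{model_id}/_train"
    body = {
        "training_index": training_index,  # index holding the training vectors
        "training_field": training_field,  # knn_vector field to read them from
        "dimension": dimension,
        "method": {
            "name": "ivf",
            "engine": "faiss",
            "space_type": space_type,
            "parameters": {"nlist": nlist, "nprobes": nprobes},
        },
    }
    return "POST", path, body


# Hypothetical names, for illustration only.
http_method, path, body = build_train_request(
    model_id="ivf-benchmark-model",
    training_index="train-index",
    training_field="train-field",
    dimension=128,
)
print(http_method, path)
print(json.dumps(body, indent=2))
```

Training runs asynchronously, so a workload would presumably also need to poll the model until it reports a finished state before it starts the search phase.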

Do you have any additional context?

There is a benchmarking procedure for training in the k-NN plugin repo. However, it would be a better customer experience to have an automated workload in the opensearch-benchmark-workloads repository. There is already a workload for approximate k-NN methods that do not require training, such as HNSW.

Subtasks:

gkamat commented 2 weeks ago

@VijayanB perhaps you can comment on this? Thanks.

VijayanB commented 1 week ago

@gkamat Currently this feature is not in OSB. We are still using the OSB extension in the k-NN repo to execute this operation. As part of this task, we can deprecate this operation in k-NN and move one step closer to using this repo as the single repo for all vector search benchmarks.