opensearch-project / opensearch-benchmark-workloads

Official workloads used by OpenSearch Benchmark (OSB)
https://opensearch.org/docs/latest/benchmark/
Add vectorsearch training workload #333

Open finnroblin opened 1 week ago

finnroblin commented 1 week ago

Description

Adds the train-test vectorsearch workload to benchmark kNN methods that require a training step, such as Faiss IVF. See issue #332 for context.

This PR adds a schedule that trains kNN models using the train-knn-model operation proposed in OSB PR 556, and it depends on the operation runners in that PR. It also requires an additional index in the vectorsearch workload.json to hold the training data.
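
For context, the k-NN plugin trains models through its Train API (`POST /_plugins/_knn/models/<model_id>/_train`), which reads vectors from a training index. The sketch below only illustrates the kind of request a train operation would need to issue. The index, field, and model names and the `nlist`/`nprobes` values are hypothetical placeholders, and the actual operation parameters are the ones defined in OSB PR 556, which are not reproduced here.

```python
import json


def build_train_request(model_id, training_index, training_field, dimension,
                        nlist=128, nprobes=8):
    """Build the path and body for a k-NN plugin Train API call (Faiss IVF).

    All values passed in are illustrative. The nlist/nprobes defaults are
    placeholders, not the settings used by the workload in this PR.
    """
    path = f"/_plugins/_knn/models/{model_id}/_train"
    body = {
        # Index and field that hold the vectors used only for training.
        "training_index": training_index,
        "training_field": training_field,
        "dimension": dimension,
        "description": "Faiss IVF model trained for benchmarking",
        "method": {
            "name": "ivf",
            "engine": "faiss",
            # Assumption: space_type placement can differ between k-NN
            # plugin versions; check the docs for the version under test.
            "space_type": "l2",
            "parameters": {"nlist": nlist, "nprobes": nprobes},
        },
    }
    return path, body


# Hypothetical names; 128 matches the dimension of the SIFT-128 vectors.
path, body = build_train_request(
    model_id="test-model",
    training_index="train_index",
    training_field="train_field",
    dimension=128,
)
print("POST", path)
print(json.dumps(body, indent=2))
```

The separate training index is what the extra entry in workload.json provides: the model is trained from it before the target index is created and searched.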

The train-test workload on my branch runs on the faiss-sift-128 dataset without breaking backwards compatibility with the other vectorsearch workloads. Please feel free to clone my forks (OSB, OSB Workload) to investigate workload behavior, as there are no unit tests in the OSB workloads framework.

Issues Resolved

Closes #332

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.