qdrant / vector-db-benchmark

Framework for benchmarking vector search engines
https://qdrant.tech/benchmarks/
Apache License 2.0

Automate running benchmarks for all engines #134

Closed tellet-q closed 5 months ago

tellet-q commented 5 months ago

Solves https://github.com/qdrant/vector-db-benchmark/issues/123

Runs the *-default experiment for each engine, using the random-100 or glove-25-angular dataset against a single-node deployment. Note that OS (OpenSearch) and ES (Elasticsearch) have dedicated single-node deployments with reduced memory so that they fit into the default GitHub runner.

The workflow triggers:

tellet-q commented 5 months ago

We can adjust the triggering on a job level by adding conditionals like this:

      !(
        startsWith(github.event.head_commit.modified, 'tests/') || 
        startsWith(github.event.head_commit.modified, 'scripts/') || 
        startsWith(github.event.head_commit.modified, 'monitoring/') ||
        contains(github.event.head_commit.modified, '.dockerignore') ||
        contains(github.event.head_commit.modified, '.gitignore') ||
        contains(github.event.head_commit.modified, '.pre-commit-config.yaml') ||
        contains(github.event.head_commit.modified, 'Dockerfile') ||
        contains(github.event.head_commit.modified, 'LICENSE') ||
        contains(github.event.head_commit.modified, 'README.md') ||
        contains(github.event.head_commit.modified, 'run_all_engines.sh') ||
        contains(github.event.head_commit.modified, 'sync_results.sh')
      )

This will NOT trigger the jobs if the commit ONLY touches the specified folders and files. In any other case the jobs will run. Unfortunately, I'll have to configure each job like this, so it'll look a bit cumbersome.
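To make the intended rule concrete, here is a minimal Python sketch of the decision described above: skip the jobs only when every changed file is in an ignored location. It models the intent, not how GitHub evaluates the expression. The names `IGNORED_DIRS`, `IGNORED_FILES`, `is_ignored` and `should_run_jobs` are hypothetical, and matching files by base name is an assumption of this sketch.

```python
from pathlib import PurePosixPath

# Hypothetical lists mirroring the folders and files named in the condition.
IGNORED_DIRS = ("tests/", "scripts/", "monitoring/")
IGNORED_FILES = {
    ".dockerignore", ".gitignore", ".pre-commit-config.yaml",
    "Dockerfile", "LICENSE", "README.md",
    "run_all_engines.sh", "sync_results.sh",
}


def is_ignored(path: str) -> bool:
    """True if a changed file should not, by itself, trigger the benchmarks."""
    if path.startswith(IGNORED_DIRS):
        return True
    # Assumption: files are matched by base name, wherever they live.
    return PurePosixPath(path).name in IGNORED_FILES


def should_run_jobs(changed_files: list[str]) -> bool:
    """Run the jobs as soon as one changed file is outside the ignored set."""
    # An empty change list yields False, i.e. nothing to benchmark.
    return any(not is_ignored(path) for path in changed_files)
```

For example, a commit that touches only `README.md` and a file under `tests/` would be skipped, while adding any file under `engine/` to the same commit would run the jobs.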

KShivendu commented 5 months ago

This will NOT trigger the jobs if the commit ONLY touches the specified folders and files. In any other case the jobs will run. Unfortunately, I'll have to configure each job like this, so it'll look a bit cumbersome.

Interesting that we can do this.

@tellet-q Can we do something like this instead?

      (
        startsWith(github.event.head_commit.modified, 'engine/{clients,server}/*<engine-name>*') || 
        startsWith(github.event.head_commit.modified, 'engine/base_client/')
      )

Where <engine-name> will vary for each job (engine)

tellet-q commented 5 months ago

I hope this will also expose UI (in /actions) to manually pick only one particular engine/dataset, right?

Unfortunately, no. There are no changes in the UI.

tellet-q commented 5 months ago

@tellet-q Can we do something like this instead?

      (
        startsWith(github.event.head_commit.modified, 'engine/{clients,server}/*<engine-name>*') || 
        startsWith(github.event.head_commit.modified, 'engine/base_client/')
      )

Where <engine-name> will vary for each job (engine)

Not exactly like this, but similar, yes.

    if: >
      (
      startsWith(github.event.head_commit.modified, 'engine/clients/pgvector') ||
      startsWith(github.event.head_commit.modified, 'engine/servers/pgvector') ||
      startsWith(github.event.head_commit.modified, 'engine/base_client/')
      )

In this case, the job will run ONLY if changes were made in the specified folders.
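The per-engine variant can be sketched the same way. This is a rough Python model of the intended rule, assuming the condition is checked against each changed file path. The function name `engine_job_should_run` is hypothetical, and the example paths below are illustrative rather than taken from the repository.

```python
def engine_job_should_run(engine: str, changed_files: list[str]) -> bool:
    """True if any changed file belongs to this engine or to the shared base client."""
    prefixes = (
        f"engine/clients/{engine}",
        f"engine/servers/{engine}",
        "engine/base_client/",
    )
    # No trailing slash on the engine prefixes, as in the condition above, so
    # sibling folders whose names start with the engine name also match.
    return any(path.startswith(prefixes) for path in changed_files)
```

With this rule, a change under `engine/base_client/` would run every engine's job, while a change under another engine's client folder would leave the pgvector job untouched.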