zilliztech / VectorDBBench

A Benchmark Tool for VectorDB
MIT License

Failing to run the benchmark with pgvecto.rs #265

Closed rodrigonascimento closed 2 months ago

rodrigonascimento commented 5 months ago

Hi,

I'm trying to run the benchmark with the pgvecto.rs option, but it fails while creating the embedding vector table with the error: operator class "cosine_ops" does not exist for access method "vectors". Here is the execution log:

2024-01-19 15:11:37,641 | INFO: generated uuid for the tasks: cf61f1c2a3fb4593afc18b6d1153c2be (interface.py:69) (2077578)
2024-01-19 15:11:37,654 | INFO | DB             | CaseType     Dataset               Filter | task_label (task_runner.py:288)
2024-01-19 15:11:37,654 | INFO | -----------    | ------------ -------------------- ------- | -------    (task_runner.py:288)
2024-01-19 15:11:37,654 | INFO | PgVectoRS-cohere10M | Performance  Cohere-LARGE-10M        None | 2024011915-pg (task_runner.py:288)
2024-01-19 15:11:37,654 | INFO: task submitted: id=cf61f1c2a3fb4593afc18b6d1153c2be, 2024011915-pg, case number: 1 (interface.py:235) (2077578)
2024-01-19 15:11:38,771 | INFO: [1/1] start case: {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'Cohere', 'size': 10000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'PgVectoRS-cohere10M'}, drop_old=True (interface.py:167) (2078634)
2024-01-19 15:11:38,859 | INFO: Pgvecto.rs client drop table : PgVectorCollection (pgvecto_rs.py:47) (2078634)
2024-01-19 15:11:38,876 | WARNING: Failed to create pgvector table: PgVectorCollection error: operator class "cosine_ops" does not exist for access method "vectors"
 (pgvecto_rs.py:115) (2078634)
2024-01-19 15:11:38,876 | WARNING: pre run case error: operator class "cosine_ops" does not exist for access method "vectors"
 (task_runner.py:92) (2078634)
2024-01-19 15:11:38,876 | WARNING: [1/1] case {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'Cohere', 'size': 10000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'PgVectoRS-cohere10M'} failed to run, reason=operator class "cosine_ops" does not exist for access method "vectors"
 (interface.py:187) (2078634)
Traceback (most recent call last):
  File "/root/.venv-vbench/lib64/python3.11/site-packages/vectordb_bench/interface.py", line 168, in _async_task_v2
    case_res.metrics = runner.run(drop_old)
                       ^^^^^^^^^^^^^^^^^^^^
  File "/root/.venv-vbench/lib64/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 96, in run
    self._pre_run(drop_old)
  File "/root/.venv-vbench/lib64/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 93, in _pre_run
    raise e from None
  File "/root/.venv-vbench/lib64/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 86, in _pre_run
    self.init_db(drop_old)
  File "/root/.venv-vbench/lib64/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 77, in init_db
    self.db = db_cls(
              ^^^^^^^
  File "/root/.venv-vbench/lib64/python3.11/site-packages/vectordb_bench/backend/clients/pgvecto_rs/pgvecto_rs.py", line 51, in __init__
    self._create_index()
  File "/root/.venv-vbench/lib64/python3.11/site-packages/vectordb_bench/backend/clients/pgvecto_rs/pgvecto_rs.py", line 118, in _create_index
    raise e from None
  File "/root/.venv-vbench/lib64/python3.11/site-packages/vectordb_bench/backend/clients/pgvecto_rs/pgvecto_rs.py", line 109, in _create_index
    self.cursor.execute(
psycopg2.errors.UndefinedObject: operator class "cosine_ops" does not exist for access method "vectors"

Any thoughts?

All the best,

--Rodrigo

alwayslove2013 commented 5 months ago

@rodrigonascimento It seems that different versions of pgvecto.rs handle the cosine metric differently. See https://github.com/immich-app/immich/issues/5766 and https://github.com/tensorchord/pgvecto.rs/issues/191

Could you tell us which version of pgvecto.rs you are testing?
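One way to check the installed version is to read it from the PostgreSQL extension catalog. This is a minimal sketch, assuming the extension is registered under the name "vectors" (the access method name shown in the log above); the helper name get_extension_version and the connection details in the comments are hypothetical.

```python
def get_extension_version(cursor, extname="vectors"):
    """Return the installed version of a PostgreSQL extension, or None.

    `cursor` is any DB-API cursor (for example one from psycopg2).
    The default name "vectors" is an assumption about how pgvecto.rs
    is registered in the database.
    """
    cursor.execute(
        "SELECT extversion FROM pg_extension WHERE extname = %s",
        (extname,),
    )
    row = cursor.fetchone()
    return row[0] if row else None


# Hypothetical usage (connection parameters are placeholders):
#
#   import psycopg2
#   with psycopg2.connect("dbname=postgres user=postgres") as conn:
#       with conn.cursor() as cur:
#           print(get_extension_version(cur))
```

If the function returns None, the extension is not installed in the database you connected to.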

cutecutecat commented 2 months ago

Hello, I am a developer of pgvecto.rs and the author of the pgvecto.rs benchmark in VectorDBBench.

Sorry about that. We changed many operator names in the upgrade from v0.1.0 to v0.2.0, so the original benchmark is no longer compatible.

For example, cosine_ops became vector_cos_ops.
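Until the benchmark client is updated, the rename could be handled by picking the operator class from the installed version. The following is only a sketch under stated assumptions: the function names, the table name "items", and the column name "embedding" are hypothetical, the 0.2.0 cutoff is taken from the comment above, and only the cosine mapping mentioned here is covered.

```python
import re


def cosine_opclass(version):
    """Pick the cosine operator class name for a pgvecto.rs version string.

    Assumes versions before 0.2.0 use "cosine_ops" and 0.2.0 or later
    use "vector_cos_ops", as described in this thread.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parsed = tuple(int(part) for part in match.groups())
    return "cosine_ops" if parsed < (0, 2, 0) else "vector_cos_ops"


def create_index_sql(version, table="items", column="embedding"):
    """Build a CREATE INDEX statement for the "vectors" access method.

    Table and column names are placeholders; index options are omitted
    because they may also differ between versions.
    """
    opclass = cosine_opclass(version)
    return f"CREATE INDEX ON {table} USING vectors ({column} {opclass});"


print(create_index_sql("0.1.0"))
print(create_index_sql("0.2.0"))
```

Keeping the mapping in one small function means the rest of the client does not need to know which version it is talking to.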

The operators in pgvecto.rs are much more stable now, so I can help fix this in the next few days.

Feel free to assign it to me if that is convenient, @alwayslove2013.