zilliztech / VectorDBBench

A Benchmark Tool for VectorDB
MIT License

pre run case error: The specified key does not exist for Search Performance Test (100M Dataset, 768 Dim) on Milvus Standalone #267

Closed: anrahman4 closed this issue 4 months ago

anrahman4 commented 5 months ago

Hello. I tried to run the Search Performance Test (100M Dataset, 768 Dim) against Milvus Standalone, but the run failed during the dataset download with the error "The specified key does not exist". The full output is below:

2024-01-29 19:15:09,634 | INFO |Task summary: run_id=92ba5, task_label=2024012919_dockertes (models.py:285)
2024-01-29 19:15:09,634 | INFO |DB     | db_label case              label                | load_dur    qps          latency(p99)    recall        max_load_count | label (models.py:285)
2024-01-29 19:15:09,634 | INFO |------ | -------- ----------------- -------------------- | ----------- ------------ --------------- ------------- -------------- | ----- (models.py:285)
2024-01-29 19:15:09,634 | INFO |Milvus |          Performance768D1M 2024012919_dockertes | 562.5584    2261.2764    0.0044          0.9821        0              | :)    (models.py:285)
2024-01-29 19:15:09,634 | INFO: write results to disk /home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/results/Milvus/result_20240129_2024012919_dockertes_milvus.json (models.py:143) (3442542)
2024-01-29 19:15:09,635 | INFO: Succes to finish task: label=2024012919_dockertes, run_id=92ba58d5fda64c8f88100a54d0f18c0a (interface.py:207) (3442542)
2024-01-29 19:19:01,624 | INFO: generated uuid for the tasks: 81e208a5a31844e8be6f45b9ff165e66 (interface.py:69) (3442368)
2024-01-29 19:19:01,624 | INFO | DB             | CaseType     Dataset               Filter | task_label (task_runner.py:288)
2024-01-29 19:19:01,624 | INFO | -----------    | ------------ -------------------- ------- | -------    (task_runner.py:288)
2024-01-29 19:19:01,624 | INFO | Milvus         | Performance  LAION-LARGE-100M        None | 2024012919_100Mdockertest (task_runner.py:288)
2024-01-29 19:19:01,624 | INFO: task submitted: id=81e208a5a31844e8be6f45b9ff165e66, 2024012919_100Mdockertest, case number: 1 (interface.py:235) (3442368)
2024-01-29 19:19:02,223 | INFO: [1/1] start case: {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'LAION', 'size': 100000000, 'dim': 768, 'metric_type': <MetricType.L2: 'L2'>}}, 'db': 'Milvus'}, drop_old=True (interface.py:167) (3463933)
2024-01-29 19:19:02,490 | INFO: Milvus client drop_old collection: VectorDBBenchCollection (milvus.py:45) (3463933)
2024-01-29 19:19:02,494 | INFO: Milvus create collection: VectorDBBenchCollection (milvus.py:55) (3463933)
2024-01-29 19:19:03,162 | INFO: local dataset root path not exist, creating it: /mnt/milvus_data/laion/laion_large_100m (data_source.py:126) (3463933)
2024-01-29 19:19:03,162 | INFO: Start to downloading files, total count: 104 (data_source.py:142) (3463933)
  2%|███▊                                                                                                                                                                                                    | 2/104 [00:00<00:46,  2.19it/s]
2024-01-29 19:19:04,074 | WARNING: pre run case error: The specified key does not exist. (task_runner.py:92) (3463933)
2024-01-29 19:19:04,074 | WARNING: [1/1] case {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'LAION', 'size': 100000000, 'dim': 768, 'metric_type': <MetricType.L2: 'L2'>}}, 'db': 'Milvus'} failed to run, reason=The specified key does not exist. (interface.py:187) (3463933)
Traceback (most recent call last):
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/interface.py", line 168, in _async_task_v2
    case_res.metrics = runner.run(drop_old)
                       ^^^^^^^^^^^^^^^^^^^^
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 96, in run
    self._pre_run(drop_old)
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 93, in _pre_run
    raise e from None
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 87, in _pre_run
    self.ca.dataset.prepare(self.dataset_source)
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/dataset.py", line 202, in prepare
    source.reader().read(
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/data_source.py", line 145, in read
    self.fs.download(s3_file, local_ds_root.as_posix())
  File "/home/labuser/.local/lib/python3.11/site-packages/fsspec/spec.py", line 1534, in download
    return self.get(rpath, lpath, recursive=recursive, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/labuser/.local/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/labuser/.local/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/home/labuser/.local/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/home/labuser/.local/lib/python3.11/site-packages/fsspec/asyn.py", line 650, in _get
    return await _run_coros_in_chunks(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/labuser/.local/lib/python3.11/site-packages/fsspec/asyn.py", line 254, in _run_coros_in_chunks
    await asyncio.gather(*chunk, return_exceptions=return_exceptions),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/tasks.py", line 452, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/labuser/.local/lib/python3.11/site-packages/s3fs/core.py", line 1224, in _get_file
    body, content_length = await _open_file(range=0)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/labuser/.local/lib/python3.11/site-packages/s3fs/core.py", line 1215, in _open_file
    resp = await self._call_s3(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/labuser/.local/lib/python3.11/site-packages/s3fs/core.py", line 348, in _call_s3
    return await _error_wrapper(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/labuser/.local/lib/python3.11/site-packages/s3fs/core.py", line 140, in _error_wrapper
    raise err
FileNotFoundError: The specified key does not exist.

I am currently starting Milvus Standalone through Docker Compose with the following docker-compose.yml:

version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - /mnt/milvus_data:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - /mnt/milvus_data:/minio_data
    command: minio server /minio_data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.3.5
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - /mnt/milvus_data:/var/lib/milvus
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"

networks:
  default:
    name: milvus

alwayslove2013 commented 5 months ago

@XuanYang-cn this looks like a problem with the data download. Could you please help?
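
The traceback does not say which of the 104 files was missing, so one way to narrow down a download failure like this might be to check each expected key before downloading. The sketch below is only an illustration, not VectorDBBench's own code: `find_missing_keys`, `FakeRemote`, and the `example-bucket/...` key names are all hypothetical, and the real dataset layout is not shown in the log above.

```python
from typing import Iterable, List, Protocol


class SupportsExists(Protocol):
    """Anything with an fsspec-style exists(path) method."""

    def exists(self, path: str) -> bool: ...


def find_missing_keys(fs: SupportsExists, keys: Iterable[str]) -> List[str]:
    """Return the keys that the filesystem reports as absent."""
    return [key for key in keys if not fs.exists(key)]


class FakeRemote:
    """Stand-in for a remote object store, used only for this demo."""

    def __init__(self, present: Iterable[str]) -> None:
        self._present = set(present)

    def exists(self, path: str) -> bool:
        return path in self._present


# Hypothetical key names; the real bucket and file names are assumptions.
expected = [f"example-bucket/dataset/part-{i:02d}.parquet" for i in range(4)]

# Simulate a remote store where the third file was never uploaded.
remote = FakeRemote(expected[:2] + expected[3:])

missing = find_missing_keys(remote, expected)
print(missing)
```

Because fsspec filesystems expose an `exists` method, an `s3fs.S3FileSystem` instance could presumably be passed as `fs` in place of `FakeRemote`. Whether the bucket allows anonymous access is an assumption that would need checking.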

anrahman4 commented 4 months ago

Is there any fix for this yet?