milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Invalid trace stack for exception of type: milvus::SegcoreError #37944

Open lxl0928 opened 5 hours ago

lxl0928 commented 5 hours ago

Is there an existing issue for this?

Environment

- Milvus version: 2.4.15-for-gpu
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): restful api
- OS(Ubuntu or CentOS): 220-Ubuntu SMP Fri Sep 27 13:19:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- CPU/Memory: Intel(R) Xeon(R) Platinum 8350C CPU @ 2.60GHz 
- GPU: Nvidia A10 * 4
- Others:

Current Behavior

When I set the TTL (Time-To-Live) of a Milvus collection to ttl=2*3600*24 (172,800 seconds, i.e. two days) and run concurrent vector insert and search tests, the following error is consistently triggered once roughly 390,000 rows of vectors have been inserted: W20241122 07:19:39.079182 2555 ExceptionTracer.cpp:187] Invalid trace stack for exception of type: milvus::SegcoreError

Expected Behavior

No error.

Steps To Reproduce

0. Run Milvus standalone with GPU support (image v2.4.15-gpu).
1. Set the collection's TTL (2 days).
2. Insert and search vectors concurrently: insert QPS ~112 req/s, search QPS ~108 req/s (see the sketch after this list).
3. After 390,000+ vectors have been inserted, the error `Invalid trace stack for exception of type: milvus::SegcoreError` is logged.
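
Below is a minimal, hedged sketch of the concurrent load, for reference only: the report itself drove Milvus through the RESTful API, while this uses pymilvus against the FaceV4 schema from the init script further down. The host, credentials, batch sizes and thread counts are illustrative assumptions, not the original load generator, and it does not throttle to the reported 112/108 req/s.

# Hedged reproduction sketch (assumptions: pymilvus MilvusClient, FaceV4 schema
# from the init script below, placeholder host/credentials).
import random
import threading
import time

from pymilvus import MilvusClient

client = MilvusClient(uri="http://xx.xx.xx.xx:19530", token="root:Milvus")
client.load_collection("FaceV4")  # search requires the collection to be loaded

def random_row():
    # One row matching the FaceV4 schema (the primary key "id" is auto-generated).
    return {
        "sn": f"sn-{random.randint(0, 9999):04d}",
        "vector": [random.random() for _ in range(768)],
        "app_id": f"tenant-{random.randint(0, 15)}",
        "timestamp": int(time.time()),
    }

def insert_worker(batches):
    for _ in range(batches):
        client.insert(collection_name="FaceV4", data=[random_row() for _ in range(10)])

def search_worker(queries):
    for _ in range(queries):
        client.search(
            collection_name="FaceV4",
            data=[[random.random() for _ in range(768)]],
            limit=10,
            search_params={"metric_type": "L2", "params": {"nprobe": 16}},
        )

threads = [
    threading.Thread(target=insert_worker, args=(4000,)),  # 4000 batches x 10 rows = 40k rows per run; repeat until ~390k
    threading.Thread(target=search_worker, args=(4000,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()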

Milvus Log

milvus-standalone-for-gpu-11221521.log

[2024/11/22 07:19:39.055 +00:00] [DEBUG] [segments/validate.go:50] ["read target partitions"] [traceID=60aa8711d5123a4f8fa0f9a896a7547e] [collectionID=454101003657545071] [partitionIDs="[454101003657545081,454101003657545072,454101003657545075,454101003657545073,454101003657545079,454101003657545076,454101003657545074,454101003657545082,454101003657545084,454101003657545085,454101003657545087,454101003657545078,454101003657545080,454101003657545077,454101003657545083,454101003657545086]"]
[2024/11/22 07:19:39.055 +00:00] [DEBUG] [segments/validate.go:50] ["read target partitions"] [traceID=60aa8711d5123a4f8fa0f9a896a7547e] [collectionID=454101003657545071] [partitionIDs="[454101003657545079,454101003657545076,454101003657545074,454101003657545078,454101003657545080,454101003657545082,454101003657545084,454101003657545085,454101003657545087,454101003657545077,454101003657545083,454101003657545086,454101003657545081,454101003657545072,454101003657545075,454101003657545073]"]
W20241122 07:19:39.055075  2525 ExceptionTracer.cpp:187] Invalid trace stack for exception of type: milvus::SegcoreError
[2024/11/22 07:19:39.055 +00:00] [DEBUG] [segments/segment.go:575] ["search segment..."] [traceID=60aa8711d5123a4f8fa0f9a896a7547e] [collectionID=454101003657545071] [segmentID=454101003659154520] [segmentType=Growing] [withIndex=false]
[2024/11/22 07:19:39.055 +00:00] [DEBUG] [segments/segment.go:575] ["search segment..."] [traceID=60aa8711d5123a4f8fa0f9a896a7547e] [collectionID=454101003657545071] [segmentID=454101003659154519] [segmentType=Growing] [withIndex=false]
[2024/11/22 07:19:39.055 +00:00] [DEBUG] [segments/segment.go:575] ["search segment..."] [traceID=60aa8711d5123a4f8fa0f9a896a7547e] [collectionID=454101003657545071] [segmentID=454101003659154109] [segmentType=Growing] [withIndex=false]
[2024/11/22 07:19:39.055 +00:00] [DEBUG] [segments/segment.go:575] ["search segment..."] [traceID=60aa8711d5123a4f8fa0f9a896a7547e] [collectionID=454101003657545071] [segmentID=454101003659154591] [segmentType=Growing] [withIndex=false]
[2024/11/22 07:19:39.055 +00:00] [DEBUG] [segments/segment.go:575] ["search segment..."] [traceID=60aa8711d5123a4f8fa0f9a896a7547e] [collectionID=454101003657545071] [segmentID=454101003659154586] [segmentType=Growing] [withIndex=false]
W20241122 07:19:39.064556  3193 ExceptionTracer.cpp:187] Invalid trace stack for exception of type: milvus::SegcoreError
W20241122 07:19:39.079182  2555 ExceptionTracer.cpp:187] Invalid trace stack for exception of type: milvus::SegcoreError

Anything else?

Milvus config (milvus.yaml, attached below with a .log extension; rename .log back to .yaml):

milvus-yaml.log

milvus-standalone docker-compose:

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.15-gpu
    command: ["milvus", "run", "standalone"]
    security_opt:
    - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    ports:
      - "19530:19530"
      - "9091:9091"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: ["gpu"]
              device_ids: ["0"]
    depends_on:
      - "etcd"
      - "minio"

collection init.py:


# coding: utf-8
import time
from pymilvus import FieldSchema, CollectionSchema, DataType, MilvusClient

milvus_uri = "http://xx.xx.xx.xx:19530"
milvus_token = "root:Milvus"

# Create the collection
collection_name = "FaceV4"

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True, description="primary id"),
    FieldSchema(name="sn", dtype=DataType.VARCHAR, max_length=32, description="设备sn"),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=768, description="特征向量", mmap_enabled=True),
    FieldSchema(name="app_id", dtype=DataType.VARCHAR, max_length=36, is_partition_key=True, description="租户"),
    FieldSchema(name="timestamp", dtype=DataType.INT64, description="特征产生的时间戳")
]

schema = CollectionSchema(fields, description="Demo collection for PyMilvus", ttl=24 * 3600 * 2)

client = MilvusClient(uri=milvus_uri, token=milvus_token, db_name="default")

index_params = client.prepare_index_params()

index_params.add_index(
    field_name="id",
    index_type="STL_SORT",  # standard-sort scalar index
    index_name="inx_id"
)
index_params.add_index(
    field_name="sn",
    index_type="INVERTED",  # inverted index
    index_name="inx_sn"
)
index_params.add_index(
    field_name="app_id",
    index_type="INVERTED",  # inverted index
    index_name="inx_app_id"
)
index_params.add_index(
    field_name="timestamp",
    index_type="STL_SORT",  # standard-sort scalar index
    index_name="inx_timestamp",
)

index_params.add_index(
    field_name="vector",
    index_type="GPU_IVF_FLAT",  # IVF_FLAT applies no compression, so the index file is roughly the same size as the raw, unindexed vector data
    metric_type="L2",
    params={"nlist": 256}
)

client.create_collection(
    collection_name=collection_name,
    schema=schema,
    index_params=index_params,
)
print(f"Collection '{collection_name}' created.")

time.sleep(5)

res = client.get_load_state(
    collection_name=collection_name
)

print(res)
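
For completeness (not part of the original script): the script above passes ttl= to CollectionSchema. A hedged alternative is to apply the TTL explicitly as a collection property, e.g. via the pymilvus ORM API, which may help rule out the schema kwarg when reproducing; uri/token are reused from the script above.

# Hedged alternative for setting the 2-day TTL via collection properties
# (pymilvus ORM API; "collection.ttl.seconds" is the documented property key).
from pymilvus import Collection, connections

connections.connect(uri="http://xx.xx.xx.xx:19530", token="root:Milvus")
collection = Collection("FaceV4")
collection.set_properties(properties={"collection.ttl.seconds": 24 * 3600 * 2})  # 172800 s
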
yanliang567 commented 5 hours ago

/assign @Presburger /unassign

yanliang567 commented 4 hours ago

/assign @smellthemoon please also help to fix the logs with data info /unassign