milvus-io / milvus


[Bug]: query failed: Assert "index < this->counter_" => index out of range, index=0, counter_=0 #36871

Open ThreadDao opened 6 days ago

ThreadDao commented 6 days ago

Is there an existing issue for this?

Environment

- Milvus version: 2.4-20241013-44564f04-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Milvus server

Deploy a cluster with image 2.4-20241013-44564f04-amd64

Results

Only one search request failed, with: query failed: Assert "index < this->counter_" => index out of range, index=0, counter_=0 at /workspace/source/internal/core/src/mmap/ChunkVector.h:126

[2024-10-14 12:57:08,548 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=65535, message=fail to search on QueryNode 2: worker(2) query failed: Assert "index < this->counter_"  => index out of range, index=0, counter_=0 at /workspace/source/internal/core/src/mmap/ChunkVector.h:126
)>, <Time:{'RPC start': '2024-10-14 12:57:07.696279', 'RPC error': '2024-10-14 12:57:08.547992'}> (decorators.py:147)
[2024-10-14 12:57:08,549 - ERROR - fouram]: (api_response) : [Collection.search] <MilvusException: (code=65535, message=fail to search on QueryNode 2: worker(2) query failed: Assert "index < this->counter_"  => index out of range, index=0, counter_=0 at /workspace/source/internal/core/src/mmap/ChunkVector.h:126
)>, [requestId: d19f9322-8a2b-11ef-9efb-caae83df8f3a] (api_request.py:57)
[2024-10-14 12:57:08,549 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=65535, message=fail to search on QueryNode 2: worker(2) query failed: Assert "index < this->counter_"  => index out of range, index=0, counter_=0 at /workspace/source/internal/core/src/mmap/ChunkVector.h:126
)> (func_check.py:106)
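
For context, here is a minimal sketch of the kind of bounds check that appears to fire. This is a hypothetical simplification, not the actual contents of ChunkVector.h: once the chunk list has been cleared, the counter is 0, so even an access at index 0 trips the assert and produces "index=0, counter_=0".

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical simplification of a chunk container with a bounds assert.
template <typename Chunk>
class ChunkVectorSketch {
 public:
    const Chunk& get_chunk(int64_t index) const {
        // Fails with "index out of range, index=0, counter_=0" when the
        // chunks were removed before/while this read happens.
        assert(index < counter_ && "index < this->counter_");
        return chunks_[index];
    }

    void add_chunk(Chunk chunk) {
        chunks_.push_back(std::move(chunk));
        ++counter_;
    }

    void clear_chunks() {
        chunks_.clear();
        counter_ = 0;
    }

 private:
    std::vector<Chunk> chunks_;
    int64_t counter_ = 0;
};

int main() {
    ChunkVectorSketch<int> chunks;
    chunks.add_chunk(42);
    chunks.clear_chunks();   // counter_ drops to 0
    chunks.get_chunk(0);     // assert fires: index=0, counter_=0
}
```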

Expected Behavior

No response

Steps To Reproduce

- [argo workflow](https://argo-workflows.zilliz.cc/archived-workflows/qa/be3623e6-1769-4fdb-8585-3411613ca407?nodeId=zong-rolling-upgrade-all-hjwcd-205957643)

Milvus Log

pods:

zong-rolling-upgrade-all-hjwcd-milvus-datanode-6864db75b5-6vkdc     Running     0            1m      10.104.34.9       4am-node37     
zong-rolling-upgrade-all-hjwcd-milvus-datanode-6864db75b5-wp9sp     Running     0            1m      10.104.25.168     4am-node30     
zong-rolling-upgrade-all-hjwcd-milvus-indexnode-75d6c7f47-p5kjx     Running     0            1m      10.104.21.250     4am-node24     
zong-rolling-upgrade-all-hjwcd-milvus-indexnode-75d6c7f47-ts6ch     Running     0            1m      10.104.6.34       4am-node13     
zong-rolling-upgrade-all-hjwcd-milvus-mixcoord-579dcdd8fc-lrnn5     Running     0            1m      10.104.25.167     4am-node30     
zong-rolling-upgrade-all-hjwcd-milvus-proxy-b799c74d5-gbkgt         Running     0            1m      10.104.5.49       4am-node12     
zong-rolling-upgrade-all-hjwcd-milvus-querynode-0-55f689648nkv9     Running     0            1m      10.104.34.10      4am-node37     
zong-rolling-upgrade-all-hjwcd-milvus-querynode-0-55f68964ptv29     Running     0            1m      10.104.30.29      4am-node38     
zong-rolling-upgrade-all-hjwcd-milvus-querynode-0-55f68964q7b9v     Running     0            1m      10.104.5.50       4am-node12

Anything else?

No response

yanliang567 commented 6 days ago

/assign @weiliu1031 /unassign

ThreadDao commented 6 days ago

/assign @cqy123456 /unassign @weiliu1031

xiaofan-luan commented 6 days ago

/assign @sunby

xiaofan-luan commented 6 days ago

/assign @cqy123456

xiaofan-luan commented 6 days ago

This seems to be a growing mmap issue.

cqy123456 commented 5 days ago

Growing-segment insert is not locked. During the insert process, the vector chunks are cleared (chunk number counter = 0) after the growing index is built, and the growing segment uses indexingrecord.SyncDataWithIndex to check whether the growing index has been built successfully.

[attached image]

From the log, the chunks are cleared while the vector BF search runs at the same time, so there is a consistency problem in the access of indexingrecord.SyncDataWithIndex:

SyncDataWithIndex = false -> jump to the BF search logic -> SyncDataWithIndex = true -> try_remove_chunks -> BF search.
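
A rough C++ sketch of the interleaving described above, assuming the flag check and the chunk access are not protected by a common lock (the names mirror the identifiers mentioned in the comment, but the class itself is illustrative, not Milvus source):

```cpp
#include <atomic>
#include <vector>

// Hypothetical growing-segment model to illustrate the race; not Milvus code.
struct GrowingSegmentSketch {
    std::atomic<bool> sync_data_with_index{false};  // indexingrecord.SyncDataWithIndex
    std::vector<float> vector_chunks;                // raw vectors read by the BF search

    // Index-build / insert path: mark the growing index ready,
    // then drop the raw vector chunks (chunk counter becomes 0).
    void on_growing_index_built() {
        sync_data_with_index = true;
        try_remove_chunks();
    }

    void try_remove_chunks() { vector_chunks.clear(); }

    // Search path: the flag check and the chunk access are not done under a
    // single lock, so the chunks can be removed in between.
    void search() {
        if (!sync_data_with_index) {   // (1) flag read as false
            // (2) the other thread runs on_growing_index_built() here
            bf_search(vector_chunks);  // (3) chunks already cleared -> assert in ChunkVector
        } else {
            index_search();
        }
    }

    void bf_search(const std::vector<float>&) {}
    void index_search() {}
};
```

In this interleaving the BF search ends up reading a chunk container whose counter is already 0, which matches the reported assert "index=0, counter_=0".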