milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: failed to search if topk is larger than 100 #25079

Closed yanliang567 closed 1 year ago

yanliang567 commented 1 year ago

Environment

- Milvus version: master-20230619-a6310050
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Search succeeds when topk=1 or 10, but fails when topk=80, 100, or more.

Expected Behavior

search successfully with topk=16384

Steps To Reproduce

1. Create a collection and insert 20M 128-dimensional vectors.
2. Build an HNSW index with the COSINE metric.
3. Search with different topk values.

Milvus Log

error msg:

06/21/2023 09:26:51 AM - ERROR - RPC error: [search], <MilvusException: (code=1, message=attempt #0: fail to Search, QueryNode ID=11, reason=[UnexpectedError] Assert "failed to search, err: out of range in json" at /go/src/github.com/milvus-io/milvus/internal/core/src/index/VectorMemIndex.cpp:130: channel=yanliang-cd-hnsw-rootcoord-dml_0_442298406277232411v0: fail to access shard delegator: fail to search on all shard leaders)>, <Time:{'RPC start': '2023-06-21 09:26:50.850154', 'RPC error': '2023-06-21 09:26:51.081243'}>

pod names:

yanliang-cd-hnsw-milvus-datanode-68cb744dc5-mhp6v                 1/1     Running       0                29h     10.102.7.196    devops-node11   <none>           <none>
yanliang-cd-hnsw-milvus-indexnode-6c9cd7dd64-6csln                1/1     Running       0                26h     10.102.7.167    devops-node11   <none>           <none>
yanliang-cd-hnsw-milvus-indexnode-6c9cd7dd64-lpfh2                1/1     Running       1 (26h ago)      26h     10.102.9.241    devops-node13   <none>           <none>
yanliang-cd-hnsw-milvus-mixcoord-94f9c776f-tkb4g                  1/1     Running       2 (28h ago)      29h     10.102.9.159    devops-node13   <none>           <none>
yanliang-cd-hnsw-milvus-proxy-7ccb44d5f-42tlj                     1/1     Running       0                29h     10.102.5.150    devops-node21   <none>           <none>
yanliang-cd-hnsw-milvus-proxy-7ccb44d5f-qzh7r                     1/1     Running       0                29h     10.102.7.198    devops-node11   <none>           <none>
yanliang-cd-hnsw-milvus-querynode-58c584c654-4v2vz                1/1     Running       0                29h     10.102.6.210    devops-node10   <none>           <none>
yanliang-cd-hnsw-milvus-querynode-58c584c654-cvxn8                1/1     Running       1 (28h ago)      29h     10.102.9.158    devops-node13   <none>           <none>
yanliang-cd-hnsw-milvus-querynode-58c584c654-gs4rm                1/1     Running       0                29h     10.102.10.152   devops-node20   <none>           <none>
yanliang-cd-hnsw-milvus-querynode-58c584c654-p4zls                1/1     Running       0                29h     10.102.7.199    devops-node11   <none>           <none>

Anything else?

  1. my collection schema: {'auto_id': True, 'description': 'hnsw_test', 'fields': [{'name': 'id', 'description': 'auto primary id', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': True}, {'name': 'age', 'description': 'age', 'type': <DataType.INT64: 5>}, {'name': 'flag', 'description': 'flag', 'type': <DataType.BOOL: 1>}, {'name': 'embedding', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}]}
  2. my index params: {'index_type': 'HNSW', 'metric_type': 'COSINE', 'params': {'M': 30, 'efConstruction': 360}}
  3. my search params: {'metric_type': 'COSINE', 'params': {'ef': 64}}

yanliang567 commented 1 year ago

/assign @jiaoew1991 /unassign

jiaoew1991 commented 1 year ago

/assign @yah01 /unassign

yah01 commented 1 year ago

/assign @yanliang567

The search param ef must not be less than topk.
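
As a rough illustration of that constraint, here is a minimal client-side sketch that raises ef to topk before searching. The helper name hnsw_search_params and its defaults are hypothetical and not part of pymilvus; the dict layout simply mirrors the search params posted in the issue.

```python
def hnsw_search_params(topk: int, ef: int = 64, metric_type: str = "COSINE") -> dict:
    """Build HNSW search params, raising ef to topk when ef would be too small.

    Hypothetical helper for illustration only; not a pymilvus API.
    """
    if topk < 1:
        raise ValueError("topk must be a positive integer")
    # HNSW needs ef >= topk, so never send a smaller value.
    return {"metric_type": metric_type, "params": {"ef": max(ef, topk)}}


if __name__ == "__main__":
    for topk in (10, 100, 16384):
        print(topk, hnsw_search_params(topk))
```

With the params from this issue (ef=64), topk=10 keeps ef at 64, while topk=100 or 16384 bumps ef up to match. The resulting dict is the kind of value that would be passed as the search params alongside the topk limit.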

xiaofan-luan commented 1 year ago

"failed to search, err: out of range in json" The error message needs to be improved.

yanliang567 commented 1 year ago

@yah01 please help to update a meaningful error msg.

yah01 commented 1 year ago

> @yah01 please help to update a meaningful error msg.

Sure, working on it

yah01 commented 1 year ago

@liliu-z Knowhere returns the error as Status, which is an enum class, so it is impossible to attach a message to this type. Should we change Status to a struct with fields code and message?
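
To make the proposal concrete, here is a language-neutral sketch in Python of a status value that carries both a code and a message. This is not Knowhere's actual C++ code: StatusCode, Status, and check_ef are hypothetical names used only to show the shape of the idea.

```python
from dataclasses import dataclass
from enum import Enum


class StatusCode(Enum):
    """Bare codes, analogous to an enum-only status (hypothetical values)."""
    SUCCESS = 0
    INVALID_ARGS = 1


@dataclass(frozen=True)
class Status:
    """A status that pairs the code with a human-readable message."""
    code: StatusCode
    message: str = ""

    def ok(self) -> bool:
        return self.code is StatusCode.SUCCESS


def check_ef(ef: int, topk: int) -> Status:
    """Hypothetical validation showing how a message can travel with the code."""
    if ef < topk:
        return Status(
            StatusCode.INVALID_ARGS,
            f"ef ({ef}) must not be less than topk ({topk})",
        )
    return Status(StatusCode.SUCCESS)


if __name__ == "__main__":
    print(check_ef(64, 100))
    print(check_ef(128, 100))
```

With only the enum, the caller sees an error code and nothing else. With the struct, the caller can surface the reason directly to the user instead of a generic "out of range in json".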

liliu-z commented 1 year ago

> @liliu-z Knowhere returns the error as Status, which is an enum class, so it is impossible to attach a message to this type. Should we change Status to a struct with fields code and message?

Sure, let's do this.

liliu-z commented 1 year ago

Knowhere's expected type now supports a what() API for passing error information.

yah01 commented 1 year ago

/assign @yanliang567 fixed with https://github.com/milvus-io/knowhere/pull/970 and #25473

yanliang567 commented 1 year ago

will verify the fix soon /unassign @yah01

yanliang567 commented 1 year ago

/assign @NicoYuan1986 could you please help verify the issue and update the test case accordingly?

/unassign

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

NicoYuan1986 commented 1 year ago

Forgot to comment. The issue has been verified as fixed.