milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: failed to search on all shard leaders, err=fail to Search, QueryNode ID=89 #18776

Closed · matt1209 closed 2 years ago

matt1209 commented 2 years ago

Is there an existing issue for this?

Environment

- Milvus version: 2.1.1
- Deployment mode (standalone or cluster): standalone
- SDK version (e.g. pymilvus v2.0.0rc2): pymilvus 2.1.1
- OS (Ubuntu or CentOS): CentOS
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

When I run `collection.search`, this error is raised: `fail to search on all shard leaders, fail to Search, QueryNode ID=89`.

Expected Behavior

No response

Steps To Reproduce

1. Run `collection_prepare.py` from the benchmark blog on milvus.io 5 times; I changed `nb=50000, insert_times=200`.
2. Run the search script from <https://milvus.io/docs/v2.1.x/search.md>:

```python
import random

search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
results = collection.search(  # `collection` and `dim` come from the prepare script
    data=[[random.random() for _ in range(dim)]],
    anns_field="embedding",
    param=search_params,
    limit=10,
    expr=None,
    consistency_level="Strong"
)
```

Milvus Log

ed020480-7e99-48e1-a568-1a16d1a4f361

Anything else?

No response

yanliang567 commented 2 years ago

@matt1209 thank you for the issue. Did you also change the index type in collection_prepare.py? It seems that you are using an IVF index in your search params, while collection_prepare.py builds an HNSW index. Could you please refer to this script to export the whole Milvus logs for investigation?
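The mismatch described above can be illustrated with plain dictionaries. This is a minimal sketch, not Milvus code: the lookup table and check are hypothetical, though the parameter names (`nprobe` for IVF, `ef` for HNSW) come from the Milvus 2.1 docs.

```python
# Search parameters each index type expects (per the Milvus 2.1 docs);
# the table and validator below are illustrative only.
EXPECTED_SEARCH_PARAMS = {
    "IVF_FLAT": {"nprobe"},
    "IVF_SQ8": {"nprobe"},
    "HNSW": {"ef"},
}

def search_params_match(index_type, search_params):
    """Return True if the keys in search_params fit the index type."""
    expected = EXPECTED_SEARCH_PARAMS[index_type]
    return set(search_params["params"]) <= expected

# The issue as reported: an HNSW index searched with IVF-style params.
ivf_style = {"metric_type": "L2", "params": {"nprobe": 10}}
hnsw_style = {"metric_type": "L2", "params": {"ef": 10}}

assert not search_params_match("HNSW", ivf_style)   # the reported setup
assert search_params_match("HNSW", hnsw_style)      # the corrected one
```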

/assign @matt1209 /unassign

matt1209 commented 2 years ago

Hi, I thought I used HNSW here. I changed the search params to `search_params = {"metric_type": "L2", "params": {"ef": 10}}` but got the same error. I put the log here: 14f60c7b-adab-48e0-85cb-7ced4bb2fa1b

JackLCL commented 2 years ago


What's your topk? The `ef` parameter must be in the range [top_k, 32768].

yanliang567 commented 2 years ago

I see, topk does not matter here; Milvus will use topk as ef if ef < topk. Please refer to this script to export the whole Milvus logs for investigation. We need the complete logs to analyze the issue.
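The clamping behavior described above can be sketched in a couple of lines; this is a hypothetical helper illustrating the stated rule, not Milvus source:

```python
def effective_ef(ef, topk):
    """Milvus raises ef to topk when ef < topk, so a small ef is still
    valid as long as topk <= 32768 (illustrative sketch of the stated rule)."""
    return max(ef, topk)

print(effective_ef(10, 10))   # ef already >= topk -> 10
print(effective_ef(10, 100))  # ef raised to topk  -> 100
```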

matt1209 commented 2 years ago

I'm sorry, I can't provide the logs because none of our computers have internet access.

matt1209 commented 2 years ago

Is this caused by running the benchmark script 5 times?

yanliang567 commented 2 years ago

It's hard to tell without the complete logs. Does it only reproduce when you run the script 5 times? What about 1 time? How did you deploy Milvus: with Helm, Operator, or Docker Compose? Did you limit the CPU or memory for Milvus?

matt1209 commented 2 years ago

Hi, I just tried running it 1 time, but when I searched there was an error: `check if Loaded failed when search, partitions:[], err=showPartitions failed, reason=collection xxxx has not been loaded into QueryNode`. Does the collection have to be loaded before search? I tried to run `load()`, but it takes a long time and a lot of memory.

yanliang567 commented 2 years ago

> Hi, I just tried running it 1 time, but when I searched there was an error: `check if Loaded failed when search, partitions:[], err=showPartitions failed, reason=collection xxxx has not been loaded into QueryNode`. Does the collection have to be loaded before search? I tried to run `load()`, but it takes a long time and a lot of memory.

This error indicates that the collection load failed. I guess you don't have enough free memory to load the collection.
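A back-of-envelope estimate supports this. Assuming the benchmark's defaults of `dim=128` and float32 vectors (assumptions here, neither is stated in the thread), one run of the prepare script inserts 200 × 50000 = 10M vectors:

```python
# Rough memory estimate for loading the collection
# (dim=128 and float32 are assumed defaults, not confirmed in the thread).
nb = 50000            # vectors per insert
insert_times = 200    # inserts per run of collection_prepare.py
dim = 128             # assumed vector dimension
bytes_per_float = 4   # float32

total_vectors = nb * insert_times
raw_bytes = total_vectors * dim * bytes_per_float
print(f"{total_vectors:,} vectors, ~{raw_bytes / 2**30:.1f} GiB of raw vector data")
# -> 10,000,000 vectors, ~4.8 GiB of raw vector data
# An HNSW index adds graph links on top of this, so actual load memory is higher.
```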

matt1209 commented 2 years ago

> Hi, I just tried running it 1 time, but when I searched there was an error: `check if Loaded failed when search, partitions:[], err=showPartitions failed, reason=collection xxxx has not been loaded into QueryNode`. Does the collection have to be loaded before search? I tried to run `load()`, but it takes a long time and a lot of memory.
>
> This error indicates that the collection load failed. I guess you don't have enough free memory to load the collection.

Is there any way I can search on hard disk instead of loading everything into memory? I deploy Milvus with Docker Compose, but CPU and memory are not enough.

yanliang567 commented 2 years ago

I'm afraid not. But the good news is that the community is going to offer a new disk-based index type, which should save about 5-6x the memory. If everything goes well, it will be available later this year.