milvus-io / milvus


[Bug]: Query failed with error `query failed: UnknownError: => failed to get vector, faiss inner error: attempt #1: no available shard delegator found: service unavailable` #28152

Closed zhuwenxing closed 10 months ago

zhuwenxing commented 11 months ago

Is there an existing issue for this?

Environment

- Milvus version: master-20231102-9b737b77-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2023-11-02T09:36:33.386Z] 2023-11-02 09:36:33.006 | INFO     | MainThread |utils:load_and_search:214 - collection name: task_1_IVF_FLAT

[2023-11-02T09:36:33.386Z] 2023-11-02 09:36:33.006 | INFO     | MainThread |utils:load_and_search:215 - load collection

[2023-11-02T09:36:38.625Z] 2023-11-02 09:36:38.548 | INFO     | MainThread |utils:load_and_search:219 - load time: 5.5414

[2023-11-02T09:36:38.626Z] 2023-11-02 09:36:38.559 | INFO     | MainThread |utils:load_and_search:233 - {'metric_type': 'L2', 'params': {'nprobe': 10}}

[2023-11-02T09:36:38.626Z] 2023-11-02 09:36:38.559 | INFO     | MainThread |utils:load_and_search:236 - 

[2023-11-02T09:36:38.626Z] Search...

[2023-11-02T09:36:38.626Z] 2023-11-02 09:36:38.565 | INFO     | MainThread |utils:load_and_search:247 - hit: id: 2557, distance: 31.548093795776367, entity: {'count': 2557, 'random_value': -17.0}

[2023-11-02T09:36:38.626Z] 2023-11-02 09:36:38.565 | INFO     | MainThread |utils:load_and_search:247 - hit: id: 825, distance: 31.873104095458984, entity: {'count': 825, 'random_value': -20.0}

[2023-11-02T09:36:38.626Z] 2023-11-02 09:36:38.565 | INFO     | MainThread |utils:load_and_search:247 - hit: id: 2493, distance: 32.463722229003906, entity: {'count': 2493, 'random_value': -18.0}

[2023-11-02T09:36:38.626Z] 2023-11-02 09:36:38.565 | INFO     | MainThread |utils:load_and_search:247 - hit: id: 1826, distance: 32.58391571044922, entity: {'count': 1826, 'random_value': -17.0}

[2023-11-02T09:36:38.626Z] 2023-11-02 09:36:38.565 | INFO     | MainThread |utils:load_and_search:247 - hit: id: 1605, distance: 32.7285041809082, entity: {'count': 1605, 'random_value': -15.0}

[2023-11-02T09:36:38.626Z] 2023-11-02 09:36:38.565 | INFO     | MainThread |utils:load_and_search:250 - [2557, 825, 2493, 1826, 1605]

[2023-11-02T09:36:38.626Z] 2023-11-02 09:36:38.565 | INFO     | MainThread |utils:load_and_search:252 - search latency: 0.0061s

[2023-11-02T09:36:39.188Z] RPC error: [query], <MilvusException: (code=65538, message=failed to query: attempt #0: failed to search/query delegator 5 for channel by-dev-rootcoord-dml_2_445361027637928527v0: fail to Query, QueryNode ID = 5, reason=worker(6) query failed: UnknownError:  => failed to get vector, faiss inner error: attempt #1: no available shard delegator found: service unavailable)>, <Time:{'RPC start': '2023-11-02 09:36:38.565966', 'RPC error': '2023-11-02 09:36:39.174117'}>

[2023-11-02T09:36:39.188Z] Traceback (most recent call last):

[2023-11-02T09:36:39.188Z]   File "scripts/action_before_reinstall.py", line 47, in <module>

[2023-11-02T09:36:39.188Z]     task_1(data_size, host)

[2023-11-02T09:36:39.188Z]   File "scripts/action_before_reinstall.py", line 16, in task_1

[2023-11-02T09:36:39.188Z]     load_and_search(prefix)

[2023-11-02T09:36:39.188Z]   File "/home/jenkins/agent/workspace/tests/python_client/deploy/scripts/utils.py", line 259, in load_and_search

[2023-11-02T09:36:39.188Z]     res = c.query(expr, output_fields, timeout=120)

[2023-11-02T09:36:39.188Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 916, in query

[2023-11-02T09:36:39.188Z]     return conn.query(

[2023-11-02T09:36:39.188Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 129, in handler

[2023-11-02T09:36:39.188Z]     raise e from e

[2023-11-02T09:36:39.188Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 125, in handler

[2023-11-02T09:36:39.188Z]     return func(*args, **kwargs)

[2023-11-02T09:36:39.188Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 164, in handler

[2023-11-02T09:36:39.188Z]     return func(self, *args, **kwargs)

[2023-11-02T09:36:39.188Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 104, in handler

[2023-11-02T09:36:39.188Z]     raise e from e

[2023-11-02T09:36:39.188Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 68, in handler

[2023-11-02T09:36:39.188Z]     return func(*args, **kwargs)

[2023-11-02T09:36:39.188Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 1382, in query

[2023-11-02T09:36:39.188Z]     raise MilvusException(

[2023-11-02T09:36:39.188Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=65538, message=failed to query: attempt #0: failed to search/query delegator 5 for channel by-dev-rootcoord-dml_2_445361027637928527v0: fail to Query, QueryNode ID = 5, reason=worker(6) query failed: UnknownError:  => failed to get vector, faiss inner error: attempt #1: no available shard delegator found: service unavailable)>
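For triage, it can help to split the nested error message into its parts: the delegator, the channel, the QueryNode, the worker, and the innermost reason. The following is a minimal sketch that does this for the message logged above. The regular expression and the `parse_query_failure` helper are hypothetical, written only for this one message shape; they are not part of pymilvus or Milvus.

```python
import re

# Error text copied from the MilvusException in the log above.
MESSAGE = (
    "failed to query: attempt #0: failed to search/query delegator 5 for channel "
    "by-dev-rootcoord-dml_2_445361027637928527v0: fail to Query, QueryNode ID = 5, "
    "reason=worker(6) query failed: UnknownError:  => failed to get vector, "
    "faiss inner error: attempt #1: no available shard delegator found: service unavailable"
)

# Hypothetical pattern tailored to this message; other Milvus errors may not match.
PATTERN = re.compile(
    r"delegator (?P<delegator>\d+) for channel (?P<channel>\S+?): "
    r".*?QueryNode ID = (?P<query_node>\d+), "
    r"reason=worker\((?P<worker>\d+)\) query failed: (?P<reason>.*)$"
)


def parse_query_failure(message):
    """Return the fields of a delegator query failure, or None if the shape differs."""
    match = PATTERN.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # Numeric IDs are easier to compare against server logs as integers.
    for key in ("delegator", "query_node", "worker"):
        fields[key] = int(fields[key])
    return fields


info = parse_query_failure(MESSAGE)
print(info)
```

Here the outer failure is reported by delegator 5 (QueryNode ID = 5), while the inner "no available shard delegator found" comes from worker 6, so the server logs of both nodes are worth checking.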

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_cron/detail/deploy_test_cron/1547/pipeline

log: artifacts-pulsar-cluster-reinstall-1547-pytest-logs.tar.gz artifacts-pulsar-cluster-reinstall-1547-server-logs.tar.gz

Anything else?

No response

zhuwenxing commented 11 months ago

It works well with master-20231031-0677d262-amd64 but starts to fail with master-20231101-a1033604-amd64.
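The two image tags bound the regression window. As a rough sketch, assuming the tags follow a `branch-date-commit-arch` layout (inferred from the tags in this issue, not from any documented convention), they can be split to get the build dates and short commit hashes. `parse_image_tag` is a hypothetical helper.

```python
from datetime import datetime


def parse_image_tag(tag):
    """Split a tag such as master-20231031-0677d262-amd64 into its four parts."""
    # Assumes exactly four dash-separated fields; raises ValueError otherwise.
    branch, day, commit, arch = tag.split("-")
    return {
        "branch": branch,
        "date": datetime.strptime(day, "%Y%m%d").date(),
        "commit": commit,
        "arch": arch,
    }


good = parse_image_tag("master-20231031-0677d262-amd64")  # last passing build
bad = parse_image_tag("master-20231101-a1033604-amd64")   # first failing build

window_days = (bad["date"] - good["date"]).days
print(f"regression window: {good['commit']}..{bad['commit']} ({window_days} day)")
```

If the short hashes are commits on the master branch (an assumption), `git log 0677d262..a1033604` would list the candidate changes merged between the two builds.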

yanliang567 commented 11 months ago

/assign @congqixia
/unassign

congqixia commented 11 months ago

This problem was introduced by the temporary index from PR #27673. Working on it with @cqy123456.
/assign @cqy123456

congqixia commented 11 months ago

/assign @zhuwenxing
The fix has been merged into the master branch. Could you please verify?

stale[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.