[Bug]: All searches fail with error `fail to get shard leaders from QueryCoord: collection xxx is not fully loaded` after reinstallation or upgrade #24287
- Milvus version: master-20230519-0b72cf2c
- Deployment mode(standalone or cluster): both
- MQ type(rocksmq, pulsar or kafka): pulsar and kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
[2023-05-21T10:01:15.875Z] + python3 scripts/action_after_reinstall.py --host 10.101.199.124 --data_size 3000
[2023-05-21T10:01:17.813Z] 2023-05-21 10:01:17.475 | INFO | MainThread |__main__:<module>:45 - data size: 3000
[2023-05-21T10:01:17.813Z] 2023-05-21 10:01:17.531 | INFO | MainThread |utils:get_collections:63 -
[2023-05-21T10:01:17.813Z] List collections...
[2023-05-21T10:01:17.813Z] 2023-05-21 10:01:17.656 | INFO | MainThread |utils:get_collections:65 - collections_nums: 5
[2023-05-21T10:01:18.068Z] 2023-05-21 10:01:18.037 | INFO | MainThread |utils:get_collections:74 - task_1_FLAT: 6000
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.060 | INFO | MainThread |utils:get_collections:74 - task_1_HNSW: 6000
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.073 | INFO | MainThread |utils:get_collections:74 - task_1_IVF_FLAT: 6000
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.080 | INFO | MainThread |utils:get_collections:74 - task_1_IVF_SQ8: 6000
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.088 | INFO | MainThread |utils:get_collections:74 - task_1_IVF_PQ: 6000
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.088 | INFO | MainThread |utils:load_and_search:197 - search data starts
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.088 | INFO | MainThread |utils:get_collections:63 -
[2023-05-21T10:01:18.322Z] List collections...
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.090 | INFO | MainThread |utils:get_collections:65 - collections_nums: 5
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.097 | INFO | MainThread |utils:get_collections:74 - task_1_FLAT: 6000
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.103 | INFO | MainThread |utils:get_collections:74 - task_1_HNSW: 6000
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.109 | INFO | MainThread |utils:get_collections:74 - task_1_IVF_FLAT: 6000
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.116 | INFO | MainThread |utils:get_collections:74 - task_1_IVF_SQ8: 6000
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.122 | INFO | MainThread |utils:get_collections:74 - task_1_IVF_PQ: 6000
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.125 | INFO | MainThread |utils:load_and_search:201 - collection name: task_1_FLAT
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.125 | INFO | MainThread |utils:load_and_search:202 - load collection
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.129 | INFO | MainThread |utils:load_and_search:206 - load time: 0.0035
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.161 | INFO | MainThread |utils:load_and_search:220 - {'metric_type': 'L2', 'params': {'nprobe': 10}}
[2023-05-21T10:01:18.322Z] 2023-05-21 10:01:18.161 | INFO | MainThread |utils:load_and_search:223 -
[2023-05-21T10:01:18.322Z] Search...
[2023-05-21T10:01:40.195Z] RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: attempt #1: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: attempt #2: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: attempt #3: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: attempt #4: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: attempt #5: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: attempt #6: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: context done during sleep after run#6: context deadline exceeded)>, <Time:{'RPC start': '2023-05-21 10:01:18.161461', 'RPC error': '2023-05-21 10:01:38.167097'}>
[2023-05-21T10:01:40.195Z] Traceback (most recent call last):
[2023-05-21T10:01:40.195Z] File "scripts/action_after_reinstall.py", line 46, in <module>
[2023-05-21T10:01:40.195Z] task_1(data_size, host)
[2023-05-21T10:01:40.195Z] File "scripts/action_after_reinstall.py", line 14, in task_1
[2023-05-21T10:01:40.195Z] load_and_search(prefix)
[2023-05-21T10:01:40.195Z] File "/home/jenkins/agent/workspace/tests/python_client/deploy/scripts/utils.py", line 226, in load_and_search
[2023-05-21T10:01:40.195Z] res = c.search(
[2023-05-21T10:01:40.195Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 666, in search
[2023-05-21T10:01:40.195Z] res = conn.search(self._name, data, anns_field, param, limit, expr,
[2023-05-21T10:01:40.195Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
[2023-05-21T10:01:40.195Z] raise e
[2023-05-21T10:01:40.195Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
[2023-05-21T10:01:40.196Z] return func(*args, **kwargs)
[2023-05-21T10:01:40.196Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2023-05-21T10:01:40.196Z] ret = func(self, *args, **kwargs)
[2023-05-21T10:01:40.196Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
[2023-05-21T10:01:40.196Z] raise e
[2023-05-21T10:01:40.196Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
[2023-05-21T10:01:40.196Z] return func(self, *args, **kwargs)
[2023-05-21T10:01:40.196Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 521, in search
[2023-05-21T10:01:40.196Z] return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)
[2023-05-21T10:01:40.196Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 490, in _execute_search_requests
[2023-05-21T10:01:40.196Z] raise pre_err
[2023-05-21T10:01:40.196Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 481, in _execute_search_requests
[2023-05-21T10:01:40.196Z] raise MilvusException(response.status.error_code, response.status.reason)
[2023-05-21T10:01:40.196Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: attempt #1: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: attempt #2: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: attempt #3: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: attempt #4: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: attempt #5: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: attempt #6: fail to get shard leaders from QueryCoord: collection 441623827145556478 is not fully loaded: context done during sleep after run#6: context deadline exceeded)>
script returned exit code 1
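
The retry chain in the error above is hard to read inline, so here is a minimal sketch that counts the attempts and computes the elapsed RPC time from such a line. `summarize_search_error` and the two regexes are hypothetical helpers written for this report, not part of pymilvus; the sample string is rebuilt from the message logged above.

```python
import re
from datetime import datetime

# Patterns matched against the text of the RPC error line (hypothetical helpers).
ATTEMPT_RE = re.compile(
    r"attempt #(\d+): fail to get shard leaders from QueryCoord: "
    r"collection (\d+) is not fully loaded"
)
TIME_RE = re.compile(r"'RPC start': '([^']+)', 'RPC error': '([^']+)'")


def summarize_search_error(line):
    """Return attempt count, collection IDs and elapsed seconds for one error line."""
    attempts = ATTEMPT_RE.findall(line)
    elapsed = None
    match = TIME_RE.search(line)
    if match:
        fmt = "%Y-%m-%d %H:%M:%S.%f"
        start, end = (datetime.strptime(t, fmt) for t in match.groups())
        elapsed = (end - start).total_seconds()
    return {
        "attempts": len(attempts),
        "collection_ids": sorted({cid for _, cid in attempts}),
        "elapsed_s": elapsed,
        "deadline_exceeded": "context deadline exceeded" in line,
    }


# Sample rebuilt from the first failure in this report (attempts #0 to #6).
sample = (
    "RPC error: [search], <MilvusException: (code=1, "
    "message=fail to search on all shard leaders, err="
    + "".join(
        f"attempt #{i}: fail to get shard leaders from QueryCoord: "
        "collection 441623827145556478 is not fully loaded: "
        for i in range(7)
    )
    + "context done during sleep after run#6: context deadline exceeded)>, "
    "<Time:{'RPC start': '2023-05-21 10:01:18.161461', "
    "'RPC error': '2023-05-21 10:01:38.167097'}>"
)

summary = summarize_search_error(sample)
print(summary)
```

For the line above this gives 7 attempts over about 20.0 s, ending in `context deadline exceeded`, so the collection never became searchable within the retry window.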
[2023-05-21T10:01:16.164Z] + python3 scripts/second_recall_test.py --host 10.101.199.124
[2023-05-21T10:01:18.063Z] 2023-05-21 10:01:17.760 | INFO | __main__:search_test:53 - recall test for index type HNSW
[2023-05-21T10:01:18.318Z] 2023-05-21 10:01:18.251 | INFO | __main__:search_test:63 -
[2023-05-21T10:01:18.318Z] Search...
[2023-05-21T10:01:40.195Z] RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: attempt #1: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: attempt #2: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: attempt #3: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: attempt #4: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: attempt #5: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: attempt #6: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: context done during sleep after run#6: context deadline exceeded)>, <Time:{'RPC start': '2023-05-21 10:01:18.252191', 'RPC error': '2023-05-21 10:01:38.461851'}>
[2023-05-21T10:01:40.195Z] Traceback (most recent call last):
[2023-05-21T10:01:40.195Z] File "scripts/second_recall_test.py", line 103, in <module>
[2023-05-21T10:01:40.195Z] search_test(host, index_type)
[2023-05-21T10:01:40.195Z] File "scripts/second_recall_test.py", line 65, in search_test
[2023-05-21T10:01:40.195Z] res = collection.search(
[2023-05-21T10:01:40.195Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 666, in search
[2023-05-21T10:01:40.195Z] res = conn.search(self._name, data, anns_field, param, limit, expr,
[2023-05-21T10:01:40.195Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
[2023-05-21T10:01:40.195Z] raise e
[2023-05-21T10:01:40.195Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
[2023-05-21T10:01:40.195Z] return func(*args, **kwargs)
[2023-05-21T10:01:40.195Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2023-05-21T10:01:40.195Z] ret = func(self, *args, **kwargs)
[2023-05-21T10:01:40.195Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
[2023-05-21T10:01:40.195Z] raise e
[2023-05-21T10:01:40.195Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
[2023-05-21T10:01:40.195Z] return func(self, *args, **kwargs)
[2023-05-21T10:01:40.195Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 521, in search
[2023-05-21T10:01:40.195Z] return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)
[2023-05-21T10:01:40.195Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 490, in _execute_search_requests
[2023-05-21T10:01:40.195Z] raise pre_err
[2023-05-21T10:01:40.195Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 481, in _execute_search_requests
[2023-05-21T10:01:40.195Z] raise MilvusException(response.status.error_code, response.status.reason)
[2023-05-21T10:01:40.195Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: attempt #1: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: attempt #2: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: attempt #3: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: attempt #4: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: attempt #5: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: attempt #6: fail to get shard leaders from QueryCoord: collection 441623827145756500 is not fully loaded: context done during sleep after run#6: context deadline exceeded)>
script returned exit code 1
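
In the first run, `load` returned after 0.0035 s and the search that followed failed for about 20 s. As a possible way to tell "still loading" apart from "never finishes loading" in the test scripts (a diagnostic aid, not a fix for the underlying bug), the search could be gated on a load-progress poll. The sketch below is an assumption: `wait_until_loaded` and `get_progress` are hypothetical names, and the progress source is injected so the helper runs without a Milvus server.

```python
import time


def wait_until_loaded(get_progress, timeout=60.0, interval=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_progress() until it reports 100 (percent) or the timeout expires.

    get_progress is a hypothetical callable returning an int in [0, 100].
    Returns True if the collection reported fully loaded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_progress() >= 100:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Demo with a fake progress source standing in for a real server.
_fake_progress = iter([0, 40, 100])
loaded = wait_until_loaded(lambda: next(_fake_progress), timeout=5.0, interval=0.0)
print(loaded)

# A source that never reaches 100 times out instead of hanging.
stuck = wait_until_loaded(lambda: 0, timeout=0.0, interval=0.0)
print(stuck)
```

With pymilvus, `get_progress` could presumably wrap `utility.loading_progress(collection_name)`; the exact shape of its return value differs between versions, so the parsing is left out here. If the poll times out after reinstall or upgrade, that points at QueryCoord never marking the collection as loaded rather than at the client searching too early.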
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_cron/detail/deploy_test_cron/810/pipeline
log:
artifacts-pulsar-cluster-reinstall-810-server-logs.tar.gz
artifacts-pulsar-cluster-reinstall-810-pytest-logs.tar.gz
Anything else?
Some other failed jobs:
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_cron/detail/deploy_test_cron/812/pipeline
log:
artifacts-pulsar-cluster-upgrade-812-server-logs.tar.gz
artifacts-pulsar-cluster-upgrade-812-pytest-logs.tar.gz