milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Search failed with error `fail to search on all shard leaders, err=All attempts results:attempt #1:can not find client of node 4` without any distraction #22670

Closed zhuwenxing closed 1 year ago

zhuwenxing commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version:2.2.0-20230309-130ab6da
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka): kafka   
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:18 - DEBUG - ci_test]: (api_request)  : [Collection.release] args: [], kwargs: {'timeout': 120} (api_request.py:56)

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:18 - DEBUG - ci_test]: (api_response) : None  (api_request.py:31)

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:18 - DEBUG - ci_test]: (api_request)  : [Collection.load] args: [None, 2, 120], kwargs: {} (api_request.py:56)

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:21 - DEBUG - ci_test]: (api_response) : None  (api_request.py:31)

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:21 - DEBUG - ci_test]: (api_request)  : [wait_for_loading_complete] args: ['deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000', None, 20, 'default'], kwargs: {} (api_request.py:56)

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:21 - DEBUG - ci_test]: (api_response) : None  (api_request.py:31)

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:21 - DEBUG - ci_test]: (api_request)  : [Collection.delete] args: ['int64 in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]', None, 120], kwargs: {} (api_request.py:56)

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:21 - DEBUG - ci_test]: (api_response) : (insert count: 0, delete count: 10, upsert count: 0, timestamp: 439973398787915778, success count: 0, err count: 0)  (api_request.py:31)

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:21 - DEBUG - ci_test]: (api_request)  : [Collection.search] args: [[[0.08990149242412115, 0.03313441502213216, 0.1502871799192971, 0.033271455053471954, 0.007497029485883552, 0.12869521031169512, 0.007583172147341562, 0.036897800256699115, 0.031985613325167445, 0.12054674641846518, 0.1123907977093196, 0.05107231307814517, 0.07140152351257285, 0.026174227429753852,......, kwargs: {} (api_request.py:56)

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:22 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=All attempts results:

[2023-03-09T12:43:40.935Z] attempt #1:can not find client of node 4

[2023-03-09T12:43:40.935Z] attempt #2:context canceled

[2023-03-09T12:43:40.935Z] )>, <Time:{'RPC start': '2023-03-09 12:35:21.336027', 'RPC error': '2023-03-09 12:35:22.020404'}> (decorators.py:108)

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:22 - ERROR - ci_test]: Traceback (most recent call last):

[2023-03-09T12:43:40.935Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper

[2023-03-09T12:43:40.935Z]     res = func(*args, **_kwargs)

[2023-03-09T12:43:40.935Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request

[2023-03-09T12:43:40.935Z]     return func(*arg, **kwargs)

[2023-03-09T12:43:40.935Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 660, in search

[2023-03-09T12:43:40.935Z]     res = conn.search(self._name, data, anns_field, param, limit, expr,

[2023-03-09T12:43:40.935Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2023-03-09T12:43:40.935Z]     raise e

[2023-03-09T12:43:40.935Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2023-03-09T12:43:40.935Z]     return func(*args, **kwargs)

[2023-03-09T12:43:40.935Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2023-03-09T12:43:40.935Z]     ret = func(self, *args, **kwargs)

[2023-03-09T12:43:40.935Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler

[2023-03-09T12:43:40.935Z]     raise e

[2023-03-09T12:43:40.935Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2023-03-09T12:43:40.935Z]     return func(self, *args, **kwargs)

[2023-03-09T12:43:40.935Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 518, in search

[2023-03-09T12:43:40.935Z]     return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)

[2023-03-09T12:43:40.935Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 487, in _execute_search_requests

[2023-03-09T12:43:40.935Z]     raise pre_err

[2023-03-09T12:43:40.935Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 478, in _execute_search_requests

[2023-03-09T12:43:40.935Z]     raise MilvusException(response.status.error_code, response.status.reason)

[2023-03-09T12:43:40.935Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=All attempts results:

[2023-03-09T12:43:40.935Z] attempt #1:can not find client of node 4

[2023-03-09T12:43:40.935Z] attempt #2:context canceled

[2023-03-09T12:43:40.935Z] )>

[2023-03-09T12:43:40.935Z]  (api_request.py:39)

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:22 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=fail to search on all shard leaders, err=All attempts results:

[2023-03-09T12:43:40.935Z] attempt #1:can not find client of node 4

[2023-03-09T12:43:40.935Z] attempt #2:context canceled

[2023-03-09T12:43:40.935Z] )> (api_request.py:40)

[2023-03-09T12:43:40.935Z] [2023-03-09 12:35:22 - INFO - ci_test]: search_results_check: checking the searching results (func_check.py:234)[get_env_variable] failed to get environment variables : 'CI_LOG_PATH', use default path : /tmp/ci_logs

[2023-03-09T12:43:40.935Z] [create_path] folder(/tmp/ci_logs) is not exist.

[2023-03-09T12:43:40.935Z] [create_path] create path now...

[2023-03-09T12:43:40.935Z] 

[2023-03-09T12:43:40.935Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------

[2023-03-09T12:43:40.935Z] =========================== short test summary info ============================

[2023-03-09T12:43:40.935Z] FAILED testcases/test_action_first_deployment.py::TestActionFirstDeployment::test_task_all[HNSW-only_growing-not_string_indexed-is_deleted-is_compacted-2] - TypeError: object of type 'Error' has no len()

[2023-03-09T12:43:40.936Z] ============= 1 failed, 26 passed, 23 skipped in 542.24s (0:09:02) =============
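Note that the short test summary reports `TypeError: object of type 'Error' has no len()` rather than the `MilvusException` itself. A plausible reading of the log is that the test wrapper catches the exception, returns an error object, and the result checker then calls `len()` on it. The sketch below illustrates that masking and one way to surface the original message. It is not the actual test harness code: `Error`, `api_request_catch`, `check_search_results`, and `failing_search` are hypothetical names chosen for illustration.

```python
class Error:
    """Hypothetical stand-in for the harness's error wrapper (assumed shape)."""

    def __init__(self, exc):
        self.code = getattr(exc, "code", -1)
        self.message = getattr(exc, "message", str(exc))


def api_request_catch(func, *args, **kwargs):
    """Call func; on failure return (Error, False) instead of raising."""
    try:
        return func(*args, **kwargs), True
    except Exception as exc:
        return Error(exc), False


def check_search_results(res, succeeded, expected_nq):
    """Report the original failure before anything calls len() on the result."""
    if not succeeded:
        raise AssertionError(f"search failed: {res.message}")
    assert len(res) == expected_nq


def failing_search():
    """Simulates the failed RPC; the message is copied from the log above."""
    raise RuntimeError("fail to search on all shard leaders")
```

With a guard like the one in `check_search_results`, the summary line would carry the search error message instead of a `TypeError` from the checker.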

Expected Behavior

No response

Steps To Reproduce

No response
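No steps were provided, but the log above shows the order of client calls just before the failure: release, load with two replicas, delete ten entities, then search. The sketch below replays that order against any object exposing pymilvus `Collection`-style methods. It is only a sketch under assumptions: `replay_logged_sequence` is a hypothetical helper, the `anns_field`, `search_param`, and `limit` values are truncated in the log and must be supplied by the caller, the `wait_for_loading_complete` step is omitted to keep the block import-free, and nothing here is confirmed to reproduce the bug.

```python
def replay_logged_sequence(collection, query_vectors, anns_field,
                           search_param, limit, timeout=120):
    """Replay the release -> load -> delete -> search order seen in the log.

    `collection` is assumed to behave like a pymilvus Collection.
    Search arguments are caller-supplied because the log elides them.
    """
    collection.release(timeout=timeout)
    # The log shows load called with args [None, 2, 120]:
    # no partition filter, replica_number=2, timeout=120.
    collection.load(partition_names=None, replica_number=2, timeout=timeout)
    # Delete expression copied verbatim from the log.
    collection.delete("int64 in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]", timeout=timeout)
    return collection.search(query_vectors, anns_field, search_param, limit)
```

Because the search is issued immediately after the load and delete, running this in a loop against a freshly reinstalled cluster may be the closest approximation of the failing CI case.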

Milvus Log

milvus mode: cluster
deploy task: reinstall
old image tag: v2.2.2
new image tag: 2.2.0-20230309-130ab6da
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_for_release_cron/detail/deploy_test_kafka_for_release_cron/489/pipeline
log:

artifacts-kafka-cluster-reinstall-489-server-second-deployment-logs.tar.gz

artifacts-kafka-cluster-reinstall-489-pytest-logs.tar.gz

Anything else?

No response

congqixia commented 1 year ago

Same reason as #22661, working on it.
/assign

congqixia commented 1 year ago

Patch merged, please verify.
/unassign
/assign @zhuwenxing

zhuwenxing commented 1 year ago

Not reproduced in 2.2.0-20230310-b2ece6a5