milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Search raises error `err=LackSegment(segmentID=438555785908160116)` after pulsar pod failure chaos #21562

Closed zhuwenxing closed 1 year ago

zhuwenxing commented 1 year ago

Environment

- Milvus version:2.2.2-20230105-b1c0b22a
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar     
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus==2.3.0.dev21
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2023-01-05T22:37:41.949Z] [2023-01-05 22:36:13 - INFO - ci_test]: assert flush: 3.4777255058288574, entities: 18640 (test_all_collections_after_chaos.py:56)

[2023-01-05T22:37:41.949Z] [2023-01-05 22:36:13 - INFO - ci_test]: index info: [{'collection': 'Checker__mgvo7mF4', 'field': 'float_vector', 'index_name': 'index__TCTGu9Uj', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}] (test_all_collections_after_chaos.py:72)

[2023-01-05T22:37:41.949Z] [2023-01-05 22:36:13 - DEBUG - ci_test]: (api_request)  : [Collection.load] args: [None, 1, 120], kwargs: {} (api_request.py:56)

[2023-01-05T22:37:41.949Z] [2023-01-05 22:36:13 - DEBUG - ci_test]: (api_response) : None  (api_request.py:31)

[2023-01-05T22:37:41.949Z] [2023-01-05 22:36:13 - INFO - ci_test]: [test][2023-01-05T22:36:13Z] [0.00551130s] Checker__mgvo7mF4 load -> None (wrapper.py:30)

[2023-01-05T22:37:41.949Z] [2023-01-05 22:36:13 - DEBUG - ci_test]: (api_request)  : [Collection.search] args: [[[0.06630883191325627, 0.06644089921218813, 0.11441794917696471, 0.0545114342084132, 0.1100248412014657, 0.1052605934198708, 0.023904586171334192, 0.019702047492188585, 0.07502738366279468, 0.028772766750570136, 0.039952824576879695, 0.024931854798526762, 0.07527740975143511, 0.10105612977859284, 0......, kwargs: {} (api_request.py:56)

[2023-01-05T22:37:41.949Z] [2023-01-05 22:36:24 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=All attempts results:

[2023-01-05T22:37:41.949Z] attempt #1:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_8_438555785908129327v0 is not available in any replica, err=LackSegment(segmentID=438555785908160116)

[2023-01-05T22:37:41.949Z] attempt #2:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_9_438555785908129327v1 is not available in any replica, err=LackSegment(segmentID=438555785908160118)

[2023-01-05T22:37:41.949Z] attempt #3:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_9_438555785908129327v1 is not available in any replica, err=LackSegment(segmentID=438555785908160118)

[2023-01-05T22:37:41.949Z] attempt #4:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_8_438555785908129327v0 is not available in any replica, err=LackSegment(segmentID=438555785908160116)

[2023-01-05T22:37:41.949Z] attempt #5:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_8_438555785908129327v0 is not available in any replica, err=LackSegment(segmentID=438555785908133191)

[2023-01-05T22:37:41.949Z] attempt #6:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_8_438555785908129327v0 is not available in any replica, err=LackSegment(segmentID=438555785908160255)

[2023-01-05T22:37:41.949Z] attempt #7:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_9_438555785908129327v1 is not available in any replica, err=LackSegment(segmentID=438555785908160257)

[2023-01-05T22:37:41.949Z] attempt #8:context deadline exceeded

[2023-01-05T22:37:41.949Z] )>, <Time:{'RPC start': '2023-01-05 22:36:13.980834', 'RPC error': '2023-01-05 22:36:24.253099'}> (decorators.py:108)

[2023-01-05T22:37:41.949Z] [2023-01-05 22:36:24 - ERROR - ci_test]: Traceback (most recent call last):

[2023-01-05T22:37:41.949Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper

[2023-01-05T22:37:41.949Z]     res = func(*args, **_kwargs)

[2023-01-05T22:37:41.949Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request

[2023-01-05T22:37:41.949Z]     return func(*arg, **kwargs)

[2023-01-05T22:37:41.949Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 609, in search

[2023-01-05T22:37:41.949Z]     res = conn.search(self._name, data, anns_field, param, limit, expr,

[2023-01-05T22:37:41.949Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2023-01-05T22:37:41.949Z]     raise e

[2023-01-05T22:37:41.949Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2023-01-05T22:37:41.949Z]     return func(*args, **kwargs)

[2023-01-05T22:37:41.949Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2023-01-05T22:37:41.949Z]     ret = func(self, *args, **kwargs)

[2023-01-05T22:37:41.949Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler

[2023-01-05T22:37:41.949Z]     raise e

[2023-01-05T22:37:41.949Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2023-01-05T22:37:41.949Z]     return func(self, *args, **kwargs)

[2023-01-05T22:37:41.949Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 470, in search

[2023-01-05T22:37:41.949Z]     return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)

[2023-01-05T22:37:41.950Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 439, in _execute_search_requests

[2023-01-05T22:37:41.950Z]     raise pre_err

[2023-01-05T22:37:41.950Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 430, in _execute_search_requests

[2023-01-05T22:37:41.950Z]     raise MilvusException(response.status.error_code, response.status.reason)

[2023-01-05T22:37:41.950Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=All attempts results:

[2023-01-05T22:37:41.950Z] attempt #1:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_8_438555785908129327v0 is not available in any replica, err=LackSegment(segmentID=438555785908160116)

[2023-01-05T22:37:41.950Z] attempt #2:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_9_438555785908129327v1 is not available in any replica, err=LackSegment(segmentID=438555785908160118)

[2023-01-05T22:37:41.950Z] attempt #3:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_9_438555785908129327v1 is not available in any replica, err=LackSegment(segmentID=438555785908160118)

[2023-01-05T22:37:41.950Z] attempt #4:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_8_438555785908129327v0 is not available in any replica, err=LackSegment(segmentID=438555785908160116)

[2023-01-05T22:37:41.950Z] attempt #5:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_8_438555785908129327v0 is not available in any replica, err=LackSegment(segmentID=438555785908133191)

[2023-01-05T22:37:41.950Z] attempt #6:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_8_438555785908129327v0 is not available in any replica, err=LackSegment(segmentID=438555785908160255)

[2023-01-05T22:37:41.950Z] attempt #7:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_9_438555785908129327v1 is not available in any replica, err=LackSegment(segmentID=438555785908160257)

[2023-01-05T22:37:41.950Z] attempt #8:context deadline exceeded

[2023-01-05T22:37:41.950Z] )>

[2023-01-05T22:37:41.950Z]  (api_request.py:39)

[2023-01-05T22:37:41.950Z] [2023-01-05 22:36:24 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=fail to search on all shard leaders, err=All attempts results:

[2023-01-05T22:37:41.950Z] attempt #1:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_8_438555785908129327v0 is not available in any replica, err=LackSegment(segmentID=438555785908160116)

[2023-01-05T22:37:41.950Z] attempt #2:fail t...... (api_request.py:40)

[2023-01-05T22:37:41.950Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------

[2023-01-05T22:37:41.950Z] =========================== short test summary info ============================

[2023-01-05T22:37:41.950Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[Checker__mgvo7mF4] - AssertionError

[2023-01-05T22:37:41.950Z] =================== 1 failed, 10 passed in 105.52s (0:01:45) ===================
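For triage, it can help to group the failed attempts by channel and see which segment IDs each one reports as missing. The following is only a minimal sketch: the helper name `summarize_lack_segments` and the `sample` string are hypothetical, the regex assumes the exact `attempt #N:...` wording shown in the log above, and the sample contains just three of the attempt lines plus the final timeout line.

```python
import re
from collections import defaultdict

# Matches one "attempt #N" line in the form printed in the log above.
# The wording is copied from this log; other Milvus versions may differ.
ATTEMPT_RE = re.compile(
    r"attempt #(?P<n>\d+):fail to get shard leaders from QueryCoord: "
    r"channel (?P<channel>\S+) is not available in any replica, "
    r"err=LackSegment\(segmentID=(?P<segment>\d+)\)"
)


def summarize_lack_segments(message: str) -> dict:
    """Map each unavailable channel to the sorted, de-duplicated segment IDs reported missing."""
    missing = defaultdict(set)
    for match in ATTEMPT_RE.finditer(message):
        missing[match.group("channel")].add(int(match.group("segment")))
    return {channel: sorted(ids) for channel, ids in missing.items()}


# Hypothetical sample: three attempt lines taken from the log, plus the timeout line,
# which does not match the pattern and is therefore ignored.
sample = (
    "attempt #1:fail to get shard leaders from QueryCoord: channel "
    "by-dev-rootcoord-dml_8_438555785908129327v0 is not available in any replica, "
    "err=LackSegment(segmentID=438555785908160116)\n"
    "attempt #2:fail to get shard leaders from QueryCoord: channel "
    "by-dev-rootcoord-dml_9_438555785908129327v1 is not available in any replica, "
    "err=LackSegment(segmentID=438555785908160118)\n"
    "attempt #5:fail to get shard leaders from QueryCoord: channel "
    "by-dev-rootcoord-dml_8_438555785908129327v0 is not available in any replica, "
    "err=LackSegment(segmentID=438555785908133191)\n"
    "attempt #8:context deadline exceeded\n"
)

for channel, segment_ids in summarize_lack_segments(sample).items():
    print(channel, segment_ids)
```

Applied to the full message in the log above, this grouping would show three distinct segment IDs under the `dml_8` channel and two under the `dml_9` channel, which makes it easier to see that the reported segment changes between retries.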

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

chaos type: pod-failure
image tag: 2.2.2-20230105-b1c0b22a
target pod: pulsar
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-for-release-cron/detail/chaos-test-for-release-cron/1009/pipeline
log: artifacts-pulsar-pod-failure-1009-server-logs.tar.gz artifacts-pulsar-pod-failure-1009-pytest-logs.tar.gz

Anything else?

No response

yanliang567 commented 1 year ago

/assign @jiaoew1991 /unassign

jiaoew1991 commented 1 year ago

/assign @sunby /unassign

sunby commented 1 year ago

It is caused by handoff, which has already been removed in the 2.2.0 branch. Please test it based on 2.2.0.

yanliang567 commented 1 year ago

/assign @zhuwenxing

sunby commented 1 year ago

/unassign

zhuwenxing commented 1 year ago

This no longer reproduces, so I'm closing it.