milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Search failed with error `err=LackSegment(segmentID=438670081959595462)` without any chaos #21643

Closed zhuwenxing closed 1 year ago

zhuwenxing commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 2.2.0-20230110-e68374b6
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): kafka
- SDK version (e.g. pymilvus v2.0.0rc2): pymilvus==2.3.0.dev21
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

[2023-01-10T23:37:08.312Z] [2023-01-10 23:36:38 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 120} (api_request.py:56)

[2023-01-10T23:37:08.312Z] [2023-01-10 23:36:41 - DEBUG - ci_test]: (api_response) : None  (api_request.py:31)

[2023-01-10T23:37:08.312Z] [2023-01-10 23:36:41 - INFO - ci_test]: [test][2023-01-10T23:36:38Z] [3.01650127s] Hello_Milvus flush -> None (wrapper.py:30)

[2023-01-10T23:37:08.312Z] [2023-01-10 23:36:41 - INFO - ci_test]: assert entities: 6000 (test_data_persistence.py:83)

[2023-01-10T23:37:08.312Z] [2023-01-10 23:36:41 - DEBUG - ci_test]: (api_request)  : [Collection.load] args: [None, 1, 120], kwargs: {} (api_request.py:56)

[2023-01-10T23:37:08.312Z] [2023-01-10 23:36:45 - DEBUG - ci_test]: (api_response) : None  (api_request.py:31)

[2023-01-10T23:37:08.312Z] [2023-01-10 23:36:45 - INFO - ci_test]: [test][2023-01-10T23:36:41Z] [4.02758356s] Hello_Milvus load -> None (wrapper.py:30)

[2023-01-10T23:37:08.312Z] [2023-01-10 23:36:45 - INFO - ci_test]: assert load: 4.027791976928711 (test_data_persistence.py:89)

[2023-01-10T23:37:08.312Z] [2023-01-10 23:36:45 - DEBUG - ci_test]: (api_request)  : [Collection.search] args: [[[0.1621692693369214, 0.14960367018477766, 0.06662597079284452, 0.048771281016029615, 0.01993462196710322, 0.01525090796482271, 0.037753793453564506, 0.14377047272597487, 0.11220278574791062, 0.08271625544987456, 0.05897753566487675, 0.14719109183714818, 0.049312973433789646, 0.12836404829603273, 0......, kwargs: {} (api_request.py:56)

[2023-01-10T23:37:08.312Z] [2023-01-10 23:37:05 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=All attempts results:

[2023-01-10T23:37:08.312Z] attempt #1:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_2_438670081959595412v0 is not available in any replica, err=LackSegment(segmentID=438670081959595462)

[2023-01-10T23:37:08.312Z] attempt #2:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_3_438670081959595412v1 is not available in any replica, err=LackSegment(segmentID=438670081959595461)

[2023-01-10T23:37:08.313Z] attempt #3:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_3_438670081959595412v1 is not available in any replica, err=LackSegment(segmentID=438670081959595461)

[2023-01-10T23:37:08.313Z] attempt #4:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_3_438670081959595412v1 is not available in any replica, err=LackSegment(segmentID=438670081959595461)

[2023-01-10T23:37:08.313Z] attempt #5:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_2_438670081959595412v0 is not available in any replica, err=LackSegment(segmentID=438670081959595462)

[2023-01-10T23:37:08.313Z] attempt #6:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_3_438670081959595412v1 is not available in any replica, err=LackSegment(segmentID=438670081959595461)

[2023-01-10T23:37:08.313Z] attempt #7:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_2_438670081959595412v0 is not available in any replica, err=LackSegment(segmentID=438670081959595462)

[2023-01-10T23:37:08.313Z] attempt #8:context deadline exceeded

[2023-01-10T23:37:08.313Z] )>, <Time:{'RPC start': '2023-01-10 23:36:45.889116', 'RPC error': '2023-01-10 23:37:05.891594'}> (decorators.py:108)

[2023-01-10T23:37:08.313Z] [2023-01-10 23:37:05 - ERROR - ci_test]: Traceback (most recent call last):

[2023-01-10T23:37:08.313Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper

[2023-01-10T23:37:08.313Z]     res = func(*args, **_kwargs)

[2023-01-10T23:37:08.313Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request

[2023-01-10T23:37:08.313Z]     return func(*arg, **kwargs)

[2023-01-10T23:37:08.313Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 609, in search

[2023-01-10T23:37:08.313Z]     res = conn.search(self._name, data, anns_field, param, limit, expr,

[2023-01-10T23:37:08.313Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2023-01-10T23:37:08.313Z]     raise e

[2023-01-10T23:37:08.313Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2023-01-10T23:37:08.313Z]     return func(*args, **kwargs)

[2023-01-10T23:37:08.313Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2023-01-10T23:37:08.313Z]     ret = func(self, *args, **kwargs)

[2023-01-10T23:37:08.313Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler

[2023-01-10T23:37:08.313Z]     raise e

[2023-01-10T23:37:08.313Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2023-01-10T23:37:08.313Z]     return func(self, *args, **kwargs)

[2023-01-10T23:37:08.313Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 470, in search

[2023-01-10T23:37:08.313Z]     return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)

[2023-01-10T23:37:08.313Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 439, in _execute_search_requests

[2023-01-10T23:37:08.313Z]     raise pre_err

[2023-01-10T23:37:08.313Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 430, in _execute_search_requests

[2023-01-10T23:37:08.313Z]     raise MilvusException(response.status.error_code, response.status.reason)

[2023-01-10T23:37:08.313Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=All attempts results:

[2023-01-10T23:37:08.313Z] attempt #1:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_2_438670081959595412v0 is not available in any replica, err=LackSegment(segmentID=438670081959595462)

[2023-01-10T23:37:08.313Z] attempt #2:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_3_438670081959595412v1 is not available in any replica, err=LackSegment(segmentID=438670081959595461)

[2023-01-10T23:37:08.313Z] attempt #3:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_3_438670081959595412v1 is not available in any replica, err=LackSegment(segmentID=438670081959595461)

[2023-01-10T23:37:08.313Z] attempt #4:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_3_438670081959595412v1 is not available in any replica, err=LackSegment(segmentID=438670081959595461)

[2023-01-10T23:37:08.313Z] attempt #5:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_2_438670081959595412v0 is not available in any replica, err=LackSegment(segmentID=438670081959595462)

[2023-01-10T23:37:08.313Z] attempt #6:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_3_438670081959595412v1 is not available in any replica, err=LackSegment(segmentID=438670081959595461)

[2023-01-10T23:37:08.313Z] attempt #7:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_2_438670081959595412v0 is not available in any replica, err=LackSegment(segmentID=438670081959595462)

[2023-01-10T23:37:08.313Z] attempt #8:context deadline exceeded

[2023-01-10T23:37:08.313Z] )>

[2023-01-10T23:37:08.313Z]  (api_request.py:39)

[2023-01-10T23:37:08.313Z] [2023-01-10 23:37:05 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=fail to search on all shard leaders, err=All attempts results:

[2023-01-10T23:37:08.313Z] attempt #1:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_2_438670081959595412v0 is not available in any replica, err=LackSegment(segmentID=438670081959595462)

[2023-01-10T23:37:08.313Z] attempt #2:fail t...... (api_request.py:40)

[2023-01-10T23:37:08.313Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------

[2023-01-10T23:37:08.313Z] =========================== short test summary info ============================

[2023-01-10T23:37:08.313Z] FAILED testcases/test_data_persistence.py::TestDataPersistence::test_milvus_default - AssertionError
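
To make it easier to see which channels and segments the retries kept failing on, here is a minimal sketch that tallies the `LackSegment` attempts from the log text above. The function name `summarize_lack_segment` and the regex are my own assumptions based only on the message format shown in this log; they are not part of Milvus or pymilvus.

```python
import re
from collections import Counter

# Assumed format of one failed attempt, copied from the log in this issue.
ATTEMPT_RE = re.compile(
    r"attempt #(?P<attempt>\d+):fail to get shard leaders from QueryCoord: "
    r"channel (?P<channel>\S+) is not available in any replica, "
    r"err=LackSegment\(segmentID=(?P<segment>\d+)\)"
)


def summarize_lack_segment(log_text):
    """Count LackSegment failures per (channel, segmentID) pair.

    Hypothetical helper: lines that do not match the pattern
    (e.g. "context deadline exceeded") are simply ignored.
    """
    counts = Counter()
    for match in ATTEMPT_RE.finditer(log_text):
        key = (match.group("channel"), int(match.group("segment")))
        counts[key] += 1
    return counts


# A few attempts taken verbatim from the log above.
sample = """
[2023-01-10T23:37:08.312Z] attempt #1:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_2_438670081959595412v0 is not available in any replica, err=LackSegment(segmentID=438670081959595462)
[2023-01-10T23:37:08.312Z] attempt #2:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_3_438670081959595412v1 is not available in any replica, err=LackSegment(segmentID=438670081959595461)
[2023-01-10T23:37:08.313Z] attempt #3:fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_3_438670081959595412v1 is not available in any replica, err=LackSegment(segmentID=438670081959595461)
[2023-01-10T23:37:08.313Z] attempt #8:context deadline exceeded
"""

if __name__ == "__main__":
    for (channel, segment), n in summarize_lack_segment(sample).items():
        print(f"{channel} segment={segment} failures={n}")
```

Run against the full log, this should show that only two channel/segment pairs are involved across all retries, which suggests the same two segments stayed unavailable for the whole 20 s search window.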

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-for-release-cron/detail/chaos-test-kafka-for-release-cron/1148/pipeline

log:
artifacts-rootcoord-pod-failure-1148-server-logs.tar.gz
artifacts-rootcoord-pod-failure-1148-pytest-logs.tar.gz

Anything else?

No response

zhuwenxing commented 1 year ago

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-for-release-cron/detail/chaos-test-kafka-for-release-cron/1150/pipeline

log:
artifacts-minio-pod-failure-1150-pytest-logs.tar.gz
artifacts-minio-pod-failure-1150-server-logs.tar.gz

zhuwenxing commented 1 year ago

/assign @sunby

Please take a look

yanliang567 commented 1 year ago

/unassign

zhuwenxing commented 1 year ago

Seems to be the same as https://github.com/milvus-io/milvus/issues/21607

sunby commented 1 year ago

@zhuwenxing Fixed in https://github.com/milvus-io/milvus/pull/21762

sunby commented 1 year ago

/assign @zhuwenxing
/unassign

zhuwenxing commented 1 year ago

Verified and passed with 2.2.0-20230207-e3501f7a.