Closed: ThreadDao closed this issue 1 year ago.
By the way, the resource group info shows that the collection is loaded:
ResourceGroupInfo:
<name:RG_0>,
<capacity:1>,
<num_available_node:1>,
<num_loaded_replica:{'ResourceGroup_111': 1}>,
<num_outgoing_node:{}>,
<num_incoming_node:{}>
ResourceGroupInfo:
<name:RG_1>,
<capacity:1>,
<num_available_node:1>,
<num_loaded_replica:{'ResourceGroup_111': 1}>,
<num_outgoing_node:{}>,
<num_incoming_node:{}>
ResourceGroupInfo:
<name:__default_resource_group>,
<capacity:1000000>,
<num_available_node:2>,
<num_loaded_replica:{'Checker__CuPg5fdN': 1, 'SearchChecker__SklasCg6': 1, 'Hello_Milvus': 1, 'Checker__y2GOEz7y': 1, 'DeleteChecker__tDpoe0M2': 1, 'InsertChecker__rzJLYNVb': 1, 'QueryChecker__F8VgFuBz': 1, 'IndexChecker__8BWNEE5P': 1, 'FlushChecker__tuTHHnqt': 1, 'ResourceGroup_222': 2, 'Checker__G8vrZ3zA': 1, 'Checker__ABlBpC4n': 1, 'Checker__T8KwV6BY': 1, 'CreateChecker__cS1h9MmY': 1}>,
<num_outgoing_node:{}>,
<num_incoming_node:{}>
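To cross-check the dump above, the hypothetical helpers below (not part of pymilvus) parse the ResourceGroupInfo text exactly as it is printed in this thread and sum one collection's loaded replicas across all resource groups. This is a sketch that assumes the printed format shown here; it does not call any Milvus API.

```python
import ast
import re

# Field patterns for the ResourceGroupInfo text as printed in this thread,
# e.g. "<name:RG_0>," and "<num_loaded_replica:{'ResourceGroup_111': 1}>,".
NAME_RE = re.compile(r"<name:([^>]+)>")
LOADED_RE = re.compile(r"<num_loaded_replica:(\{.*?\})>")


def replicas_by_group(dump: str) -> dict:
    """Map each resource group name to its {collection: replica_count} dict."""
    groups = {}
    current = None
    for line in dump.splitlines():
        name = NAME_RE.search(line)
        if name:
            current = name.group(1)
            continue
        loaded = LOADED_RE.search(line)
        if loaded and current is not None:
            # The mapping is printed as a Python dict literal, so literal_eval parses it safely.
            groups[current] = ast.literal_eval(loaded.group(1))
    return groups


def total_replicas(dump: str, collection: str) -> int:
    """Sum one collection's loaded replicas across every resource group in the dump."""
    return sum(loaded.get(collection, 0) for loaded in replicas_by_group(dump).values())
```

Fed the RG_0 and RG_1 entries above, total_replicas(dump, "ResourceGroup_111") gives 1 + 1 = 2, the replica count requested at load time. So by this view the collection is fully placed, even though search reports it as not fully loaded.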
/assign @weiliu1031 /unassign
Fixed in #22370; please verify.
/assign @yanliang567
/assign @ThreadDao please help verify the fix
Fixed in 2.2.0-20230228-3e560841.
Verified. Jenkins job: https://qa-jenkins.milvus.io/job/chaos-test-resource-group/75/
Is there an existing issue for this?
Environment
Current Behavior
1. Create rg RG_0 and transfer 1 qn from default_rg into RG_0; create rg RG_1 and transfer 1 qn from default_rg into RG_1.
2. Create collection ResourceGroup_111 and index it with {"index_type": "HNSW", "metric_type": "L2", "params": {"M": 48, "efConstruction": 500}}.
3. Load collection ResourceGroup_111 with 2 replicas into the two rgs [RG_0, RG_1].
[2023-02-21T04:08:57.326Z] attempt #1:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.326Z] attempt #2:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.326Z] attempt #3:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.326Z] attempt #4:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.326Z] attempt #5:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.326Z] attempt #6:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.326Z] attempt #7:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.326Z] attempt #8:context deadline exceeded
[2023-02-21T04:08:57.326Z] )>, <Time:{'RPC start': '2023-02-21 04:08:47.221457', 'RPC error': '2023-02-21 04:08:57.282384'}> (decorators.py:108)
[2023-02-21T04:08:57.326Z] [2023-02-21 04:08:57 - ERROR - ci_test]: Traceback (most recent call last):
[2023-02-21T04:08:57.326Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper
[2023-02-21T04:08:57.326Z] res = func(*args, **_kwargs)
[2023-02-21T04:08:57.326Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request
[2023-02-21T04:08:57.326Z] return func(*arg, **kwargs)
[2023-02-21T04:08:57.326Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 614, in search
[2023-02-21T04:08:57.326Z] res = conn.search(self._name, data, anns_field, param, limit, expr,
[2023-02-21T04:08:57.326Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
[2023-02-21T04:08:57.326Z] raise e
[2023-02-21T04:08:57.326Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
[2023-02-21T04:08:57.326Z] return func(*args, **kwargs)
[2023-02-21T04:08:57.326Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2023-02-21T04:08:57.326Z] ret = func(self, *args, **kwargs)
[2023-02-21T04:08:57.327Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
[2023-02-21T04:08:57.327Z] raise e
[2023-02-21T04:08:57.327Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
[2023-02-21T04:08:57.327Z] return func(self, *args, **kwargs)
[2023-02-21T04:08:57.327Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 483, in search
[2023-02-21T04:08:57.327Z] return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)
[2023-02-21T04:08:57.327Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 452, in _execute_search_requests
[2023-02-21T04:08:57.327Z] raise pre_err
[2023-02-21T04:08:57.327Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 443, in _execute_search_requests
[2023-02-21T04:08:57.327Z] raise MilvusException(response.status.error_code, response.status.reason)
[2023-02-21T04:08:57.327Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=All attempts results:
[2023-02-21T04:08:57.327Z] attempt #1:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.327Z] attempt #2:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.327Z] attempt #3:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.327Z] attempt #4:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.327Z] attempt #5:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.327Z] attempt #6:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.327Z] attempt #7:fail to get shard leaders from QueryCoord: collection 439602829507890674 is not fully loaded
[2023-02-21T04:08:57.327Z] attempt #8:context deadline exceeded
[2023-02-21T04:08:57.327Z] )>
[2023-02-21T04:08:57.327Z] (api_request.py:39)
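The attempt list above reads like a bounded retry loop: each attempt to get shard leaders from QueryCoord fails with "not fully loaded", and the last one hits the RPC deadline (the RPC ran from 04:08:47.22 to 04:08:57.28, about 10 s). Below is a minimal Python sketch of that pattern. It assumes a fixed retry interval; Milvus's proxy is written in Go and its actual retry and backoff policy is not shown in this thread. The names retry_until_deadline and AllAttemptsFailed are hypothetical, and the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time


class AllAttemptsFailed(Exception):
    """Raised with every attempt's error listed, like the "All attempts results" text above."""


def retry_until_deadline(fn, timeout_s, interval_s, clock=time.monotonic, sleep=time.sleep):
    """Call fn() until it returns or the deadline passes, recording each failure."""
    deadline = clock() + timeout_s
    errors = []
    while True:
        if clock() >= deadline:
            # The final entry mirrors "attempt #8:context deadline exceeded".
            errors.append("context deadline exceeded")
            break
        try:
            return fn()
        except Exception as exc:  # keep the reason, then retry after a pause
            errors.append(str(exc))
        sleep(interval_s)
    report = "\n".join(f"attempt #{i}:{msg}" for i, msg in enumerate(errors, start=1))
    raise AllAttemptsFailed("All attempts results:\n" + report)
```

With a 10 s timeout, an assumed fixed 1.5 s interval, and negligible call time, a function that always raises "... is not fully loaded" produces seven such attempts followed by attempt #8: context deadline exceeded. This is the same shape as the log, which suggests that the load state QueryCoord reports to the proxy never became "fully loaded" during the RPC.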
c.get_query_segment_info('ResourceGroup_111')
[]
c.get_replicas('ResourceGroup_111')
RPC error: [get_replicas], <MilvusException: (code=15, message=failed to get replica info, err=failed to get channels, collection not loaded[CollectionNotFound])>, <Time:{'RPC start': '2023-02-21 14:00:47.369003', 'RPC error': '2023-02-21 14:00:47.411205'}>
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/nausicca/.virtualenvs/milvus/lib/python3.8/site-packages/pymilvus/client/stub.py", line 1047, in get_replicas
return handler.get_replicas(collection_name, timeout=timeout, **kwargs)
File "/Users/nausicca/.virtualenvs/milvus/lib/python3.8/site-packages/pymilvus/decorators.py", line 109, in handler
raise e
File "/Users/nausicca/.virtualenvs/milvus/lib/python3.8/site-packages/pymilvus/decorators.py", line 105, in handler
return func(*args, **kwargs)
File "/Users/nausicca/.virtualenvs/milvus/lib/python3.8/site-packages/pymilvus/decorators.py", line 136, in handler
ret = func(self, *args, **kwargs)
File "/Users/nausicca/.virtualenvs/milvus/lib/python3.8/site-packages/pymilvus/decorators.py", line 85, in handler
raise e
File "/Users/nausicca/.virtualenvs/milvus/lib/python3.8/site-packages/pymilvus/decorators.py", line 50, in handler
return func(self, *args, **kwargs)
File "/Users/nausicca/.virtualenvs/milvus/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 1041, in get_replicas
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=15, message=failed to get replica info, err=failed to get channels, collection not loaded[CollectionNotFound])>
<MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:439602829507890740 channelName:"by-dev-rootcoord-dml_7_439602829507890740v1" seek_position:<channel_name:"by-dev-rootcoord-dml_7_439602829507890740v1" msgID:"\010\007\020=\030\000 \000" msgGroup:"datanode-8-by-dev-rootcoord-dml_7_439602829507890740v1-true" timestamp:439602877892657154 > flushedSegmentIds:439602829507890750 , the collection not loaded or leader is offline[NodeNotFound(0)])
Milvus Log
- chaos type: pod-kill
- image tag: master-20230221-b7c0d12d
- target pod: querycoord
- failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-resource-group/detail/chaos-test-resource-group/8/pipeline
- logs: artifacts-querycoord-pod-kill-8-server-logs.tar.gz, artifacts-querycoord-pod-kill-8-pytest-logs.tar.gz
Anything else?
No response