milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Search failed with error `fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: channel xxx is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=9)` after upgrading from v2.2.5 to master-20230506-ad75afdc #23936

Closed · zhuwenxing closed this issue 1 year ago

zhuwenxing commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: v2.2.5 --> master-20230506-ad75afdc
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2023-05-07T10:24:35.314Z] =========================== short test summary info ============================

[2023-05-07T10:24:35.314Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_111_441306787081076125v1 is not available in any replica, err=NodeOffline(nodeID=2); NodeOffline(nodeID=1): attempt #1: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_110_441306787081076125v0 is not available in any replica, err=NodeOffline(nodeID=2); NodeOffline(nodeID=1): attempt #2: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_110_441306787081076125v0 is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=2): attempt #3: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_110_441306787081076125v0 is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=2): attempt #4: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_110_441306787081076125v0 is not available in any replica, err=NodeOffline(nodeID=2); NodeOffline(nodeID=1): attempt #5: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_110_441306787081076125v0 is not available in any replica, err=NodeOffline(nodeID=2); NodeOffline(nodeID=1): attempt #6: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_110_441306787081076125v0 is not available in any replica, err=NodeOffline(nodeID=2); NodeOffline(nodeID=1): context done during sleep after run#6: context deadline exceeded)>

[2023-05-07T10:24:35.314Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_only_growing_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2

[2023-05-07T10:24:35.314Z]  +  where 2 = int('2')

[2023-05-07T10:24:35.314Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_1_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_104_441306787081074645v0 is not available in any replica, err=NodeOffline(nodeID=1): attempt #1: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_104_441306787081074645v0 is not available in any replica, err=NodeOffline(nodeID=1): attempt #2: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_104_441306787081074645v0 is not available in any replica, err=NodeOffline(nodeID=1): attempt #3: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_104_441306787081074645v0 is not available in any replica, err=NodeOffline(nodeID=1): attempt #4: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_104_441306787081074645v0 is not available in any replica, err=NodeOffline(nodeID=1): attempt #5: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_104_441306787081074645v0 is not available in any replica, err=NodeOffline(nodeID=1): attempt #6: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_104_441306787081074645v0 is not available in any replica, err=NodeOffline(nodeID=1): context done during sleep after run#6: context deadline exceeded)>

[2023-05-07T10:24:35.314Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_1_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_124_441306787081279914v0 is not available in any replica, err=LackSegment(segmentID=441306787081287288): attempt #1: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_124_441306787081279914v0 is not available in any replica, err=LackSegment(segmentID=441306787081287288): attempt #2: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_124_441306787081279914v0 is not available in any replica, err=LackSegment(segmentID=441306787081287288): attempt #3: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_124_441306787081279914v0 is not available in any replica, err=LackSegment(segmentID=441306787081287288): attempt #4: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_124_441306787081279914v0 is not available in any replica, err=LackSegment(segmentID=441306787081287288): attempt #5: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_124_441306787081279914v0 is not available in any replica, err=LackSegment(segmentID=441306787081287288): attempt #6: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_125_441306787081279914v1 is not available in any replica, err=NodeOffline(nodeID=1): context done during sleep after run#6: context deadline exceeded)>

[2023-05-07T10:24:35.315Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_1_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_118_441306787081277961v0 is not available in any replica, err=NodeOffline(nodeID=2): attempt #1: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_119_441306787081277961v1 is not available in any replica, err=NodeOffline(nodeID=1): attempt #2: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_118_441306787081277961v0 is not available in any replica, err=NodeOffline(nodeID=2): attempt #3: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_118_441306787081277961v0 is not available in any replica, err=NodeOffline(nodeID=2): attempt #4: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_118_441306787081277961v0 is not available in any replica, err=NodeOffline(nodeID=2): attempt #5: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_118_441306787081277961v0 is not available in any replica, err=NodeOffline(nodeID=2): attempt #6: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_118_441306787081277961v0 is not available in any replica, err=NodeOffline(nodeID=2): context done during sleep after run#6: context deadline exceeded)>

[2023-05-07T10:24:35.315Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_1_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_113_441306787081076163v1 is not available in any replica, err=NodeOffline(nodeID=1): attempt #1: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_113_441306787081076163v1 is not available in any replica, err=NodeOffline(nodeID=1): attempt #2: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_113_441306787081076163v1 is not available in any replica, err=NodeOffline(nodeID=1): attempt #3: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_113_441306787081076163v1 is not available in any replica, err=NodeOffline(nodeID=1): attempt #4: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_113_441306787081076163v1 is not available in any replica, err=NodeOffline(nodeID=1): attempt #5: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_113_441306787081076163v1 is not available in any replica, err=NodeOffline(nodeID=1): attempt #6: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_113_441306787081076163v1 is not available in any replica, err=NodeOffline(nodeID=1): context done during sleep after run#6: context deadline exceeded)>

[2023-05-07T10:24:35.315Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2

[2023-05-07T10:24:35.315Z]  +  where 2 = int('2')

[2023-05-07T10:24:35.315Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000)>

[2023-05-07T10:24:35.315Z] [get_env_variable] failed to get environment variables : 'CI_LOG_PATH', use default path : /tmp/ci_logs

[2023-05-07T10:24:35.315Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_126_441306787081280039v0 is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=2): attempt #1: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_126_441306787081280039v0 is not available in any replica, err=NodeOffline(nodeID=2); NodeOffline(nodeID=1): attempt #2: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_126_441306787081280039v0 is not available in any replica, err=NodeOffline(nodeID=2); NodeOffline(nodeID=1): attempt #3: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_126_441306787081280039v0 is not available in any replica, err=NodeOffline(nodeID=2); NodeOffline(nodeID=1): attempt #4: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_126_441306787081280039v0 is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=2): attempt #5: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_126_441306787081280039v0 is not available in any replica, err=NodeOffline(nodeID=2); NodeOffline(nodeID=1): attempt #6: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_126_441306787081280039v0 is not available in any replica, err=NodeOffline(nodeID=2); NodeOffline(nodeID=1): context done during sleep after run#6: context deadline exceeded)>

[2023-05-07T10:24:35.315Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2

[2023-05-07T10:24:35.315Z]  +  where 2 = int('2')

[2023-05-07T10:24:35.315Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000)>

[2023-05-07T10:24:35.315Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000)>

[2023-05-07T10:24:35.315Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000)>

[2023-05-07T10:24:35.316Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=9): attempt #1: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=9); NodeOffline(nodeID=1): attempt #2: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=9): attempt #3: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=9); NodeOffline(nodeID=1): attempt #4: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=9); NodeOffline(nodeID=1): attempt #5: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=9): attempt #6: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=9): context done during sleep after run#6: context deadline exceeded)>

[2023-05-07T10:24:35.316Z] ================== 14 failed, 36 passed in 1027.90s (0:17:07) ==================
[2023-05-07T10:24:35.312Z] self = <test_action_second_deployment.TestActionSecondDeployment object at 0x7f46756f4940>

[2023-05-07T10:24:35.312Z] all_collection_name = 'deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000'

[2023-05-07T10:24:35.312Z] data_size = 3000

[2023-05-07T10:24:35.312Z] 

[2023-05-07T10:24:35.312Z]     @pytest.mark.tags(CaseLabel.L3)

[2023-05-07T10:24:35.312Z]     def test_check(self, all_collection_name, data_size):

[2023-05-07T10:24:35.312Z]         """

[2023-05-07T10:24:35.312Z]         before reinstall: create collection

[2023-05-07T10:24:35.312Z]         """

[2023-05-07T10:24:35.312Z]         self._connect()

[2023-05-07T10:24:35.312Z]         ms = MilvusSys()

[2023-05-07T10:24:35.312Z]         name = all_collection_name

[2023-05-07T10:24:35.312Z]         is_binary = False

[2023-05-07T10:24:35.312Z]         if "BIN" in name:

[2023-05-07T10:24:35.312Z]             is_binary = True

[2023-05-07T10:24:35.312Z]         collection_w, _ = self.collection_wrap.init_collection(name=name)

[2023-05-07T10:24:35.312Z]         self.collection_w = collection_w

[2023-05-07T10:24:35.312Z]         schema = collection_w.schema

[2023-05-07T10:24:35.312Z]         data_type = [field.dtype for field in schema.fields]

[2023-05-07T10:24:35.312Z]         field_name = [field.name for field in schema.fields]

[2023-05-07T10:24:35.312Z]         type_field_map = dict(zip(data_type, field_name))

[2023-05-07T10:24:35.312Z]         if is_binary:

[2023-05-07T10:24:35.312Z]             default_index_field = ct.default_binary_vec_field_name

[2023-05-07T10:24:35.312Z]             vector_index_type = "BIN_IVF_FLAT"

[2023-05-07T10:24:35.312Z]         else:

[2023-05-07T10:24:35.312Z]             default_index_field = ct.default_float_vec_field_name

[2023-05-07T10:24:35.312Z]             vector_index_type = "IVF_FLAT"

[2023-05-07T10:24:35.312Z]     

[2023-05-07T10:24:35.312Z]         binary_vector_index_types = [index.params["index_type"] for index in collection_w.indexes if

[2023-05-07T10:24:35.312Z]                                      index.field_name == type_field_map.get(100, "")]

[2023-05-07T10:24:35.312Z]         float_vector_index_types = [index.params["index_type"] for index in collection_w.indexes if

[2023-05-07T10:24:35.312Z]                                     index.field_name == type_field_map.get(101, "")]

[2023-05-07T10:24:35.312Z]         index_field_map = dict([(index.field_name, index.index_name) for index in collection_w.indexes])

[2023-05-07T10:24:35.312Z]         index_names = [index.index_name for index in collection_w.indexes]  # used to drop index

[2023-05-07T10:24:35.312Z]         vector_index_types = binary_vector_index_types + float_vector_index_types

[2023-05-07T10:24:35.312Z]         if len(vector_index_types) > 0:

[2023-05-07T10:24:35.312Z]             vector_index_type = vector_index_types[0]

[2023-05-07T10:24:35.312Z]         try:

[2023-05-07T10:24:35.312Z]             t0 = time.time()

[2023-05-07T10:24:35.312Z]             self.utility_wrap.wait_for_loading_complete(name)

[2023-05-07T10:24:35.312Z]             log.info(f"wait for {name} loading complete cost {time.time() - t0}")

[2023-05-07T10:24:35.312Z]         except Exception as e:

[2023-05-07T10:24:35.312Z]             log.error(e)

[2023-05-07T10:24:35.312Z]         # get replicas loaded

[2023-05-07T10:24:35.312Z]         try:

[2023-05-07T10:24:35.312Z]             replicas = collection_w.get_replicas(enable_traceback=False)

[2023-05-07T10:24:35.312Z]             replicas_loaded = len(replicas.groups)

[2023-05-07T10:24:35.312Z]         except Exception as e:

[2023-05-07T10:24:35.312Z]             log.error(e)

[2023-05-07T10:24:35.312Z]             replicas_loaded = 0

[2023-05-07T10:24:35.312Z]     

[2023-05-07T10:24:35.312Z]         log.info(f"collection {name} has {replicas_loaded} replicas")

[2023-05-07T10:24:35.312Z]         actual_replicas = re.search(r'replica_number_(.*?)_', name).group(1)

[2023-05-07T10:24:35.312Z]         assert replicas_loaded == int(actual_replicas)

[2023-05-07T10:24:35.312Z]         # params for search and query

[2023-05-07T10:24:35.312Z]         if is_binary:

[2023-05-07T10:24:35.312Z]             _, vectors_to_search = cf.gen_binary_vectors(

[2023-05-07T10:24:35.312Z]                 default_nb, default_dim)

[2023-05-07T10:24:35.312Z]             default_search_field = ct.default_binary_vec_field_name

[2023-05-07T10:24:35.312Z]         else:

[2023-05-07T10:24:35.312Z]             vectors_to_search = cf.gen_vectors(default_nb, default_dim)

[2023-05-07T10:24:35.312Z]             default_search_field = ct.default_float_vec_field_name

[2023-05-07T10:24:35.312Z]         search_params = gen_search_param(vector_index_type)[0]

[2023-05-07T10:24:35.312Z]     

[2023-05-07T10:24:35.312Z]         # load if not loaded

[2023-05-07T10:24:35.312Z]         if replicas_loaded == 0:

[2023-05-07T10:24:35.312Z]             # create index for vector if not exist before load

[2023-05-07T10:24:35.312Z]             is_vector_indexed = False

[2023-05-07T10:24:35.312Z]             index_infos = [index.to_dict() for index in collection_w.indexes]

[2023-05-07T10:24:35.312Z]             for index_info in index_infos:

[2023-05-07T10:24:35.312Z]                 if "metric_type" in index_info.keys():

[2023-05-07T10:24:35.312Z]                     is_vector_indexed = True

[2023-05-07T10:24:35.312Z]                     break

[2023-05-07T10:24:35.312Z]             if is_vector_indexed is False:

[2023-05-07T10:24:35.312Z]                 default_index_param = gen_index_param(vector_index_type)

[2023-05-07T10:24:35.312Z]                 self.create_index(collection_w, default_index_field, default_index_param)

[2023-05-07T10:24:35.312Z]             collection_w.load()

[2023-05-07T10:24:35.312Z]     

[2023-05-07T10:24:35.312Z]         # search and query

[2023-05-07T10:24:35.312Z]         if "empty" in name:

[2023-05-07T10:24:35.312Z]             # if the collection is empty, the search result should be empty, so no need to check

[2023-05-07T10:24:35.312Z]             check_task = None

[2023-05-07T10:24:35.312Z]         else:

[2023-05-07T10:24:35.312Z]             check_task = CheckTasks.check_search_results

[2023-05-07T10:24:35.312Z]     

[2023-05-07T10:24:35.312Z] >       collection_w.search(vectors_to_search[:default_nq], default_search_field,

[2023-05-07T10:24:35.312Z]                             search_params, default_limit,

[2023-05-07T10:24:35.312Z]                             default_search_exp,

[2023-05-07T10:24:35.312Z]                             output_fields=[ct.default_int64_field_name],

[2023-05-07T10:24:35.312Z]                             check_task=check_task,

[2023-05-07T10:24:35.312Z]                             check_items={"nq": default_nq,

[2023-05-07T10:24:35.312Z]                                          "limit": default_limit})

[2023-05-07T10:24:35.312Z] 

[2023-05-07T10:24:35.312Z] testcases/test_action_second_deployment.py:151: 

[2023-05-07T10:24:35.312Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[2023-05-07T10:24:35.313Z] /usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py:666: in search

[2023-05-07T10:24:35.313Z]     res = conn.search(self._name, data, anns_field, param, limit, expr,

[2023-05-07T10:24:35.313Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:109: in handler

[2023-05-07T10:24:35.313Z]     raise e

[2023-05-07T10:24:35.313Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:105: in handler

[2023-05-07T10:24:35.313Z]     return func(*args, **kwargs)

[2023-05-07T10:24:35.313Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:136: in handler

[2023-05-07T10:24:35.313Z]     ret = func(self, *args, **kwargs)

[2023-05-07T10:24:35.313Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:85: in handler

[2023-05-07T10:24:35.313Z]     raise e

[2023-05-07T10:24:35.313Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:50: in handler

[2023-05-07T10:24:35.313Z]     return func(self, *args, **kwargs)

[2023-05-07T10:24:35.313Z] /usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py:522: in search

[2023-05-07T10:24:35.313Z]     return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)

[2023-05-07T10:24:35.313Z] /usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py:491: in _execute_search_requests

[2023-05-07T10:24:35.313Z]     raise pre_err

[2023-05-07T10:24:35.313Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[2023-05-07T10:24:35.313Z] 

[2023-05-07T10:24:35.313Z] self = <pymilvus.client.grpc_handler.GrpcHandler object at 0x7f462c6bd400>

[2023-05-07T10:24:35.313Z] requests = [collection_name: "deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_...-1"

[2023-05-07T10:24:35.313Z] }

[2023-05-07T10:24:35.313Z] search_params {

[2023-05-07T10:24:35.313Z]   key: "offset"

[2023-05-07T10:24:35.313Z]   value: "0"

[2023-05-07T10:24:35.313Z] }

[2023-05-07T10:24:35.313Z] search_params {

[2023-05-07T10:24:35.313Z]   key: "ignore_growing"

[2023-05-07T10:24:35.313Z]   value: "False"

[2023-05-07T10:24:35.313Z] }

[2023-05-07T10:24:35.313Z] nq: 2

[2023-05-07T10:24:35.313Z] ]

[2023-05-07T10:24:35.313Z] timeout = None

[2023-05-07T10:24:35.313Z] kwargs = {'auto_id': False, 'check_items': {'limit': 10, 'nq': 2}, 'check_task': 'check_search_results', 'guarantee_timestamp': 0, ...}

[2023-05-07T10:24:35.313Z] auto_id = False

[2023-05-07T10:24:35.313Z] request = collection_name: "deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_i..."-1"

[2023-05-07T10:24:35.313Z] }

[2023-05-07T10:24:35.313Z] search_params {

[2023-05-07T10:24:35.313Z]   key: "offset"

[2023-05-07T10:24:35.313Z]   value: "0"

[2023-05-07T10:24:35.313Z] }

[2023-05-07T10:24:35.313Z] search_params {

[2023-05-07T10:24:35.313Z]   key: "ignore_growing"

[2023-05-07T10:24:35.313Z]   value: "False"

[2023-05-07T10:24:35.313Z] }

[2023-05-07T10:24:35.313Z] nq: 2

[2023-05-07T10:24:35.313Z] 

[2023-05-07T10:24:35.313Z] raws = []

[2023-05-07T10:24:35.313Z] response = status {

[2023-05-07T10:24:35.313Z]   error_code: UnexpectedError

[2023-05-07T10:24:35.313Z]   reason: "fail to search on all shard leaders, err=attempt #0: fail to get sha... err=NodeOffline(nodeID=1); NodeOffline(nodeID=9): context done during sleep after run#6: context deadline exceeded"

[2023-05-07T10:24:35.313Z] }

[2023-05-07T10:24:35.313Z] 

[2023-05-07T10:24:35.313Z] 

[2023-05-07T10:24:35.313Z]     def _execute_search_requests(self, requests, timeout=None, **kwargs):

[2023-05-07T10:24:35.313Z]         auto_id = kwargs.get("auto_id", True)

[2023-05-07T10:24:35.313Z]     

[2023-05-07T10:24:35.313Z]         try:

[2023-05-07T10:24:35.313Z]             if kwargs.get("_async", False):

[2023-05-07T10:24:35.313Z]                 futures = []

[2023-05-07T10:24:35.313Z]                 for request in requests:

[2023-05-07T10:24:35.313Z]                     ft = self._stub.Search.future(request, timeout=timeout)

[2023-05-07T10:24:35.313Z]                     futures.append(ft)

[2023-05-07T10:24:35.313Z]                 func = kwargs.get("_callback", None)

[2023-05-07T10:24:35.313Z]                 return ChunkedSearchFuture(futures, func, auto_id)

[2023-05-07T10:24:35.313Z]     

[2023-05-07T10:24:35.313Z]             raws = []

[2023-05-07T10:24:35.313Z]             for request in requests:

[2023-05-07T10:24:35.313Z]                 response = self._stub.Search(request, timeout=timeout)

[2023-05-07T10:24:35.313Z]     

[2023-05-07T10:24:35.313Z]                 if response.status.error_code != 0:

[2023-05-07T10:24:35.313Z] >                   raise MilvusException(response.status.error_code, response.status.reason)

[2023-05-07T10:24:35.313Z] E                   pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=attempt #0: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=9): attempt #1: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=9); NodeOffline(nodeID=1): attempt #2: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=9): attempt #3: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=9); NodeOffline(nodeID=1): attempt #4: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=9); NodeOffline(nodeID=1): attempt #5: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=9): attempt #6: fail to get shard leaders from QueryCoord: channel by-dev-rootcoord-dml_100_441306787081073976v0 is not available in any replica, err=NodeOffline(nodeID=1); NodeOffline(nodeID=9): context done during sleep after run#6: context deadline exceeded)>

[2023-05-07T10:24:35.313Z] 

[2023-05-07T10:24:35.313Z] /usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py:482: MilvusException
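
For reference, the client-side path the failing tests exercise can be sketched with plain pymilvus as below. This is a minimal sketch, not the CI harness: the endpoint, collection name, field names, and index/search parameters are illustrative assumptions (BIN_IVF_FLAT/JACCARD is only one of the parametrized combinations covered by the test).

```python
# Minimal sketch of the search path hit by the failing tests (pymilvus 2.x).
# Endpoint, collection name, field names and parameters are illustrative
# assumptions, not values taken from the CI job.
import random

from pymilvus import Collection, connections, utility

connections.connect(host="127.0.0.1", port="19530")

name = "deploy_test_bin_ivf_flat_replica_2"  # hypothetical collection created before the upgrade
collection = Collection(name)

# After the upgrade the collection is expected to still be loaded; the test
# waits for loading to complete and checks the replica count parsed from the name.
utility.wait_for_loading_complete(name)
replicas = collection.get_replicas()
print(f"{name} loaded with {len(replicas.groups)} replica group(s)")

# Binary query vectors for a BIN_IVF_FLAT / JACCARD collection (dim = 128 -> 16 bytes each).
dim = 128
nq = 2
vectors = [bytes(random.getrandbits(8) for _ in range(dim // 8)) for _ in range(nq)]

# This call is where the reported error surfaces: the proxy retries
# GetShardLeaders against QueryCoord and eventually fails with
# "NodeOffline(...)" / "context deadline exceeded".
results = collection.search(
    data=vectors,
    anns_field="binary_vector",  # assumed vector field name
    param={"metric_type": "JACCARD", "params": {"nprobe": 10}},
    limit=10,
    expr="int64 >= 0",           # assumed scalar field
    output_fields=["int64"],
)
print(results)
```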

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_cron/detail/deploy_test_cron/739/pipeline

logs:
- artifacts-pulsar-cluster-upgrade-739-server-logs.tar.gz
- artifacts-pulsar-cluster-upgrade-739-pytest-logs.tar.gz

Anything else?

No response

yanliang567 commented 1 year ago

/assign @jiaoew1991
/unassign

jiaoew1991 commented 1 year ago

/assign @sunby
/unassign

sunby commented 1 year ago

Same reason as https://github.com/milvus-io/milvus/issues/23929

weiliu1031 commented 1 year ago

/assign @yah01

It looks like the DmChannel watch is broken; please fix it.

weiliu1031 commented 1 year ago

/assign @yah01

yah01 commented 1 year ago

/assign @bigsheeper
/unassign

Caused by #24112

bigsheeper commented 1 year ago

/assign @zhuwenxing

This should be fixed now, @zhuwenxing.
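
A quick way to check whether the shard leaders are healthy again before re-running the deploy test is to inspect the replica groups of the affected collection. This is a hypothetical verification sketch, assuming the Group/Shard attribute names from pymilvus 2.x and the same illustrative collection name as in the sketch above:

```python
# Hypothetical verification sketch: inspect replica groups and shard leaders
# for the previously failing collection.
from pymilvus import Collection, connections

connections.connect(host="127.0.0.1", port="19530")  # assumed endpoint

name = "deploy_test_bin_ivf_flat_replica_2"  # same hypothetical name as above
collection = Collection(name)

replicas = collection.get_replicas()
for group in replicas.groups:
    print(f"replica group {group.id}: nodes={group.group_nodes}")
    for shard in group.shards:
        # Each DML channel should report a live shard leader; a channel with
        # no live leader is what produces "not available in any replica".
        print(f"  channel={shard.channel_name} leader={shard.shard_leader} "
              f"nodes={shard.shard_nodes}")
```

If every channel reports a live shard leader, re-running the search from the earlier sketch should no longer hit the NodeOffline retries.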