milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [Nightly] Sealed segment is unexpected after load balance #36727

Open NicoYuan1986 opened 2 weeks ago

NicoYuan1986 commented 2 weeks ago

Is there an existing issue for this?

Environment

- Milvus version: 02bd916
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

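After load balance, the source query node still holds one sealed segment (453117048979248010), but test_load_balance_normal expects it to hold none.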

[pytest : test] _________________ TestUtilityAdvanced.test_load_balance_normal _________________
[pytest : test] [gw4] linux -- Python 3.8.17 /usr/local/bin/python3
[pytest : test] 
[pytest : test] self = <test_utility.TestUtilityAdvanced object at 0x7fca1fb55a00>
[pytest : test] 
[pytest : test]     @pytest.mark.tags(CaseLabel.L2)
[pytest : test]     def test_load_balance_normal(self):
[pytest : test]         """
[pytest : test]         target: test load balance of collection
[pytest : test]         method: init a collection and load balance
[pytest : test]         expected: sealed_segment_ids is subset of des_sealed_segment_ids
[pytest : test]         """
[pytest : test]         # init a collection
[pytest : test]         self._connect()
[pytest : test]         querynode_num = len(MilvusSys().query_nodes)
[pytest : test]         if querynode_num < 2:
[pytest : test]             pytest.skip("skip load balance testcase when querynode number less than 2")
[pytest : test]         c_name = cf.gen_unique_str(prefix)
[pytest : test]         collection_w = self.init_collection_wrap(name=c_name)
[pytest : test]         collection_w.create_index(default_field_name, default_index_params)
[pytest : test]         ms = MilvusSys()
[pytest : test]         nb = 3000
[pytest : test]         df = cf.gen_default_dataframe_data(nb)
[pytest : test]         collection_w.insert(df)
[pytest : test]         # get sealed segments
[pytest : test]         collection_w.num_entities
[pytest : test]         # get growing segments
[pytest : test]         collection_w.insert(df)
[pytest : test]         collection_w.load()
[pytest : test]         # prepare load balance params
[pytest : test]         time.sleep(0.5)
[pytest : test]         res, _ = self.utility_wrap.get_query_segment_info(c_name)
[pytest : test]         segment_distribution = cf.get_segment_distribution(res)
[pytest : test]         all_querynodes = [node["identifier"] for node in ms.query_nodes]
[pytest : test]         assert len(all_querynodes) > 1
[pytest : test]         all_querynodes = sorted(all_querynodes,
[pytest : test]                                 key=lambda x: len(segment_distribution[x]["sealed"])
[pytest : test]                                 if x in segment_distribution else 0, reverse=True)
[pytest : test]         src_node_id = all_querynodes[0]
[pytest : test]         des_node_ids = all_querynodes[1:]
[pytest : test]         sealed_segment_ids = segment_distribution[src_node_id]["sealed"]
[pytest : test]         # load balance
[pytest : test]         self.utility_wrap.load_balance(collection_w.name, src_node_id, des_node_ids, sealed_segment_ids)
[pytest : test]         # get segments distribution after load balance
[pytest : test]         time.sleep(0.5)
[pytest : test]         res, _ = self.utility_wrap.get_query_segment_info(c_name)
[pytest : test]         segment_distribution = cf.get_segment_distribution(res)
[pytest : test]         sealed_segment_ids_after_load_banalce = segment_distribution[src_node_id]["sealed"]
[pytest : test]         # assert src node has no sealed segments
[pytest : test] >       assert sealed_segment_ids_after_load_banalce == []
[pytest : test] E       assert [453117048979248010] == []
[pytest : test] E         Left contains one more item: 453117048979248010
[pytest : test] E         Full diff:
[pytest : test] E         - []
[pytest : test] E         + [453117048979248010]
[pytest : test] 
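The selection and assertion logic in the test above can be reproduced without a cluster. The following is a minimal sketch, assuming the segment distribution is a dict keyed by query node id with a "sealed" list per node (the shape the test indexes into). The node ids, segment ids, and helper names are hypothetical and are not part of the Milvus test framework.

```python
from typing import Dict, List, Tuple

# Assumed shape: node id -> {"sealed": [segment ids]}
SegmentDistribution = Dict[int, Dict[str, List[int]]]


def pick_source_and_destinations(
    all_nodes: List[int], distribution: SegmentDistribution
) -> Tuple[int, List[int]]:
    """Rank nodes by sealed-segment count (descending), as the test does.

    Nodes missing from the distribution count as holding zero segments.
    """
    ranked = sorted(
        all_nodes,
        key=lambda n: len(distribution[n]["sealed"]) if n in distribution else 0,
        reverse=True,
    )
    return ranked[0], ranked[1:]


def leftover_sealed(distribution: SegmentDistribution, src: int) -> List[int]:
    """Sealed segments still on the source node; the test expects []."""
    return distribution.get(src, {}).get("sealed", [])


# Hypothetical data: node 1 holds the most sealed segments, so it is the source.
before = {1: {"sealed": [101, 102]}, 2: {"sealed": [103]}}
src, dst = pick_source_and_destinations([1, 2, 3], before)

# A distribution that satisfies the test, and one that mirrors the failure.
after_ok = {1: {"sealed": []}, 2: {"sealed": [103, 101]}, 3: {"sealed": [102]}}
after_failing = {1: {"sealed": [102]}, 2: {"sealed": [103, 101]}}
```

In the failing run, the equivalent of leftover_sealed returned [453117048979248010] instead of an empty list.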

Expected Behavior

The test passes: the source query node holds no sealed segments after load balance.
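The test checks the distribution once, after a fixed time.sleep(0.5). This report does not establish whether the failure is a timing issue or a real balancing bug. To tell the two apart, one option is to poll until the source node is empty or a timeout expires. The following is a minimal sketch, assuming a caller-supplied fetch function that returns the sealed segment ids currently on the source node. The helper name, its parameters, and the default timeout are hypothetical.

```python
import time
from typing import Callable, List


def wait_until_empty(
    fetch: Callable[[], List[int]],
    timeout: float = 10.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Poll fetch() until it returns an empty list or the timeout expires.

    Returns the last observed list, so a non-empty result names the
    segments that never left the source node.
    """
    deadline = clock() + timeout
    remaining = fetch()
    while remaining and clock() < deadline:
        sleep(interval)
        remaining = fetch()
    return remaining


# Hypothetical stand-in for querying the cluster: the segment disappears
# on the third poll.
_responses = iter([[42], [42], []])


def fake_fetch() -> List[int]:
    return next(_responses)


result = wait_until_empty(fake_fetch, timeout=5.0, sleep=lambda _s: None)
```

If the list is still non-empty after a generous timeout, the segment was not moved at all, which points away from a race in the test.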

Steps To Reproduce

No response

Milvus Log

  1. link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI(new)/detail/2.4/47/pipeline/121/
  2. log: https://grafana-4am.zilliz.cc/explore?panes=%7B%22J5Z%22:%7B%22datasource%22:%22vhI6Vw67k%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bpod%3D%5C%22mdk-24-47-py-n-milvus-querynode-778c47fb4d-mzc6v%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22vhI6Vw67k%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221728504137276%22,%22to%22:%221728511128277%22%7D%7D%7D&schemaVersion=1&orgId=1
  3. collection name: utility_lpLRK29D

Anything else?

No response

yanliang567 commented 2 weeks ago

/assign @weiliu1031
/unassign