milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [Nightly] Sealed segment is unexpected after load balance #36727

Open NicoYuan1986 opened 1 month ago

NicoYuan1986 commented 1 month ago

Is there an existing issue for this?

Environment

- Milvus version: 02bd916
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

After a manual load balance, the source query node still holds a sealed segment, although the test expects it to hold none.

[pytest : test] _________________ TestUtilityAdvanced.test_load_balance_normal _________________
[pytest : test] [gw4] linux -- Python 3.8.17 /usr/local/bin/python3
[pytest : test] 
[pytest : test] self = <test_utility.TestUtilityAdvanced object at 0x7fca1fb55a00>
[pytest : test] 
[pytest : test]     @pytest.mark.tags(CaseLabel.L2)
[pytest : test]     def test_load_balance_normal(self):
[pytest : test]         """
[pytest : test]         target: test load balance of collection
[pytest : test]         method: init a collection and load balance
[pytest : test]         expected: sealed_segment_ids is subset of des_sealed_segment_ids
[pytest : test]         """
[pytest : test]         # init a collection
[pytest : test]         self._connect()
[pytest : test]         querynode_num = len(MilvusSys().query_nodes)
[pytest : test]         if querynode_num < 2:
[pytest : test]             pytest.skip("skip load balance testcase when querynode number less than 2")
[pytest : test]         c_name = cf.gen_unique_str(prefix)
[pytest : test]         collection_w = self.init_collection_wrap(name=c_name)
[pytest : test]         collection_w.create_index(default_field_name, default_index_params)
[pytest : test]         ms = MilvusSys()
[pytest : test]         nb = 3000
[pytest : test]         df = cf.gen_default_dataframe_data(nb)
[pytest : test]         collection_w.insert(df)
[pytest : test]         # get sealed segments
[pytest : test]         collection_w.num_entities
[pytest : test]         # get growing segments
[pytest : test]         collection_w.insert(df)
[pytest : test]         collection_w.load()
[pytest : test]         # prepare load balance params
[pytest : test]         time.sleep(0.5)
[pytest : test]         res, _ = self.utility_wrap.get_query_segment_info(c_name)
[pytest : test]         segment_distribution = cf.get_segment_distribution(res)
[pytest : test]         all_querynodes = [node["identifier"] for node in ms.query_nodes]
[pytest : test]         assert len(all_querynodes) > 1
[pytest : test]         all_querynodes = sorted(all_querynodes,
[pytest : test]                                 key=lambda x: len(segment_distribution[x]["sealed"])
[pytest : test]                                 if x in segment_distribution else 0, reverse=True)
[pytest : test]         src_node_id = all_querynodes[0]
[pytest : test]         des_node_ids = all_querynodes[1:]
[pytest : test]         sealed_segment_ids = segment_distribution[src_node_id]["sealed"]
[pytest : test]         # load balance
[pytest : test]         self.utility_wrap.load_balance(collection_w.name, src_node_id, des_node_ids, sealed_segment_ids)
[pytest : test]         # get segments distribution after load balance
[pytest : test]         time.sleep(0.5)
[pytest : test]         res, _ = self.utility_wrap.get_query_segment_info(c_name)
[pytest : test]         segment_distribution = cf.get_segment_distribution(res)
[pytest : test]         sealed_segment_ids_after_load_banalce = segment_distribution[src_node_id]["sealed"]
[pytest : test]         # assert src node has no sealed segments
[pytest : test] >       assert sealed_segment_ids_after_load_banalce == []
[pytest : test] E       assert [453117048979248010] == []
[pytest : test] E         Left contains one more item: 453117048979248010
[pytest : test] E         Full diff:
[pytest : test] E         - []
[pytest : test] E         + [453117048979248010]
[pytest : test] 

Expected Behavior

The test passes: the source query node holds no sealed segments after load balance.

Steps To Reproduce

No response

Milvus Log

  1. link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI(new)/detail/2.4/47/pipeline/121/
  2. log: https://grafana-4am.zilliz.cc/explore?panes=%7B%22J5Z%22:%7B%22datasource%22:%22vhI6Vw67k%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bpod%3D%5C%22mdk-24-47-py-n-milvus-querynode-778c47fb4d-mzc6v%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22vhI6Vw67k%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221728504137276%22,%22to%22:%221728511128277%22%7D%7D%7D&schemaVersion=1&orgId=1
  3. collection name: utility_lpLRK29D

Anything else?

No response

yanliang567 commented 1 month ago

/assign @weiliu1031 /unassign

yanliang567 commented 1 month ago

@NicoYuan1986 has this reproduced recently?

NicoYuan1986 commented 4 weeks ago

Reproduced: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI(new)/detail/2.4/73/pipeline/120

[pytest : test] [gw2] [ 99%] FAILED testcases/test_utility.py::TestUtilityAdvanced::test_load_balance_normal

weiliu1031 commented 4 weeks ago

The test case is unstable; please refine it. It tries to verify manual balance, but the background auto balance may affect the test result and cause an unexpected outcome.
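One possible way to make the assertion tolerant of background auto balance is sketched below. It is not the fix that was eventually applied, only an illustration. It assumes the distribution has the shape used in the traceback (`segment_distribution[node_id]["sealed"]` is a list of segment IDs); the helper names `segments_still_on_source` and `wait_until` are hypothetical and are not part of the Milvus test utilities. Instead of requiring the source node to hold no sealed segments at all, the check only requires that the segments passed to `load_balance` have left it, and it polls rather than relying on a fixed `time.sleep(0.5)`.

```python
import time
from typing import Callable, Dict, List


def segments_still_on_source(moved_ids: List[int],
                             distribution: Dict[int, Dict[str, List[int]]],
                             src_node_id: int) -> List[int]:
    """Return the moved sealed segment IDs that the source node still holds.

    Segments placed on the source node by something else (for example a
    background auto balance) are ignored, because they were never part of
    the manual load balance request.
    """
    sealed_now = set(distribution.get(src_node_id, {}).get("sealed", []))
    return sorted(sealed_now & set(moved_ids))


def wait_until(predicate: Callable[[], bool],
               timeout: float = 30.0,
               interval: float = 0.5) -> bool:
    """Poll predicate until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Illustrative data only: node 1 is the source, 101 and 102 were moved,
# and 999 is a segment that auto balance placed on node 1 afterwards.
after = {
    1: {"sealed": [999], "growing": []},
    2: {"sealed": [101, 102], "growing": []},
}
leftover = segments_still_on_source([101, 102], after, src_node_id=1)
```

In the test itself, the predicate passed to `wait_until` would re-query the segment info and recompute the distribution on every poll. Whether the leftover segment in the failing runs was one of the originally requested segments or a newly placed one is not stated in this thread, so this check may or may not be sufficient on its own.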

weiliu1031 commented 4 weeks ago

/assign @NicoYuan1986

NicoYuan1986 commented 4 weeks ago

I will modify the case soon.