milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
31.1k stars 2.96k forks source link

[Bug]: Compact timeout(~3min) after reinstallation #35088

Open zhuwenxing opened 4 months ago

zhuwenxing commented 4 months ago

Is there an existing issue for this?

Environment

- Milvus version:2.4-20240729-d1632070-amd64
- Deployment mode(standalone or cluster):standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2024-07-29T07:30:55.246Z] self = <test_all_collections_after_chaos.TestAllCollection object at 0x7f98a7c73040>

[2024-07-29T07:30:55.246Z] collection_name = 'IndexChecker__z573lcNj'

[2024-07-29T07:30:55.246Z] 

[2024-07-29T07:30:55.246Z]     @pytest.mark.tags(CaseLabel.L1)

[2024-07-29T07:30:55.246Z]     def test_milvus_default(self, collection_name):

[2024-07-29T07:30:55.246Z]         self._connect()

[2024-07-29T07:30:55.246Z]         # create

[2024-07-29T07:30:55.246Z]         name = collection_name if collection_name else cf.gen_unique_str("Checker_")

[2024-07-29T07:30:55.246Z]         t0 = time.time()

[2024-07-29T07:30:55.246Z]         schema = Collection(name=name).schema

[2024-07-29T07:30:55.246Z]         collection_w = self.init_collection_wrap(name=name, schema=schema)

[2024-07-29T07:30:55.246Z]         tt = time.time() - t0

[2024-07-29T07:30:55.246Z]         assert collection_w.name == name

[2024-07-29T07:30:55.246Z]         # get collection info

[2024-07-29T07:30:55.246Z]         schema = collection_w.schema

[2024-07-29T07:30:55.246Z]         dim = cf.get_dim_by_schema(schema=schema)

[2024-07-29T07:30:55.246Z]         int64_field_name = cf.get_int64_field_name(schema=schema)

[2024-07-29T07:30:55.246Z]         float_vector_field_name = cf.get_float_vec_field_name(schema=schema)

[2024-07-29T07:30:55.246Z]         float_vector_field_name_list = cf.get_float_vec_field_name_list(schema=schema)

[2024-07-29T07:30:55.246Z]         # compact collection before getting num_entities

[2024-07-29T07:30:55.246Z]         collection_w.flush(timeout=180)

[2024-07-29T07:30:55.246Z]         collection_w.compact()

[2024-07-29T07:30:55.246Z] >       collection_w.wait_for_compaction_completed(timeout=720)

[2024-07-29T07:30:55.246Z] 

[2024-07-29T07:30:55.246Z] testcases/test_all_collections_after_chaos.py:46: 

[2024-07-29T07:30:55.246Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[2024-07-29T07:30:55.246Z] ../base/collection_wrapper.py:364: in wait_for_compaction_completed

[2024-07-29T07:30:55.246Z]     res = self.collection.wait_for_compaction_completed(timeout, **kwargs)

[2024-07-29T07:30:55.246Z] /usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py:1531: in wait_for_compaction_completed

[2024-07-29T07:30:55.246Z]     return conn.wait_for_compaction_completed(self.compaction_id, timeout=timeout, **kwargs)

[2024-07-29T07:30:55.246Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:143: in handler

[2024-07-29T07:30:55.246Z]     return func(*args, **kwargs)

[2024-07-29T07:30:55.246Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:182: in handler

[2024-07-29T07:30:55.246Z]     return func(self, *args, **kwargs)

[2024-07-29T07:30:55.246Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:87: in handler

[2024-07-29T07:30:55.246Z]     return func(*args, **kwargs)

[2024-07-29T07:30:55.246Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[2024-07-29T07:30:55.246Z] 

[2024-07-29T07:30:55.246Z] self = <pymilvus.client.grpc_handler.GrpcHandler object at 0x7f98a7bf6520>

[2024-07-29T07:30:55.246Z] compaction_id = 451474099137983097, timeout = 720, kwargs = {}

[2024-07-29T07:30:55.246Z] start = 1722237901.1416473

[2024-07-29T07:30:55.246Z] compaction_state = 

[2024-07-29T07:30:55.246Z] CompactionState

[2024-07-29T07:30:55.246Z]  - compaction id: 451474099137983097

[2024-07-29T07:30:55.246Z]  - State: Executing

[2024-07-29T07:30:55.246Z]  - executing plan number: 1

[2024-07-29T07:30:55.246Z]  - timeout plan number: 0

[2024-07-29T07:30:55.246Z]  - complete plan number: 1

[2024-07-29T07:30:55.246Z] 

[2024-07-29T07:30:55.246Z] end = 1722238140.1978414

[2024-07-29T07:30:55.246Z] 

[2024-07-29T07:30:55.246Z]     @retry_on_rpc_failure()

[2024-07-29T07:30:55.246Z]     def wait_for_compaction_completed(

[2024-07-29T07:30:55.246Z]         self, compaction_id: int, timeout: Optional[float] = None, **kwargs

[2024-07-29T07:30:55.246Z]     ):

[2024-07-29T07:30:55.246Z]         start = time.time()

[2024-07-29T07:30:55.246Z]         while True:

[2024-07-29T07:30:55.246Z] >           time.sleep(0.5)

[2024-07-29T07:30:55.246Z] E           Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] 

[2024-07-29T07:30:55.246Z] /usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py:1581: Failed

[2024-07-29T07:30:55.246Z] ------------------------------ Captured log setup ------------------------------

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:44)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - INFO - ci_test]: [setup_method] Start setup test case test_milvus_default. (client_base.py:45)

[2024-07-29T07:30:55.246Z] ------------------------------ Captured log call -------------------------------

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_request)  : [Connections.connect] args: ['default', '', '', 'default', ''], kwargs: {'host': '10.255.229.117', 'port': 19530} (api_request.py:62)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_request)  : [Connections.has_connection] args: ['default'], kwargs: {} (api_request.py:62)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_response) : True  (api_request.py:37)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_request)  : [Collection] args: ['IndexChecker__z573lcNj', {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar', 'description': '', 'type': <Dat......, kwargs: {'consistency_level': 'Strong'} (api_request.py:62)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_response) : <Collection>:

[2024-07-29T07:30:55.246Z] -------------

[2024-07-29T07:30:55.246Z] <name>: IndexChecker__z573lcNj

[2024-07-29T07:30:55.246Z] <description>: 

[2024-07-29T07:30:55.246Z] <schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT:......  (api_request.py:37)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_request)  : [Collection.compact] args: [180], kwargs: {} (api_request.py:62)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:01 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-07-29T07:30:55.246Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------

[2024-07-29T07:30:55.246Z] =========================== short test summary info ============================

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[QueryChecker__1Vz7YeZT] - Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[Checker__15Ey8cY6] - Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[Checker__KDS6BxHk] - Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[Checker__QaaIvmLn] - Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[Checker__c275zJyM] - Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[Checker__ypx67r1N] - Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[IndexChecker__z573lcNj] - Failed: Timeout >240.0s

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_for_release_cron/detail/deploy_test_kafka_for_release_cron/2693/pipeline

log: artifacts-kafka-standalone-reinstall-2693-server-second-deployment-logs.tar.gz

Anything else?

It's possible that it didn't fail, maybe it just took too long to execute.

zhuwenxing commented 4 months ago

/assign @XuanYang-cn

PTAL

XuanYang-cn commented 4 months ago

@zhuwenxing Yes, it finished at the end. The scheduler schedule it until 15:29, while the compaction task generated at 15:25.

There're 10+ compaction tasks waiting to be scheduled at 15:25.

image
xiaofan-luan commented 4 months ago

is this still a problem?

XuanYang-cn commented 3 weeks ago

We make some compaction enhancements sinc july, I believe this won't be a problem now. /unassign /assign @zhuwenxing Please help verify