Closed zhuwenxing closed 4 months ago
/assign @XuanYang-cn PTAL
/assign @xiaocai2333
Flush failed for L0 segments that were added back into the metacache. Those L0 segments never get flushed, so they block flush forever.
fixed, please verify. /assign @zhuwenxing
/unassign
It still reproduces in master-20240612-9d4535ce-amd64
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-cron/detail/chaos-test-kafka-cron/14838/pipeline
log: artifacts-querynode-pod-kill-14838-server-logs.tar.gz
This is one of many failed tests.
/unassign
/assign @zhuwenxing please verify again.
image tag: master-20240619-7b9462c0-amd64
The situation has improved, but the issue is not fully resolved. Previously, almost all collections failed to flush; now only some collections still fail to flush.
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/15373/pipeline
log: artifacts-proxy-pod-failure-15373-server-logs.tar.gz
[2024-06-19T19:52:25.998Z] [2024-06-19 19:49:21 - DEBUG - ci_test]: (api_response) : <Collection>:
[2024-06-19T19:52:25.998Z] -------------
[2024-06-19T19:52:25.998Z] <name>: InsertChecker__ia1cSkY2
[2024-06-19T19:52:25.999Z] <description>:
[2024-06-19T19:52:25.999Z] <schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT...... (api_request.py:37)
[2024-06-19T19:52:25.999Z] [2024-06-19 19:49:21 - DEBUG - ci_test]: (api_request) : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)
[2024-06-19T19:52:25.999Z] [2024-06-19 19:52:21 - WARNING - pymilvus.decorators]: Retry timeout: 180s (decorators.py:105)
[2024-06-19T19:52:25.999Z] [2024-06-19 19:52:21 - ERROR - pymilvus.decorators]: RPC error: [flush], <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: InsertChecker__ia1cSkY2, flusht_ts: 450580070212829196)>, <Time:{'RPC start': '2024-06-19 19:49:21.823172', 'RPC error': '2024-06-19 19:52:21.943708'}> (decorators.py:139)
[2024-06-19T19:52:25.999Z] [2024-06-19 19:52:21 - ERROR - ci_test]: Traceback (most recent call last):
[2024-06-19T19:52:25.999Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 32, in inner_wrapper
[2024-06-19T19:52:25.999Z] res = func(*args, **_kwargs)
[2024-06-19T19:52:25.999Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 63, in api_request
[2024-06-19T19:52:25.999Z] return func(*arg, **kwargs)
[2024-06-19T19:52:25.999Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 319, in flush
[2024-06-19T19:52:25.999Z] conn.flush([self.name], timeout=timeout, **kwargs)
[2024-06-19T19:52:25.999Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 140, in handler
[2024-06-19T19:52:25.999Z] raise e from e
[2024-06-19T19:52:25.999Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2024-06-19T19:52:25.999Z] return func(*args, **kwargs)
[2024-06-19T19:52:25.999Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 175, in handler
[2024-06-19T19:52:25.999Z] return func(self, *args, **kwargs)
[2024-06-19T19:52:25.999Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 106, in handler
[2024-06-19T19:52:25.999Z] raise MilvusException(
[2024-06-19T19:52:25.999Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: InsertChecker__ia1cSkY2, flusht_ts: 450580070212829196)>
[2024-06-19T19:52:25.999Z] (api_request.py:45)
[2024-06-19T19:52:25.999Z] [2024-06-19 19:52:21 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: InsertChecker__ia1cSkY2, flusht_ts: 450580070212829196)> (api_request.py:46)
[2024-06-19T19:52:25.999Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------
[2024-06-19T19:52:25.999Z] =========================== short test summary info ============================
[2024-06-19T19:52:25.999Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[QueryChecker__CVwOkPk7] - AssertionError: Response of API flush expect True, but got False
[2024-06-19T19:52:25.999Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[InsertChecker__ia1cSkY2] - AssertionError: Response of API flush expect True, but got False
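To see which collections still fail after a run, the pytest short summary in the log above can be scanned for `FAILED` lines. The following is only a minimal sketch: the helper name `failed_collections` and the regular expression are illustrative assumptions, not part of the Milvus test suite, and it assumes the collection name is the bracketed test parameter, as in the two lines above.

```python
import re

# Matches pytest short-summary lines of the form:
#   FAILED path::Class::test_name[param] - reason
# An optional Jenkins timestamp prefix is tolerated because we use search().
FAILED_LINE = re.compile(
    r"FAILED\s+(?P<test>\S+?)\[(?P<param>[^\]]+)\]\s+-\s+(?P<reason>.*)$"
)


def failed_collections(log_text):
    """Return (collection_name, reason) pairs, one per FAILED summary line."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_LINE.search(line)
        if match:
            results.append((match.group("param"), match.group("reason").strip()))
    return results


# The two summary lines are copied from the log excerpt in this thread.
sample = (
    "[2024-06-19T19:52:25.999Z] FAILED testcases/test_all_collections_after_chaos.py"
    "::TestAllCollection::test_milvus_default[QueryChecker__CVwOkPk7]"
    " - AssertionError: Response of API flush expect True, but got False\n"
    "[2024-06-19T19:52:25.999Z] FAILED testcases/test_all_collections_after_chaos.py"
    "::TestAllCollection::test_milvus_default[InsertChecker__ia1cSkY2]"
    " - AssertionError: Response of API flush expect True, but got False\n"
)

for name, reason in failed_collections(sample):
    print(name, "->", reason)
```

Running the same helper over the full console log of each job would make it easier to compare how many collections fail per image tag.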
/unassign
@xiaocai2333 PTAL
If the segment has just been created and doesn't have a stats log yet, it won't be included in the SyncSegments view. Fix this immediately.
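As a rough illustration of the condition described above, here is a toy model in Python. It is not Milvus code: the `Segment` class, the `stats_logs` field, and `build_sync_view` are hypothetical names, and the only behavior modeled is "a segment without a stats log is left out of the view".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    # Hypothetical stand-in for a segment's metadata.
    segment_id: int
    stats_logs: List[str] = field(default_factory=list)


def build_sync_view(segments):
    """Toy filter: list only segments that already have at least one stats log."""
    return [s.segment_id for s in segments if s.stats_logs]


segments = [
    Segment(1, stats_logs=["stats-log-a"]),  # older segment with a stats log
    Segment(2),                              # just created, no stats log yet
]

view = build_sync_view(segments)
print(view)  # [1]
```

In this toy version segment 2 is silently dropped from the view, which is the situation the comment above describes for a freshly created segment.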
Not reproduced in master-20240625-506a9152-amd64
Is there an existing issue for this?
Environment
Current Behavior
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-cron/detail/chaos-test-kafka-cron/14576/pipeline
log: artifacts-etcd-followers-pod-failure-14576-server-logs.tar.gz
Anything else?
Almost all chaos tests for the master image failed due to flush timeout.