ThreadDao closed this issue 4 weeks ago
/assign @wayblink
Please help with this.
Two issues were discovered:
@xiaocai2333 @czs007 I think there is a conflict right now. When a stats task is triggered, the segment ID changes, so all the partition info becomes outdated (because it records segment IDs).
Given the current situation, the segment level here should be L1, not L2. It turned out that an error occurred while resetting the segment level after the clustering compaction failed.
The reason is that the compaction task leaked, and since the compaction task contains a mapping from cluster key to buffer, this caused a large memory leak. I will fix it right away.
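To illustrate the kind of leak described above, here is a minimal Python sketch. Milvus itself is written in Go, and every name below (`ClusteringTask`, `TaskRegistry`, `failing_work`) is hypothetical and not taken from the Milvus codebase. The sketch assumes an executor keeps in-flight tasks in a registry and that each task owns a cluster-key-to-buffer map; if a failed task is never removed from the registry, its buffers stay reachable and memory grows.

```python
from typing import Callable, Dict, List


class ClusteringTask:
    """Hypothetical stand-in for a clustering compaction task."""

    def __init__(self, task_id: int) -> None:
        self.task_id = task_id
        # Mapping from clustering-key value to an in-memory row buffer.
        self.buffers: Dict[int, List[bytes]] = {}

    def write(self, cluster_key: int, row: bytes) -> None:
        self.buffers.setdefault(cluster_key, []).append(row)


class TaskRegistry:
    """Hypothetical executor-side registry of in-flight tasks."""

    def __init__(self) -> None:
        self.tasks: Dict[int, ClusteringTask] = {}

    def run(self, task: ClusteringTask,
            work: Callable[[ClusteringTask], None]) -> bool:
        self.tasks[task.task_id] = task
        try:
            work(task)
            return True
        except RuntimeError:
            return False
        finally:
            # Drop the task on success and on failure alike, so the
            # registry never keeps its buffers alive after it ends.
            task.buffers.clear()
            self.tasks.pop(task.task_id, None)


def failing_work(task: ClusteringTask) -> None:
    # Fill a few buffers, then fail the way a broken compaction might.
    for key in range(3):
        task.write(key, b"x" * 1024)
    raise RuntimeError("simulated compaction failure")


registry = TaskRegistry()
ok = registry.run(ClusteringTask(1), failing_work)
```

Without the `finally` block, the failed task would remain in `registry.tasks` together with its buffers, which is the general shape of leak the comment describes.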
Fixed in master-20241023-1d61b604-amd64: no dataNode OOM and no clustering compaction loop.
Is there an existing issue for this?
Environment
Current Behavior
Deploy a Milvus cluster with this config:

    dataCoord:
      compaction:
        clustering:
          autoEnable: true
      enableActiveStandby: true
    indexCoord:
      enableActiveStandby: true
    log:
      level: debug
    queryCoord:
      enableActiveStandby: true
    queryNode:
      levelZeroForwardPolicy: RemoteLoad
    rootCoord:
      enableActiveStandby: true
    trace:
      exporter: jaeger
      jaeger:
        url: http://tempo-distributor.tempo:14268/api/traces
      sampleFraction: 1
{'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 64}, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}, {'name': 'int64_ck', 'description': '', 'type': <DataType.INT64: 5>, 'is_clustering_key': True}], 'enable_dynamic_field': False}
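For anyone trying to reproduce, the schema dump above could be rebuilt roughly as follows. This is a sketch, not the original test code: the field values are copied from the dump, while `FIELD_SPECS` and `build_schema` are names made up here. It assumes a pymilvus version whose `FieldSchema` accepts `is_clustering_key`. The pymilvus import is deferred so the spec can be inspected without the client installed.

```python
from typing import Any, Dict, List

# Field specs copied from the schema dump in this issue.
FIELD_SPECS: List[Dict[str, Any]] = [
    {"name": "id", "dtype": "VARCHAR", "max_length": 64, "is_primary": True},
    {"name": "float_vector", "dtype": "FLOAT_VECTOR", "dim": 128},
    {"name": "int64_ck", "dtype": "INT64", "is_clustering_key": True},
]


def build_schema():
    """Build the CollectionSchema; requires pymilvus to be installed."""
    from pymilvus import CollectionSchema, DataType, FieldSchema

    fields = []
    for spec in FIELD_SPECS:
        # Everything except name and dtype is passed through as a keyword.
        params = {k: v for k, v in spec.items() if k not in ("name", "dtype")}
        fields.append(
            FieldSchema(
                name=spec["name"],
                dtype=getattr(DataType, spec["dtype"]),
                **params,
            )
        )
    return CollectionSchema(
        fields=fields, auto_id=False, enable_dynamic_field=False
    )


# Sanity check that needs no client library: one clustering key is declared.
clustering_keys = [s["name"] for s in FIELD_SPECS if s.get("is_clustering_key")]
```

Calling `build_schema()` and passing the result to `Collection(name, schema)` would create the collection; the collection name is not given in this issue.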
[2024-09-30 10:42:54,563 - ERROR - fouram]: (api_response) : [Collection.search] <_InactiveRpcError of RPC that terminated with: status = StatusCode.DEADLINE_EXCEEDED details = "Deadline Exceeded" debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-09-30T10:42:54.554845202+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
[2024-09-30 10:44:55,398 - ERROR - fouram]: (api_response) : [Collection.query] <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.DEADLINE_EXCEEDED details = "Deadline Exceeded" debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Deadline Exceeded", grpc_status:4, created_time:"2024-09-30T10:42:55.295962416+00:00"}"
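When scanning a long client log for these timeouts, a small parser can pull out the timestamp, the API call, and the gRPC status. The sketch below assumes every line follows the same layout as the two errors above; `LOG_PATTERN` and `parse_client_error` are illustrative names, not part of any test framework.

```python
import re
from typing import Dict, Optional

# Matches the client log layout seen in this issue:
# [timestamp - LEVEL - logger]: ... [Api.call] ... status = StatusCode.NAME
LOG_PATTERN = re.compile(
    r"\[(?P<logged_at>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r" - (?P<level>\w+) - \w+\]: .*?"
    r"\[(?P<api>[\w.]+)\].*?"
    r"status = StatusCode\.(?P<status>\w+)"
)


def parse_client_error(line: str) -> Optional[Dict[str, str]]:
    """Return the parsed fields, or None if the line has another layout."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None


# First part of the search error from this issue, used as sample input.
sample = (
    "[2024-09-30 10:42:54,563 - ERROR - fouram]: (api_response) : "
    "[Collection.search] <_InactiveRpcError of RPC that terminated with: "
    'status = StatusCode.DEADLINE_EXCEEDED details = "Deadline Exceeded"'
)
parsed = parse_client_error(sample)
```

Grouping the parsed records by `api` and `status` makes it easier to see when search and query calls start timing out relative to the compaction activity.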
Expected Behavior
No response
Steps To Reproduce
Milvus Log
pods:
Anything else?
No response