milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [major] The datanode is continuously restarted due to OOM after major compaction on a 50M (768) dataset #34703

Open binbinlv opened 1 month ago

binbinlv commented 1 month ago

Is there an existing issue for this?

Environment

- Milvus version: 2.4-latest
- Deployment mode(standalone or cluster): both
- MQ type(rocksmq, pulsar or kafka): all
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus-latest
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

The datanode is continuously restarted due to OOM after major compaction on a 50M (768) dataset.

Expected Behavior

No crash

Steps To Reproduce

1. Create a collection with partition key enabled (partition key used as the clustering key)
2. Insert 50M rows (768-dim vectors)
3. Run major compaction

Milvus Log

No response

Anything else?

No response

xiaocai2333 commented 1 month ago

The flushedRowNum of the writerBuffer was not reset after a segment was packed, so subsequent segments in that buffer kept meeting the pack condition and a large number of segments were created. The datanode could not load the BloomFilters for that many segments, which caused the OOM.
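
To illustrate the mechanism described above, here is a minimal toy model in Python. It is not the Milvus implementation: the class `WriterBuffer`, the helper `count_segments`, the `pack_threshold` parameter, and all numbers are hypothetical and only mirror the explanation that a counter which is never reset keeps satisfying the pack condition.

```python
class WriterBuffer:
    """Toy stand-in for a compaction writer buffer (hypothetical, not Milvus code)."""

    def __init__(self, pack_threshold, reset_on_pack):
        self.pack_threshold = pack_threshold  # rows that trigger packing a segment (assumed)
        self.reset_on_pack = reset_on_pack    # True models the fix, False models the bug
        self.flushed_row_num = 0              # rows flushed since the last reset
        self.pending_rows = 0                 # rows waiting to go into the next segment
        self.segments = []                    # row count of each packed segment

    def flush(self, rows):
        self.flushed_row_num += rows
        self.pending_rows += rows
        # Pack condition: enough rows have been flushed.
        if self.flushed_row_num >= self.pack_threshold:
            self.segments.append(self.pending_rows)
            self.pending_rows = 0
            if self.reset_on_pack:
                self.flushed_row_num = 0
            # Without the reset, the condition stays true on every later flush.


def count_segments(total_rows, batch_rows, pack_threshold, reset_on_pack):
    """Flush total_rows in fixed-size batches and return how many segments were packed."""
    buf = WriterBuffer(pack_threshold, reset_on_pack)
    for _ in range(total_rows // batch_rows):
        buf.flush(batch_rows)
    return len(buf.segments)


if __name__ == "__main__":
    # 1M rows in 10k-row flushes with a 100k-row pack threshold (made-up numbers).
    print(count_segments(1_000_000, 10_000, 100_000, reset_on_pack=True))   # 10
    print(count_segments(1_000_000, 10_000, 100_000, reset_on_pack=False))  # 91
```

In this toy run the missing reset yields roughly nine times as many segments, most of them tiny. If each segment carries its own BloomFilter, as the comment above implies, memory needed to load them grows with the segment count rather than with the data size.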

binbinlv commented 1 month ago

The OOM issue still occurs when running major compaction on 50M (768) data. Image: v2.4.6-nightly-176034d1-872