milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Feature]: Refine compaction related policy #24811

Open xiaofan-luan opened 1 year ago

xiaofan-luan commented 1 year ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

Compaction is a core mechanism that Milvus relies on. Done well, it reduces query latency and removes deleted and expired data. However, frequent compaction consumes a lot of IO, and building indexes on the compacted segments also takes a lot of resources. Finding a better way to perform compaction has therefore become an important optimization direction for Milvus.

Describe the solution you'd like.

Here are some compaction improvements I'm thinking of:

  1. Have compaction read in multiple segments and write out multiple segments grouped by partition key. Call this the data-aware compaction policy.
  2. Run multiple compactions concurrently, with better memory control during compaction.
  3. Refine the compaction trigger policy: compaction by period, by last delete timestamp, and by last expired-entity timestamp.
  4. Reduce unnecessary compactions. For example, when segments are small (frequently flushed), don't trigger compaction every 3 segments; wait for 6 segments or more.
  5. Refine the compaction mechanism so that it cannot get stuck when a datanode or datacoord crashes.
  6. Explore the possibility of minor compaction, where we only compact and skip the index build.
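
To make items 3 and 4 more concrete, here is a rough sketch of what a trigger check could look like. It is only an illustration written in Python, not Milvus code: `SegmentInfo`, `TriggerPolicy`, both functions, and every threshold value are hypothetical names and numbers chosen for the example, and the real datacoord logic may be structured quite differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SegmentInfo:
    """Hypothetical per-segment metadata; not the Milvus struct."""
    segment_id: int
    size_bytes: int
    last_compaction_ts: float                   # seconds since epoch
    last_delete_ts: Optional[float] = None      # most recent delete, if any
    earliest_expiry_ts: Optional[float] = None  # when the first entity expires


@dataclass
class TriggerPolicy:
    """Illustrative thresholds only; the values are assumptions."""
    period_s: float = 3600.0             # item 3: compaction by period
    delete_grace_s: float = 300.0        # item 3: wait after the last delete
    small_segment_bytes: int = 64 * 1024 * 1024
    min_small_segments: int = 6          # item 4: 6 or more instead of 3


def segment_needs_compaction(seg: SegmentInfo, policy: TriggerPolicy,
                             now: float) -> bool:
    """Return True if any of the three time-based triggers fires."""
    # Trigger by period: too long since the segment was last compacted.
    if now - seg.last_compaction_ts >= policy.period_s:
        return True
    # Trigger by last delete timestamp: there are deletes newer than the
    # last compaction, and deletes have been quiet for the grace window.
    if (seg.last_delete_ts is not None
            and seg.last_delete_ts > seg.last_compaction_ts
            and now - seg.last_delete_ts >= policy.delete_grace_s):
        return True
    # Trigger by expired entity timestamp: at least one entity has expired.
    if seg.earliest_expiry_ts is not None and seg.earliest_expiry_ts <= now:
        return True
    return False


def pick_small_segment_merge(segments: List[SegmentInfo],
                             policy: TriggerPolicy) -> List[SegmentInfo]:
    """Merge small segments only once enough of them have accumulated."""
    small = [s for s in segments if s.size_bytes < policy.small_segment_bytes]
    if len(small) < policy.min_small_segments:
        return []  # too few small segments; skip this round
    return small
```

Keeping the time-based triggers separate from the small-segment merge means each threshold can be tuned on its own, and raising `min_small_segments` trades a few more small segments at query time for fewer compaction and index-build rounds.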

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

xiaofan-luan commented 1 year ago

/assign @XuanYang-cn