milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
30.19k stars, 2.89k forks

[Feature]: Split Compaction that can split a large segment into smaller ones #35584

Open XuanYang-cn opened 2 months ago

XuanYang-cn commented 2 months ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

Milvus can merge smaller segments into a larger segment, but once a segment is flushed, it cannot be split into smaller segments.

This feature introduces a new compaction type, SplitCompaction, which splits a segment into smaller ones.

Auto trigger

  1. Input segments: size > maxSize * expansionRate
  2. Output segments: size ~= maxSize
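
The auto-trigger rule above can be sketched in Go as follows. This is a minimal illustration, not Milvus code: the helper names `needsSplit` and `outputCount` and the example values (a 1 GiB `maxSize`, an `expansionRate` of 1.25) are assumptions made for the example. Only the condition `size > maxSize * expansionRate` and the target output size `~= maxSize` come from the description.

```go
package main

import "fmt"

// needsSplit reports whether a segment is large enough to trigger a split,
// i.e. size > maxSize * expansionRate.
func needsSplit(size, maxSize int64, expansionRate float64) bool {
	return float64(size) > float64(maxSize)*expansionRate
}

// outputCount is a hypothetical helper: the number of output segments needed
// so that none exceeds maxSize (ceiling division).
func outputCount(size, maxSize int64) int64 {
	if size <= 0 || maxSize <= 0 {
		return 0
	}
	return (size + maxSize - 1) / maxSize
}

func main() {
	const gib = int64(1) << 30
	maxSize := gib        // assumed value for illustration
	expansionRate := 1.25 // assumed value for illustration

	for _, size := range []int64{gib, 2 * gib, 5*gib + 1} {
		fmt.Printf("size=%d split=%v outputs=%d\n",
			size, needsSplit(size, maxSize, expansionRate), outputCount(size, maxSize))
	}
}
```

With ceiling division every output stays at or below `maxSize`, so a segment only slightly over the threshold yields outputs noticeably smaller than `maxSize`. Whether that is acceptable depends on how strictly "~= maxSize" is meant.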

Trigger

  1. periodically (on a timer)
  2. manually

Describe the solution you'd like.

A compaction takes n input segments and produces m output segments.

  1. Previous compactions: n >= 1, m == 1
  2. This feature introduces: n == 1, m >= 1
  3. Eventually: n >= 1, m >= 1
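
To make the three stages concrete, here is a small Go sketch that labels a compaction by its input and output counts. The function `shape` and its labels are hypothetical and exist only for this illustration. Note that the n == 1, m == 1 case satisfies both of the first two rules and is labeled "merge" here.

```go
package main

import "fmt"

// shape labels a compaction with n input segments and m output segments.
// The labels are illustrative and are not Milvus identifiers.
func shape(n, m int) string {
	switch {
	case n < 1 || m < 1:
		return "invalid"
	case m == 1:
		return "merge" // previous compactions: n >= 1, m == 1
	case n == 1:
		return "split" // this feature: n == 1, m >= 1
	default:
		return "general" // eventual goal: n >= 1, m >= 1
	}
}

func main() {
	fmt.Println(shape(3, 1), shape(1, 4), shape(3, 2))
}
```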

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

xiaofan-luan commented 2 months ago

I don't think we need a special compaction type. For any compaction, if the output is larger than the segment size, we should split it automatically.
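
The alternative suggested here, where any compaction rolls over to a new output segment once the size limit is reached, could look roughly like the Go sketch below. It is an assumption-laden illustration: `splitBySize` is a hypothetical helper, not part of Milvus's mix_compactor, and rows are modeled as plain byte sizes rather than real records.

```go
package main

import "fmt"

// splitBySize packs row sizes (in bytes) in order into output segments,
// starting a new segment whenever adding the next row would exceed maxSize.
// A single row larger than maxSize gets a segment of its own.
func splitBySize(rowSizes []int64, maxSize int64) [][]int64 {
	var out [][]int64
	var cur []int64
	var curSize int64
	for _, sz := range rowSizes {
		if len(cur) > 0 && curSize+sz > maxSize {
			out = append(out, cur) // close the current output segment
			cur, curSize = nil, 0
		}
		cur = append(cur, sz)
		curSize += sz
	}
	if len(cur) > 0 {
		out = append(out, cur) // flush the last, possibly smaller, segment
	}
	return out
}

func main() {
	segments := splitBySize([]int64{40, 40, 40, 40, 40}, 100)
	fmt.Println(len(segments), segments) // prints: 3 [[40 40] [40 40] [40]]
}
```

Because the writer only needs the current output segment's running size, the same loop works for one input segment or many, which matches the eventual n >= 1, m >= 1 goal.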

XuanYang-cn commented 2 months ago

@xiaofan-luan It's not designed to be a new type. I'm just calling it split-compaction because it has a brand-new trigger, and to avoid changing mix_compactor in a way that would affect all MixCompactions.

This is a compromise made for the following reasons:

  1. No impact on the existing MixCompaction
  2. Avoiding tons of testing work
  3. Being able to fit into 2.4.x, especially for online instances that already have extra-large segments

Given enough time, and if not targeting 2.4.x, I would go straight to the eventual goal and make it an enhanced MixCompaction.

XuanYang-cn commented 2 months ago

The issue description is unclear, so I'm adding some notes here:

xiaofan-luan commented 3 weeks ago

@XuanYang-cn

Did we test how much memory it takes to compact a large segment, such as 100G?