milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Enhancement]: Add segment id hint in multi-stage operations #36482

Open congqixia opened 1 month ago

congqixia commented 1 month ago

Is there an existing issue for this?

What would you like to be added?

Add segment ID hints to multi-stage operations such as "Requery", "DeleteByExpression", "L0Compaction", etc.

Why is this needed?

The operations mentioned above repeat a lot of work, especially locating the segment ID for each input PK value. Some of this repeated work has been shown to be a bottleneck in certain extreme use cases.

Anything else?

No response

congqixia commented 1 month ago

Proposal

Record the segment IDs pre-calculated from PKs as segment hints, and reuse them in the following stages.
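A minimal sketch of the recording step, assuming a toy model where each segment is just an ID plus an in-memory set of PKs. The names `SegmentHints`, `Segment` and `locateAndRecord` are hypothetical and not part of the Milvus codebase. The expensive PK-to-segment scan runs once, and its result is kept so later stages can look it up instead of scanning again.

```go
package main

import "fmt"

// SegmentHints maps a primary key to the segment ID it was found in during
// an earlier stage. Hypothetical type, for illustration only.
type SegmentHints map[int64]int64

// Segment is a hypothetical, simplified stand-in for a real segment:
// an ID plus the set of primary keys it holds.
type Segment struct {
	ID  int64
	PKs map[int64]struct{}
}

// locateAndRecord performs the expensive first-stage lookup exactly once:
// for every input PK it scans the candidate segments and records the first
// hit as a hint, so later stages do not have to repeat the scan.
func locateAndRecord(pks []int64, segments []Segment) SegmentHints {
	hints := make(SegmentHints, len(pks))
	for _, pk := range pks {
		for _, seg := range segments {
			if _, ok := seg.PKs[pk]; ok {
				hints[pk] = seg.ID
				break
			}
		}
	}
	return hints
}

func main() {
	segments := []Segment{
		{ID: 100, PKs: map[int64]struct{}{1: {}, 2: {}}},
		{ID: 200, PKs: map[int64]struct{}{3: {}}},
	}
	hints := locateAndRecord([]int64{1, 3, 4}, segments)
	// PK 4 is not in any segment, so it gets no hint.
	fmt.Println(hints[1], hints[3], len(hints)) // 100 200 2
}
```

In a real deployment the hints would have to travel with the request or message between stages; how they are attached is left open here.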

Fallback mechanism

When compaction happens, the segment hints may be out of date, so a detection and fallback mechanism is needed. The default fallback behavior is to act as if the hint never existed, i.e., iterate over all candidate segments and re-calculate the segment IDs.
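The detection-and-fallback idea can be sketched as follows, again on a hypothetical in-memory model (`Segment` and `resolveSegment` are illustrative names, not Milvus APIs). A hint is trusted only if the hinted segment still exists among the candidates and still contains the PK; checking one segment is cheaper than checking all of them. If that check fails, for example because compaction merged the hinted segment away, the code takes the no-hint path and scans every candidate.

```go
package main

import "fmt"

// Segment is a hypothetical, simplified segment: an ID plus the PKs it holds.
type Segment struct {
	ID  int64
	PKs map[int64]struct{}
}

func (s Segment) contains(pk int64) bool {
	_, ok := s.PKs[pk]
	return ok
}

// resolveSegment returns the ID of the segment holding pk. found reports
// whether any segment holds it; usedHint reports whether the hint was valid.
func resolveSegment(pk, hint int64, hasHint bool, candidates map[int64]Segment) (segID int64, found, usedHint bool) {
	if hasHint {
		// Validate the hint: the segment must still exist and still hold pk.
		if seg, ok := candidates[hint]; ok && seg.contains(pk) {
			return seg.ID, true, true
		}
	}
	// Fallback: behave as if the hint never existed and scan all candidates.
	for id, seg := range candidates {
		if seg.contains(pk) {
			return id, true, false
		}
	}
	return 0, false, false
}

func main() {
	// Segment 100 was compacted into segment 300, so hint 100 is stale.
	candidates := map[int64]Segment{
		300: {ID: 300, PKs: map[int64]struct{}{7: {}}},
	}
	id, found, usedHint := resolveSegment(7, 100, true, candidates)
	fmt.Println(id, found, usedHint) // 300 true false
}
```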

Compatibility

The segment hint fields are newly added, so older components will simply ignore the hints and take the fallback path. Therefore there should be no compatibility issue here.

congqixia commented 1 month ago

Delete by expression

The delete data path involves multiple components: proxy, mq, datanode, querynode, and storage.

Complex deletion

Delete by expression (aka complex delete) is split into multiple steps:

The action items are:

Utilizing segment hints when processing deletes
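One way a delete could consume the hints is to split its PKs into per-segment batches directly from the hints, and collect only the PKs whose hint is missing or stale for the slow path. This is a minimal sketch under assumptions: the helper `groupBySegmentHint` is hypothetical, hints are assumed to be a slice parallel to the PKs (nil when the producer is an old component), and "stale" is approximated as "the hinted segment is not in the live set".

```go
package main

import "fmt"

// groupBySegmentHint splits the PKs of a delete into per-segment batches
// using previously recorded hints. pks and hints are parallel slices; hints
// may be nil or shorter than pks (e.g. from an old producer). PKs without a
// usable hint are returned separately for the fallback lookup.
func groupBySegmentHint(pks, hints []int64, live map[int64]bool) (map[int64][]int64, []int64) {
	batches := make(map[int64][]int64)
	var unresolved []int64
	for i, pk := range pks {
		if i < len(hints) && live[hints[i]] {
			batches[hints[i]] = append(batches[hints[i]], pk)
			continue
		}
		unresolved = append(unresolved, pk)
	}
	return batches, unresolved
}

func main() {
	live := map[int64]bool{100: true, 300: true}
	pks := []int64{1, 2, 3, 4}
	hints := []int64{100, 100, 200, 300} // segment 200 was compacted away
	batches, unresolved := groupBySegmentHint(pks, hints, live)
	fmt.Println(batches[100], batches[300], unresolved) // [1 2] [4] [3]
}
```

Only the `unresolved` PKs would then pay the cost of iterating over all candidate segments.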

Datanode

There are two features that could be related to segment hints.

QueryNode

The delegator needs to forward streaming delete data and L0 delta data to the corresponding segments. Both tasks could use segment ID hints, as long as the delete data carries them.
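The forwarding step can be sketched as a routing function. This is only an illustration built on assumptions: `DeleteRecord`, `forwardDeletes`, and the `mayContain` predicate are hypothetical. `mayContain` plays the role of a per-segment PK membership check that may report false positives, and a `SegmentHint` of 0 is taken to mean "no hint". A record whose hint names a segment loaded on this delegator goes only to that segment. Any other record falls back to checking every loaded segment.

```go
package main

import "fmt"

// DeleteRecord is a hypothetical delete entry; SegmentHint == 0 means no hint.
type DeleteRecord struct {
	PK          int64
	SegmentHint int64
}

// forwardDeletes returns, per segment ID, the PKs that segment must apply.
// mayContain is a stand-in for a per-segment PK membership check that may
// yield false positives, so fallback can target several segments.
func forwardDeletes(records []DeleteRecord, loaded []int64, mayContain func(segID, pk int64) bool) map[int64][]int64 {
	isLoaded := make(map[int64]bool, len(loaded))
	for _, id := range loaded {
		isLoaded[id] = true
	}
	out := make(map[int64][]int64)
	for _, r := range records {
		// Fast path: the hint names a segment loaded here.
		if r.SegmentHint != 0 && isLoaded[r.SegmentHint] {
			out[r.SegmentHint] = append(out[r.SegmentHint], r.PK)
			continue
		}
		// Fallback: check every loaded segment, as if no hint existed.
		for _, id := range loaded {
			if mayContain(id, r.PK) {
				out[id] = append(out[id], r.PK)
			}
		}
	}
	return out
}

func main() {
	loaded := []int64{100, 200}
	// Toy predicate: segment 100 may hold even PKs, segment 200 odd PKs.
	mayContain := func(segID, pk int64) bool { return (segID == 100) == (pk%2 == 0) }
	records := []DeleteRecord{{PK: 2, SegmentHint: 100}, {PK: 3}, {PK: 5, SegmentHint: 999}}
	fmt.Println(forwardDeletes(records, loaded, mayContain)) // map[100:[2] 200:[3 5]]
}
```

PK 5 carries a hint for segment 999, which is not loaded on this delegator (for instance, after compaction), so it takes the same fallback path as the unhinted PK 3.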

stale[bot] commented 2 days ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.