milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Enhancement]: Remove all bf from datanode #34585

Closed: bigsheeper closed this issue 1 week ago

bigsheeper commented 4 months ago

Is there an existing issue for this?

What would you like to be added?

Currently, datanode keeps a full in-memory bloom filter for all segments. It is mainly used to check whether a delete message hits any existing segment, and thereby to reduce the amount of delete data.
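
As a rough illustration of the mechanism described above, the Go sketch below keeps one bloom filter per segment and routes each delete primary key only to the segments whose filter may contain it; keys that hit no segment can be discarded. It is a hypothetical, simplified sketch, not Milvus's datanode code: the names `bloomFilter`, `segment` and `routeDeletes`, the FNV-based double hashing, and the filter sizes are all assumptions made for illustration.

```go
package main

import (
	"encoding/binary"
	"hash/fnv"
)

// bloomFilter is a minimal, hypothetical bloom filter over int64 primary keys.
// It illustrates the idea only; it is not Milvus's implementation.
type bloomFilter struct {
	bits []uint64 // bit array packed into 64-bit words
	m    uint64   // number of bits
	k    uint64   // number of probes per key
}

func newBloomFilter(m, k uint64) *bloomFilter {
	return &bloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives two base hashes for a key; probe i lands on h1 + i*h2
// (double hashing). h2 is forced odd so successive probes differ.
func (b *bloomFilter) hashes(pk int64) (uint64, uint64) {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(pk))
	h1 := fnv.New64a()
	h1.Write(buf[:])
	h2 := fnv.New64()
	h2.Write(buf[:])
	return h1.Sum64(), h2.Sum64() | 1
}

// Add sets the k bits for pk.
func (b *bloomFilter) Add(pk int64) {
	h1, h2 := b.hashes(pk)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= uint64(1) << (pos % 64)
	}
}

// MayContain reports false only if pk was definitely never added.
func (b *bloomFilter) MayContain(pk int64) bool {
	h1, h2 := b.hashes(pk)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(uint64(1)<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

// segment pairs a hypothetical segment ID with the filter of its inserted PKs.
type segment struct {
	id int64
	bf *bloomFilter
}

// routeDeletes assigns each delete PK to every segment whose filter may hold it.
// PKs that hit no segment are returned separately so they can be dropped.
func routeDeletes(segs []segment, pks []int64) (map[int64][]int64, []int64) {
	hits := make(map[int64][]int64)
	var misses []int64
	for _, pk := range pks {
		hit := false
		for _, s := range segs {
			if s.bf.MayContain(pk) {
				hits[s.id] = append(hits[s.id], pk)
				hit = true
			}
		}
		if !hit {
			misses = append(misses, pk)
		}
	}
	return hits, misses
}
```

Because a bloom filter has no false negatives, a delete for a key that really was added always reaches its segment; false positives only mean some deletes are kept unnecessarily. The flip side is that the filter must see every insert: a missed `Add` turns a real delete into a "miss" that gets dropped.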

I suggest removing the bloom filter (bf) from datanode for the following reasons:

  1. Maintaining the bf only helps in two cases:
    • a) the user deletes many non-existent primary keys;
    • b) the user uses upsert as insert.
     We can rely on L0 compaction to handle both cases. With batched bf, L0 compaction is fast enough for them (this can be verified by testing).
  2. Maintaining a full bf in datanode is error-prone (see: #34186), and any omission can cause delete data to be lost (see: #34565).
  3. Removing the bf will speed up delete processing in the flowgraph and reduce datanode's complexity.
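
A minimal sketch of the alternative proposed above, assuming datanode simply persists deletes into L0 without any bf check and L0 compaction later applies them in a batch. Each segment's filter is consulted once for the whole batch, and keys that hit no segment (for example, the delete half of an upsert used as a plain insert) are dropped at that point. `pkFilter`, `exactSet` and `l0Compact` are hypothetical names, not Milvus APIs; an exact set stands in for the per-segment bloom filter to keep the example short.

```go
package main

// pkFilter abstracts a per-segment membership check. In the proposal this
// would be the segment's bloom filter; here it is a hypothetical interface.
type pkFilter interface {
	MayContain(pk int64) bool
}

// exactSet is an exact stand-in for a bloom filter (assumption for brevity).
type exactSet map[int64]struct{}

func (s exactSet) MayContain(pk int64) bool {
	_, ok := s[pk]
	return ok
}

// l0Compact applies one batch of L0 deletes (written by datanode with no bf
// check). The outer loop visits each segment once and tests the whole batch
// against its filter; PKs that hit no segment are returned as dropped.
func l0Compact(l0Deletes []int64, segFilters map[int64]pkFilter) (map[int64][]int64, []int64) {
	hit := make([]bool, len(l0Deletes))
	perSegment := make(map[int64][]int64)
	for segID, f := range segFilters {
		for i, pk := range l0Deletes {
			if f.MayContain(pk) {
				perSegment[segID] = append(perSegment[segID], pk)
				hit[i] = true
			}
		}
	}
	var dropped []int64
	for i, pk := range l0Deletes {
		if !hit[i] {
			dropped = append(dropped, pk)
		}
	}
	return perSegment, dropped
}
```

Checking filters segment by segment over a whole batch, rather than key by key as each delete arrives, is one way to read "batched bf": the datanode write path no longer needs any filter state, and the cost of filtering moves into compaction.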

Why is this needed?

No response

Anything else?

No response

stale[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

bigsheeper commented 1 week ago

done