milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Feature]: Clustering optimization #28410

Open wayblink opened 1 year ago

wayblink commented 1 year ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

Umbrella issue for clustering key optimization in Milvus.

Efficient data storage and retrieval are central to database management. A clustering key is a core element of database design: it determines the physical storage layout according to the distribution of data within a table. In conventional database systems, data distribution is usually described by the minimum and maximum values of scalar fields. In a vector database, however, vectors are the primary entities. Consequently, in Milvus we plan to support both scalar clustering keys and vector clustering keys.

Key changes:

1. Support designating a scalar or vector field as the clustering key for a collection.
2. Support bulk inserting data with specific clustering information. Milvus will organize the data based on the provided clustering information.
3. Filter out irrelevant data during searches based on clustering information.
4. Support compacting collections that have a clustering key, which rearranges their storage.
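To illustrate the search-time filtering idea, here is a minimal Python sketch. It is not Milvus code: the `Segment` class, its fields, and both pruning functions are hypothetical names made up for this example. It assumes each segment records the min/max of a scalar clustering key and a single representative centroid for a vector clustering key.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Segment:
    """Hypothetical per-segment clustering info (not Milvus's actual metadata)."""
    segment_id: int
    min_key: int                 # smallest scalar clustering-key value in the segment
    max_key: int                 # largest scalar clustering-key value in the segment
    centroid: tuple[float, ...]  # representative vector for the segment


def prune_by_scalar_range(segments, lo, hi):
    """Keep only segments whose [min_key, max_key] overlaps the filter [lo, hi]."""
    return [s for s in segments if s.max_key >= lo and s.min_key <= hi]


def prune_by_centroid(segments, query, top_n):
    """Keep the top_n segments whose centroid is closest (L2) to the query vector."""
    return sorted(segments, key=lambda s: dist(s.centroid, query))[:top_n]


# Example data: three segments clustered by a scalar key and by vector locality.
segments = [
    Segment(1, 0, 99, (0.0, 0.0)),
    Segment(2, 100, 199, (5.0, 5.0)),
    Segment(3, 200, 299, (10.0, 10.0)),
]

# A filter on the scalar key in [150, 220] only needs segments 2 and 3.
scalar_hits = prune_by_scalar_range(segments, 150, 220)

# A vector search near (9, 9) probes the two closest segments first.
vector_hits = prune_by_centroid(segments, (9.0, 9.0), 2)
```

Scalar range pruning is exact: a skipped segment cannot contain a matching row. Centroid-based pruning is approximate and may trade some recall for speed, so the number of segments probed (`top_n` here) would need tuning in practice.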

Phase 1: Support bulk insert and query data with clustering info

Tasks:

Phase 2: Clustering based compaction

Dependency:

Tasks:

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

xiaofan-luan commented 1 year ago

/assign @wayblink