milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Enhancement]: Add aggregation interface to Milvus #37441

Open ashkrisk opened 2 hours ago

ashkrisk commented 2 hours ago

What would you like to be added?

Add an aggregation interface to Milvus that can be extended to support arbitrary aggregations, for example min, max, and avg.
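To illustrate what "extensible" could mean here, the following is a minimal sketch of a name-to-function registry in plain Python. It is not Milvus code: `AGGREGATORS`, `register`, and `aggregate` are hypothetical names chosen for this example, and a real implementation would run inside the query path rather than over Python lists.

```python
from statistics import fmean

# Hypothetical registry mapping an aggregation name to a callable.
AGGREGATORS = {}

def register(name):
    """Return a decorator that registers a callable under `name`."""
    def decorator(fn):
        AGGREGATORS[name] = fn
        return fn
    return decorator

# Built-in aggregations from the request: min, max, avg (plus sum).
register("min")(min)
register("max")(max)
register("sum")(sum)
register("avg")(fmean)

def aggregate(name, values):
    """Apply the registered aggregation `name` to a list of values."""
    try:
        fn = AGGREGATORS[name]
    except KeyError:
        raise ValueError(f"unsupported aggregation: {name}") from None
    return fn(values)

# A new aggregation can be added without touching `aggregate`.
@register("range")
def value_range(values):
    return max(values) - min(values)
```

With this shape, supporting a new aggregation means registering one more function; the dispatch code stays unchanged.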

Why is this needed?

Among other things, this improves the visibility of the data stored in Milvus and reduces the need to rely on an external database for certain kinds of common operations.

I see this as a useful feature in its own right, but also as a first step to implementing the feature in https://github.com/milvus-io/milvus/issues/34754.

Anything else?

A few questions about the user experience need to be addressed. I can think of two ways to expose an aggregation interface to Milvus users.

The first is to take an approach similar to the current count(*) interface:

collection.query(expr, output_fields=['sum(x)', 'max(y)'])
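With this approach, the entries in output_fields have to be split into plain field names and aggregate expressions. A rough sketch of that split is below, assuming expressions always have the form func(field) or func(*). The regular expression and the helper name `split_output_fields` are assumptions for illustration and are not taken from the Milvus code base.

```python
import re

# Hypothetical pattern for "func(field)" or "func(*)".
_AGG_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*\(\s*(\*|[A-Za-z_]\w*)\s*\)\s*$")

def split_output_fields(output_fields):
    """Separate plain field names from aggregate expressions.

    Returns (plain_fields, aggregates), where aggregates is a list of
    (function_name, field_name) tuples in their original order.
    """
    plain, aggregates = [], []
    for item in output_fields:
        match = _AGG_RE.match(item)
        if match:
            aggregates.append((match.group(1).lower(), match.group(2)))
        else:
            plain.append(item)
    return plain, aggregates

plain, aggregates = split_output_fields(["sum(x)", "max(y)"])
print(plain)       # []
print(aggregates)  # [('sum', 'x'), ('max', 'y')]
```

One consequence of this design is that a field whose name happens to look like a function call would be ambiguous, so the parser would need a clear rule for that case.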

The second is to add an additional parameter for aggregations:

collection.query(expr, output_fields=['x', 'y'], aggregations={'sum': ['x'], 'max': ['y']})
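To make the intended semantics of the aggregations parameter concrete, here is a small in-memory sketch that applies such a mapping to rows represented as dictionaries. The function `apply_aggregations`, the result key format, and the supported function set are assumptions for illustration only. How the result would combine with the rows selected by output_fields is still an open question and is not modeled here.

```python
def apply_aggregations(rows, aggregations):
    """Apply {function_name: [field, ...]} to a list of row dicts.

    Hypothetical helper: returns a dict keyed by "func(field)".
    """
    funcs = {"sum": sum, "min": min, "max": max}
    result = {}
    for func_name, fields in aggregations.items():
        if func_name not in funcs:
            raise ValueError(f"unsupported aggregation: {func_name}")
        for field in fields:
            values = [row[field] for row in rows]
            result[f"{func_name}({field})"] = funcs[func_name](values)
    return result

rows = [{"x": 1, "y": 10}, {"x": 2, "y": 7}, {"x": 3, "y": 12}]
print(apply_aggregations(rows, {"sum": ["x"], "max": ["y"]}))
# {'sum(x)': 6, 'max(y)': 12}
```

Compared with the first approach, the explicit parameter avoids parsing strings, at the cost of one more argument on the query call.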

I will raise a draft PR which assumes approach 1, but this can be changed based on feedback.

ashkrisk commented 2 hours ago

/assign ashkrisk
/kind feature