Open ashkrisk opened 2 months ago
seems to be a valid requirement.
/assign @congqixia might be able to help on it
@xiaofan-luan @congqixia if no one has started looking into this, can I pick this up?
Sure, please. This seems to be a difficult one; maybe we can set up a meeting between congqi and you to start.
Sounds good, thanks! I've been going through the relevant code, would be great to have a meeting in 2-3 days.
/assign @ashkrisk
I am glad to help /assign @congqixia
I've taken some time to go through the Query code path in Milvus, and I have a decent idea about how things work. I think we can create a general interface for aggregation functions - in this case the distinct() function, but later on it could be extended to min(), max(), etc.
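To make the idea of a general aggregation interface concrete, here is a minimal Python sketch. It is not Milvus code: the class names, the registry, and the assumption that each segment produces a partial result which is then merged are all hypothetical and only illustrate how distinct() and min() could share one interface.

```python
from abc import ABC, abstractmethod


class Aggregator(ABC):
    """Hypothetical common interface for aggregation functions."""

    @abstractmethod
    def update(self, value):
        """Fold one field value into the running state."""

    @abstractmethod
    def merge(self, other):
        """Combine the partial state of another aggregator of the same type."""

    @abstractmethod
    def result(self):
        """Return the final aggregated value."""


class Distinct(Aggregator):
    def __init__(self):
        self.seen = set()

    def update(self, value):
        self.seen.add(value)

    def merge(self, other):
        self.seen |= other.seen

    def result(self):
        return sorted(self.seen)


class Min(Aggregator):
    def __init__(self):
        self.best = None

    def update(self, value):
        if self.best is None or value < self.best:
            self.best = value

    def merge(self, other):
        if other.best is not None:
            self.update(other.best)

    def result(self):
        return self.best


# Hypothetical lookup from function name to implementation.
REGISTRY = {"distinct": Distinct, "min": Min}


def aggregate(name, segments):
    """Aggregate each segment separately, then merge the partial results."""
    total = REGISTRY[name]()
    for segment in segments:
        partial = REGISTRY[name]()
        for value in segment:
            partial.update(value)
        total.merge(partial)
    return total.result()
```

Under this sketch, adding max() would only need one more class and one more registry entry; the per-segment loop stays unchanged.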
Here's a rough outline of the changes I plan to make:
In proxy, queryTask.PreExecute currently checks the output_fields parameter of the query request and decides to create either a "count" plan or an ordinary retrieve plan. Now it will generate plans with aggregation functions as well: QueryPlanNode and RetrieveRequest will be modified to add a string field which stores the name of the aggregation function.
In querynode
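The proxy-side decision described above could look roughly like the following. This is a Python illustration of the logic only, not the actual Go code in Milvus: PlanSketch, select_plan, and the distinct(field) syntax in output_fields are assumptions made for the example. The aggregation attribute stands in for the proposed string field.

```python
import re
from dataclasses import dataclass

# Matches a single call such as "count(*)" or "distinct(tenant_id)".
AGG_CALL = re.compile(r"^\s*(\w+)\s*\(\s*([\w*]+)\s*\)\s*$")

# Hypothetical set of supported aggregation names; min/max could be added later.
SUPPORTED = {"distinct"}


@dataclass
class PlanSketch:
    kind: str              # "count", "aggregate" or "retrieve"
    output_fields: list    # fields the plan reads
    aggregation: str = ""  # stand-in for the proposed string field


def select_plan(output_fields):
    """Pick a plan type from the output_fields of a query request."""
    if len(output_fields) == 1:
        match = AGG_CALL.match(output_fields[0])
        if match:
            name, arg = match.group(1).lower(), match.group(2)
            if name == "count" and arg == "*":
                return PlanSketch("count", [])
            if name in SUPPORTED and arg != "*":
                return PlanSketch("aggregate", [arg], name)
    # Anything else falls back to an ordinary retrieve plan.
    return PlanSketch("retrieve", list(output_fields))
```

An empty aggregation string keeps existing count and retrieve plans unchanged, which is one reason a plain string field is convenient here.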
@congqixia does this sound about right? I've emailed you to set up a meeting and discuss this further.
Amazing, I think you've got most of the idea. To implement this function, you need to think about:
Is there an existing issue for this?
Is your feature request related to a problem? Please describe.
We'd like to be able to implement partition-key based multi-tenancy as suggested in this document: https://milvus.io/docs/multi_tenancy.md. One of the requirements is to keep track of the number of rows used by any given tenant, for auditing purposes and to get an estimate of the amount of resources used per tenant.
One way to do this in Milvus is to run a count(*) query with a filter on the partition key field. However, this method is only applicable if all the unique values in the partition key field are known beforehand (or stored in an external database).
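As a rough sketch of that workaround, the snippet below builds one count(*) request per known tenant. The collection name "docs", the field name "tenant_id", and the helper count_query_args are hypothetical; the resulting dictionaries are meant to be passed as keyword arguments to a query call such as pymilvus's MilvusClient.query, assuming it accepts collection_name, filter, and output_fields.

```python
import json


def count_query_args(collection_name, key_field, tenant):
    """Build keyword arguments for a per-tenant count(*) query."""
    # json.dumps yields a double-quoted, escaped string literal for the filter.
    literal = json.dumps(tenant, ensure_ascii=False)
    return {
        "collection_name": collection_name,
        "filter": f"{key_field} == {literal}",
        "output_fields": ["count(*)"],
    }


# The tenant list must come from somewhere outside Milvus.
known_tenants = ["tenant_a", "tenant_b"]
requests = [count_query_args("docs", "tenant_id", t) for t in known_tenants]
```

The known_tenants list is exactly the piece that has to be kept in sync externally.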
It would be a significantly better user experience if there was a way to list the unique values of the partition key directly from Milvus itself, and avoid the need to be in sync with an external database.
Describe the solution you'd like.
There should be a Milvus API (or a modification to an existing API) that allows one to list the unique values of the partition key field. Even better would be to generalize the unique aggregation function so that it can be used with any field, not just the partition key.
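To illustrate the requested behaviour, here is a small in-memory stand-in. It does not call Milvus; the sample rows, the field names, and the helpers distinct and rows_per_value are made up for the example and only show the result the proposed function would be expected to return for any field.

```python
from collections import Counter

# Hypothetical rows standing in for a collection.
rows = [
    {"id": 1, "tenant_id": "acme", "region": "eu"},
    {"id": 2, "tenant_id": "globex", "region": "us"},
    {"id": 3, "tenant_id": "acme", "region": "us"},
]


def distinct(rows, field_name):
    """Return the unique values of one field, sorted for stable output."""
    return sorted({row[field_name] for row in rows})


def rows_per_value(rows, field_name):
    """Count rows for each unique value of one field."""
    return dict(Counter(row[field_name] for row in rows))
```

With a server-side equivalent of distinct, the per-tenant audit would no longer depend on an external list of tenants.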
Describe an alternate solution.
No response
Anything else? (Additional Context)
No response