milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Feature]: Can hybrid_search support group by? #35777

Open wanglunhui2012 opened 2 months ago

wanglunhui2012 commented 2 months ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

I want to perform a hybrid_search on Milvus, but it does not support group by. I need to filter out some duplicate data. How can I do this?
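Until hybrid_search supports grouping, one possible client-side workaround is to over-fetch results and deduplicate them in application code. The sketch below is plain Python and calls no Milvus API. It assumes each hit has already been converted to a dict with `id`, `score` and a hypothetical grouping field `doc_id`, and that a higher score means a better match (as with RRF-style rerankers).

```python
from typing import Any, Dict, List


def dedupe_by_group(hits: List[Dict[str, Any]], group_key: str, k: int) -> List[Dict[str, Any]]:
    """Keep only the highest-scoring hit per value of `group_key`, then return the top k.

    Assumes a higher score is better. `group_key` is a hypothetical field name
    (e.g. "doc_id") that the caller must have requested as an output field.
    """
    best: Dict[Any, Dict[str, Any]] = {}
    for hit in hits:
        group = hit[group_key]
        # Replace the stored representative only if this hit scores higher.
        if group not in best or hit["score"] > best[group]["score"]:
            best[group] = hit
    # Sort the surviving representatives by score, best first, and cut to k.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical, already-reranked hybrid_search output, over-fetched beyond k.
    hits = [
        {"id": 1, "doc_id": "a", "score": 0.90},
        {"id": 2, "doc_id": "a", "score": 0.85},
        {"id": 3, "doc_id": "b", "score": 0.80},
        {"id": 4, "doc_id": "c", "score": 0.70},
        {"id": 5, "doc_id": "b", "score": 0.60},
    ]
    print([h["id"] for h in dedupe_by_group(hits, "doc_id", k=2)])  # → [1, 3]
```

Because deduplication can shrink the result set, the request limit would need to be larger than the number of unique documents you actually want; how much larger depends on how many duplicates your data contains.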

Describe the solution you'd like.

The best way would be to add this function.

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

xiaofan-luan commented 2 months ago

/assign @czs007

yiwen92 commented 2 months ago

I think this is a duplicate of https://github.com/milvus-io/milvus/issues/35096

@wanglunhui2012 At which stage do you want to apply your group_by field? Option 1: at the multiple-recall phase; or Option 2: at the rerank phase. The choice affects either the results' scores (Option 1 returns document-level results, each scored by its top-1 chunk) or the TopK count (Option 2 reranks at chunk level but outputs at document level, so fewer than TopK documents may be returned). If you don't need exactly TopK documents and only want to filter out some duplicate data, I recommend Option 2.
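To make the difference between the two options concrete, here is a small plain-Python simulation; it is not Milvus code. It assumes two recall routes (for example a dense and a sparse ANN request) that each return a ranked list of `(chunk_id, doc_id)` pairs, and it uses Reciprocal Rank Fusion with k=60 as the reranker. RRF is an illustrative choice here; a weighted reranker would show the same structural difference.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One recall route: (chunk_id, doc_id) pairs, best-ranked first.
Route = List[Tuple[str, str]]


def rrf(ranked_lists: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal Rank Fusion: score(x) = sum over lists of 1 / (k + rank), rank from 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, item in enumerate(ranked, start=1):
            scores[item] += 1.0 / (k + rank)
    return dict(scores)


def option1_group_then_fuse(routes: List[Route], top_k: int) -> List[str]:
    """Option 1: group by doc inside each route (top-1 chunk per doc), then fuse doc lists."""
    doc_lists = []
    for route in routes:
        seen, docs = set(), []
        for _chunk, doc in route:
            if doc not in seen:  # first occurrence is this doc's best-ranked chunk
                seen.add(doc)
                docs.append(doc)
        doc_lists.append(docs)
    fused = rrf(doc_lists)
    return sorted(fused, key=fused.get, reverse=True)[:top_k]


def option2_fuse_then_group(routes: List[Route], top_k: int) -> List[str]:
    """Option 2: fuse at chunk level, keep the top_k chunks, then collapse them to docs."""
    doc_of = {chunk: doc for route in routes for chunk, doc in route}
    fused = rrf([[chunk for chunk, _doc in route] for route in routes])
    top_chunks = sorted(fused, key=fused.get, reverse=True)[:top_k]
    docs: List[str] = []
    for chunk in top_chunks:
        if doc_of[chunk] not in docs:
            docs.append(doc_of[chunk])
    return docs  # may contain fewer than top_k documents


dense = [("c1", "A"), ("c2", "A"), ("c3", "B")]
sparse = [("c2", "A"), ("c4", "C"), ("c1", "A"), ("c3", "B")]
print(option1_group_then_fuse([dense, sparse], top_k=2))  # → ['A', 'B']
print(option2_fuse_then_group([dense, sparse], top_k=2))  # → ['A']
```

With top_k=2, Option 1 returns two distinct documents, while Option 2 returns only one because both of its top-2 chunks belong to document A. This is the TopK-count trade-off between the two options.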