milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Feature]: Raise searching output limits (topk) #19007

Open mazitovs opened 2 years ago

mazitovs commented 2 years ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

I need to run approximate search over a dataset of 200M documents, and the 16k topk limit is too small for me.

Describe the solution you'd like.

Raise the topk limit from 16k back to 65k, as it was previously.

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

soothing-rain commented 2 years ago

We set this limit to protect our system.

There's no plan to increase this limit soon, but we are going to release a range search feature in version 2.2, which lets you search with an upper bound and a lower bound.

With range search, you could retrieve more than 16k results by making multiple range search calls and combining the results.
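For illustration, the "multiple range search calls, then combine" idea might look roughly like the sketch below. It is library-agnostic: `range_search` is a hypothetical callable standing in for whatever range search call the client provides, and `search_in_bands`, `fake_range_search`, and the band boundaries are all assumptions made up for this example, not Milvus API.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[int, float]  # (entity id, distance to the query vector)


def search_in_bands(
    range_search: Callable[[float, float, int], List[Hit]],
    bounds: List[float],
    per_call_limit: int,
) -> List[Hit]:
    """Run one range search per [lower, upper) distance band and merge the hits."""
    merged: Dict[int, float] = {}
    for lower, upper in zip(bounds, bounds[1:]):
        hits = range_search(lower, upper, per_call_limit)
        if len(hits) >= per_call_limit:
            # A full page means the band may hold more hits than one call
            # can return, so the merged result could silently miss some.
            raise ValueError(
                f"band [{lower}, {upper}) reached the per-call limit; use narrower bands"
            )
        for entity_id, distance in hits:
            # Keep the first distance seen if an id shows up in two bands.
            merged.setdefault(entity_id, distance)
    # Return the combined hits ordered from nearest to farthest.
    return sorted(merged.items(), key=lambda hit: hit[1])


# Toy stand-in for a real range search (hypothetical): entity i is at distance i / 10.
_DISTANCES = {i: i / 10 for i in range(50)}


def fake_range_search(lower: float, upper: float, limit: int) -> List[Hit]:
    hits = [(i, d) for i, d in _DISTANCES.items() if lower <= d < upper]
    return sorted(hits, key=lambda hit: hit[1])[:limit]


results = search_in_bands(fake_range_search, [0.0, 1.0, 2.0, 3.0], per_call_limit=16)
```

The bands have to be narrow enough that no single call reaches the per-call limit, which is why the sketch raises instead of returning a possibly incomplete result.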

soothing-rain commented 2 years ago

/assign @mazitovs

MohGanji commented 2 months ago

Hi @soothing-rain, following up on this, as we have a similar issue with Milvus.

It seems this limit is still in place in 2024. Is it still not possible to raise it? What is the main concern about increasing it to a larger value (possibly 100k)?

If not, what are the alternatives? I saw the discussions about the implementation of range search, and I can find it in the docs for 2.2 and 2.3, but it seems to have been removed in 2.4. I briefly looked at iterators (https://milvus.io/docs/with-iterators.md), which seem relevant, but I'm not sure whether iterators are a replacement for range search.

We are using Milvus 2.4 at the moment, and we are storing embeddings of video frame data. With the 16k topk limit, we can query only a small fraction of the video frames, which is not desirable. With this limit, we need to run up to 15 database queries instead of the single query a higher limit would allow. That will cause a significant loss of performance.

If iterators are the way to go, how can we mitigate the performance concerns? If not, are there any alternative solutions?

Thanks in advance.
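To make the cost in question concrete, here is a rough sketch of the batching arithmetic and of draining a batch-style iterator. It assumes "16k" means 16,384 results per call. `next_batch` is a hypothetical stand-in for an iterator's "give me the next batch" call, and `queries_needed`, `drain`, and `make_fake_iterator` are names invented for this example. None of it reproduces the actual Milvus iterator interface.

```python
import math
from typing import Callable, List

PER_QUERY_LIMIT = 16_384  # assumption: "16k" read as 16,384 results per call


def queries_needed(wanted: int, per_query_limit: int = PER_QUERY_LIMIT) -> int:
    """Number of calls needed to fetch `wanted` results at a fixed per-call cap."""
    return math.ceil(wanted / per_query_limit)


def drain(next_batch: Callable[[], List[int]], wanted: int) -> List[int]:
    """Pull batches until `wanted` results are collected or the source runs dry."""
    collected: List[int] = []
    while len(collected) < wanted:
        batch = next_batch()
        if not batch:
            break  # an empty batch signals that nothing is left
        collected.extend(batch)
    return collected[:wanted]


def make_fake_iterator(total: int, batch_size: int) -> Callable[[], List[int]]:
    """Toy batch source (hypothetical): yields ids 0..total-1 in fixed-size batches."""
    position = 0

    def next_batch() -> List[int]:
        nonlocal position
        batch = list(range(position, min(position + batch_size, total)))
        position += len(batch)
        return batch

    return next_batch


calls_for_100k = queries_needed(100_000)
frames = drain(make_fake_iterator(40_000, PER_QUERY_LIMIT), wanted=35_000)
```

Under that assumption, the 100k results mentioned above would take 7 calls, and 15 calls would cover up to 245,760 results. The loop itself is cheap, so the performance concern is the per-call round trip and search cost.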