qdrant / qdrant-client

Python client for Qdrant vector search engine
https://qdrant.tech
Apache License 2.0
716 stars 115 forks

Possible extension of count functionality #593

Closed leonlowitzki closed 4 months ago

leonlowitzki commented 4 months ago

I tried to get a count for my frontend, when doing semantic search and recommending vectors.

I could set the limit to my collection's point count, but this is rather slow, as I currently have over 50k vectors.

Would it be possible to extend the count method to also include parameters for search and recommend?

From my limited understanding of the Rust code this is currently not supported in Qdrant?

I am happy for any suggestion :)

Example

Search

fast_and_accurate_but_much_ram = models.SearchParams(hnsw_ef=128, exact=False)
count = qdrant_client.count(
    collection_name=COLLECTION_NAME,
    search_params=fast_and_accurate_but_much_ram,
    query_vector=vector,
)

Recommend

count = qdrant_client.count(
    collection_name=COLLECTION_NAME,
    positive=positive_uuids,
    negative=negative_uuids,
    strategy=models.RecommendStrategy.BEST_SCORE,
)

joein commented 4 months ago

hi @leonlowitzki

For vector search, every point is similar to the query to some extent, so if you set limit=number_of_points_in_your_collection, you'll simply get back every point you have.

However, you can use count_filter to count only the points that have certain payload values.

leonlowitzki commented 4 months ago

hi @joein thank you for the fast reply!

As I mentioned, I can't set limit=number_of_points_in_your_collection because queries over a large collection are too slow. And count_filter only accepts a types.Filter, not query_vector, positive, or negative.

joein commented 4 months ago

hi @leonlowitzki

Could you please elaborate on the task you are trying to solve? What result do you expect from restricting the count by query_vector / positive / negative?

leonlowitzki commented 4 months ago

hi @joein I am trying to populate a table view in my frontend. At the bottom of the table I need to display: Page X of Y. To calculate the page count Y, I need to know how many results are being returned. The users can use either semantic search or recommendations to get the results in their table view. Currently I am limiting the results to a fixed value of 100, but it would be nice to return more. Each individual table page is requested using limit and offset, which works fine.
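
For reference, the "Page X of Y" arithmetic described above could be sketched roughly as follows. The helper names page_count and page_offset are hypothetical and not part of qdrant-client; the sketch only assumes that the total number of results is already known.

```python
import math


def page_count(total_results: int, page_size: int) -> int:
    """Number of pages (Y) needed to display total_results rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total_results / page_size)


def page_offset(page: int, page_size: int) -> int:
    """Offset to send together with limit=page_size for a 1-based page number."""
    if page < 1:
        raise ValueError("page must be >= 1")
    return (page - 1) * page_size
```

With the fixed cap of 100 results mentioned above and 20 rows per page, this would give 5 pages, and page 3 would be requested with limit=20 and offset=40.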

joein commented 4 months ago

Are you limiting your search results by a similarity threshold?

leonlowitzki commented 4 months ago

Currently not, but it seems like a good idea for improving query performance. On the other hand, I want to ensure that something is always returned, even when the score is bad.

joein commented 4 months ago

If you are not limiting the search with a similarity score threshold, payload filters, or limit, then it will always return the whole collection, since all the points are similar to some extent.

If you are using limit, you already know the number of results. If you are using search with payload filters, you can run count with the same payload filters. If you are using a similarity threshold, there is no way to count the number of results, since it is not known beforehand.
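
A minimal sketch of how the first two cases might be combined to get a total for pagination. It assumes `client` is a qdrant_client.QdrantClient and that the application picks its own cap on results; the function name total_for_pagination and the max_results parameter are hypothetical.

```python
def total_for_pagination(client, collection_name, query_filter=None, max_results=1000):
    """Total to use for "Page X of Y" when searching with an optional payload filter.

    Assumptions: client behaves like qdrant_client.QdrantClient, and
    max_results is an application-chosen cap on how many results are shown.
    """
    # Count the points that pass the same payload filter the search will use.
    # With no filter this is the size of the whole collection.
    matching = client.count(
        collection_name=collection_name,
        count_filter=query_filter,
        exact=True,
    ).count
    # The application never shows more than max_results, so cap the total there.
    return min(matching, max_results)
```

If a similarity score threshold is also applied, the value returned here is only an upper bound, for the reason given above.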

leonlowitzki commented 4 months ago

Thank you for the clarification and the fast responses!