qdrant / qdrant-client

Python client for Qdrant vector search engine
https://qdrant.tech
Apache License 2.0

Qdrant Scroll Method Timeout #643

Closed: Faiee50 closed this issue 3 weeks ago

Faiee50 commented 3 weeks ago

I am encountering significant performance issues with the Qdrant scroll method. When setting a large limit, the method frequently results in a timeout error, making it impractical for large-scale data retrieval.

In contrast, other vector databases like Chroma and Pinecone offer more efficient methods for retrieving user data based on metadata, such as username. These methods are capable of fetching all relevant data without timing out.

Does Qdrant offer a similar function that can retrieve large datasets based on metadata without encountering timeout issues? While the scroll method seems to be designed for this purpose, it currently falls short due to its performance limitations.

This issue is critical for our project. Any suggestions or alternative approaches would be greatly appreciated.

Faiee50 commented 3 weeks ago

@timvisee Please take a look at this and suggest a solution.

generall commented 3 weeks ago

Hey @Faiee50, the solution for you would be to refer to the documentation https://qdrant.tech/documentation/concepts/points/#scroll-points

You are using a very high value for the limit parameter, and it is no surprise that the endpoint times out once you ask it to retrieve the whole dataset in one request.

You are welcome to use Pinecone if you find appropriate API there.

pseudo-usama commented 3 weeks ago

I'm facing the same issue and haven't found a solution. Is there an easier way to search by metadata over a large dataset? In that case the limit would have to be very large. I also tried using offset, but the process was very slow.

joein commented 3 weeks ago

@pseudo-usama The scroll method is not what we call search; it simply iterates over the records in Qdrant with optional restrictions such as payload filters. For scroll to be fast, the payload fields used in the filters should have payload indices. The limit does not depend on the size of your dataset; it is the number of results you want returned. If you want to scroll over a huge number of records, consider pagination, which lets you fetch the next records on demand instead of fetching all of them in one call.

Faiee50 commented 3 weeks ago

Hi @joein, thank you for the clarification regarding the scroll method in Qdrant. I understand that it is used for iterating over records with optional restrictions, such as payload filters, and that for efficient scrolling, the payload fields should have payload indices.

I am interested in learning how to implement pagination using the scroll method in Qdrant. Could you provide an example code snippet demonstrating this?

joein commented 3 weeks ago

Hi @Faiee50,

The scroll method returns a tuple of (points, next offset).

Call scroll repeatedly, passing the offset returned by the previous call (use offset=None for the first call).

Once the returned offset is None, you have reached the end of the collection.

Faiee50 commented 3 weeks ago

@joein, thanks for your quick help. This is exactly what I have been looking for :)

joein commented 3 weeks ago

You are welcome.

P.S. This is also described in the docs.