qdrant / qdrant

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Crash with "terminate called recursively" on creating more than 2000 collections #3564

Open st-cpai opened 8 months ago

st-cpai commented 8 months ago

Current Behavior

I'm trying to create more than 2000 collections, each containing 1000 points of dimension 1536.

Qdrant crashes while creating collection number 2020. I can restart the container and it then serves queries just fine, but trying to create another collection crashes it again with the same error:

2024-02-08T12:07:34.391362Z  INFO storage::content_manager::toc::collection_meta_ops: Creating collection col_2020
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called without an active exception
terminate called recursively
terminate called recursively

Steps to Reproduce

Please run this script to create the collections:

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    OptimizersConfigDiff,
    PointStruct,
    VectorParams,
)

EMBEDDING_SIZE = 1536
qdrant = QdrantClient("http://0.0.0.0:6333", timeout=60)

def create_cols(start_idx, count):
    for i in range(start_idx, start_idx + count):
        collection_name = f"col_{i+1}"
        print(f"Creating collection '{collection_name}':")
        qdrant.recreate_collection(
            collection_name=collection_name,
            vectors_config=VectorParams(size=EMBEDDING_SIZE, distance=Distance.EUCLID, on_disk=True),
            # indexing_threshold=0 disables indexing during the bulk upload
            optimizers_config=OptimizersConfigDiff(
                indexing_threshold=0,
            ),
        )

        # Upsert 1000 zero vectors per collection in batches of 500
        batch_size = 500
        for batch_start_idx in range(0, 1000, batch_size):
            embeddings = [[0.0] * EMBEDDING_SIZE for _ in range(batch_size)]
            points = [
                PointStruct(
                    id=idx + batch_start_idx,
                    vector=embedding,
                    payload={"name": f"point_{idx + batch_start_idx}", "age": 20 + idx % 10},
                )
                for idx, embedding in enumerate(embeddings)
            ]
            qdrant.upsert(
                collection_name=collection_name,
                points=points,
            )
            print(
                f"  - Upserted {len(points)} points starting at #{batch_start_idx} into collection '{collection_name}'"
            )

        # Turn indexing back on once the bulk upload is done
        qdrant.update_collection(
            collection_name=collection_name, optimizer_config=OptimizersConfigDiff(indexing_threshold=20000)
        )

create_cols(0, 3000)

Context (Environment)

It has happened to me both on Mac M2, and EC2 ARM instances running Ubuntu.

timvisee commented 8 months ago

First of all, we strongly recommend against creating so many collections.

Instead, we recommend looking at multitenancy. It describes how to put all your data in a single collection, partitioned by payload. That will give you much better results.
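For illustration, here is a minimal sketch of that single-collection layout (the all_tenants collection name and tenant_id payload field are hypothetical examples, not something from this thread):

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    FieldCondition,
    Filter,
    MatchValue,
    PointStruct,
    VectorParams,
)

client = QdrantClient("http://0.0.0.0:6333", timeout=60)

# One collection for all tenants instead of one collection per tenant
client.recreate_collection(
    collection_name="all_tenants",
    vectors_config=VectorParams(size=1536, distance=Distance.EUCLID, on_disk=True),
)

# Index the tenant field so filtered searches stay fast
client.create_payload_index(
    collection_name="all_tenants",
    field_name="tenant_id",
    field_schema="keyword",
)

# Each point carries its tenant in the payload
client.upsert(
    collection_name="all_tenants",
    points=[
        PointStruct(
            id=0,
            vector=[0.0] * 1536,
            payload={"tenant_id": "tenant_2020", "name": "point_0"},
        )
    ],
)

# Queries are scoped to a single tenant with a payload filter
hits = client.search(
    collection_name="all_tenants",
    query_vector=[0.0] * 1536,
    query_filter=Filter(
        must=[FieldCondition(key="tenant_id", match=MatchValue(value="tenant_2020"))]
    ),
    limit=10,
)

With this layout, onboarding another tenant is just an upsert with a new tenant_id value instead of a new collection.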

Other than that, I've not seen this come up before. Thank you for reporting it! Even though we don't recommend using so many collections, this should not happen.

st-cpai commented 8 months ago

@timvisee thank you for your reply.

It may be hard to switch our collection design at the moment. Are there things we can try to overcome this limit?

timvisee commented 8 months ago

It may be hard to switch our collection design at the moment.

Could you elaborate why?

Are there things we can try to overcome this limit?

Basically nothing, other than the suggestion above.

st-cpai commented 8 months ago

@timvisee

Could you elaborate why?

Mostly because it would require a migration, which we don't have the dev resources for at the moment.

What about going with a cluster deployment? Will that help?

generall commented 8 months ago

What about going with a cluster deployment? Will that help?

Marginally. Still, creating 2000 collections in a loop is a bad idea.

st-cpai commented 8 months ago

What about going with a cluster deployment? Will that help?

Marginally. Still, creating 2000 collections in a loop is a bad idea.

Oh, the loop is just a way to reproduce the issue. In our system we create the collections gradually, over multiple weeks.

generall commented 8 months ago

Oh, the loop is just a way to reproduce the issue. In our system we create the collections gradually, over multiple weeks.

That counts as well.

RGafiyatullin commented 7 months ago

@st-cpai any chance you're observing this behaviour when the service is about to run out of memory?

st-cpai commented 7 months ago

@st-cpai any chance you're observing this behaviour when the service is about to run out of memory?

Hi, no, it was using only about 20% of memory.