qdrant / qdrant-client

Python client for Qdrant vector search engine
https://qdrant.tech
Apache License 2.0

upsert is slow for sparse embeddings #570

Open puyuanOT opened 3 months ago

puyuanOT commented 3 months ago

I am running a loop to insert sparse embeddings into an on-disk (local mode) client. The loop becomes progressively slower: it takes less than 1 second per iteration at the start but slows to approximately 5 seconds after a few thousand iterations. This issue only occurs with sparse embeddings and does not affect dense embeddings.

    for batch_ids, points in self._generate_rest_batches(
        texts, metadatas, ids, batch_size
    ):
        self.client.upsert(
            collection_name=self.collection_name, points=points, **kwargs
        )
        added_ids.extend(batch_ids)
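One way to confirm the progressive slowdown is to record per-batch latency around each `upsert` call. A minimal sketch (the `upsert_batch` callable is a stand-in for `client.upsert`, not part of the original code):

```python
import time

def time_batches(upsert_batch, batches):
    """Record per-batch latency to see whether inserts degrade over time."""
    latencies = []
    for points in batches:
        start = time.perf_counter()
        upsert_batch(points)
        latencies.append(time.perf_counter() - start)
    return latencies

# Example with a no-op stub in place of client.upsert:
latencies = time_batches(lambda points: None, [[i] for i in range(100)])
print(f"first 10 avg: {sum(latencies[:10]) / 10:.6f}s, "
      f"last 10 avg: {sum(latencies[-10:]) / 10:.6f}s")
```

Comparing the average of the first and last few batches makes the linear degradation described above easy to see in a plot or log.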
joein commented 3 months ago

Hello, do you use local mode or server mode?

Could you show your collection info?

puyuanOT commented 3 months ago

> Hello, do you use local mode or server mode?
>
> Could you show your collection info?

Thank you for your reply! It's running in local mode. The for-loop adds elements to the sparse_embeddings collection, and the following is the collection configuration.

    {
      "collections": {
        "child_documents": {
          "vectors": {
            "dense_embeddings": {
              "size": 768,
              "distance": "Cosine",
              "hnsw_config": null,
              "quantization_config": null,
              "on_disk": null
            }
          },
          "shard_number": null,
          "sharding_method": null,
          "replication_factor": null,
          "write_consistency_factor": null,
          "on_disk_payload": null,
          "hnsw_config": null,
          "wal_config": null,
          "optimizers_config": null,
          "init_from": null,
          "quantization_config": null,
          "sparse_vectors": {
            "sparse_embeddings": {
              "index": {
                "full_scan_threshold": null,
                "on_disk": null
              }
            }
          }
        }
      },
      "aliases": {}
    }

joein commented 3 months ago

How many vectors are you trying to insert?

puyuanOT commented 3 months ago

I am trying to insert 20k batches, with 64 embeddings per batch. The speed drops from 0.7 seconds per batch to 3 seconds per batch after about 500 iterations.

This only happens for sparse embeddings; dense embeddings are unaffected.

joein commented 3 months ago

Local mode was not designed to handle millions of vectors; however, it should handle 30k just fine.

I would recommend switching to the Qdrant server (e.g. in Docker).

However, I would not expect it to drop from 0.7 to 3 seconds per batch. We'll try to find the bottleneck, thanks for pointing it out :)

puyuanOT commented 3 months ago

Thank you. I also observed that retrieval from the sparse collection was much slower (~20x) than from the dense one in this case.

joein commented 3 months ago

At the moment, sparse embeddings are in general slower than dense embeddings.

It is especially noticeable in local mode.
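One intuition for the gap (an illustration, not Qdrant's actual kernels): a dense similarity is a fixed-size loop over contiguous values, while a sparse similarity must first match up index/value pairs, and that per-element bookkeeping dominates in pure-Python local mode. A minimal sketch, representing a sparse vector as a hypothetical dict of dimension index to value:

```python
def dense_dot(a, b):
    # Fixed-length, cache-friendly traversal: no index matching needed.
    return sum(x * y for x, y in zip(a, b))

def sparse_dot(a, b):
    # a, b: dicts mapping dimension index -> value (hypothetical representation).
    # Only dimensions present in both vectors contribute; finding them costs
    # a lookup per stored element.
    smaller, larger = (a, b) if len(a) <= len(b) else (b, a)
    return sum(v * larger[i] for i, v in smaller.items() if i in larger)

print(dense_dot([1.0, 2.0, 0.0], [0.5, 1.0, 4.0]))   # 2.5
print(sparse_dot({0: 1.0, 1: 2.0}, {1: 1.0, 2: 4.0}))  # 2.0
```

Server-side Rust implementations close most of this gap; the overhead is most visible in the Python local mode, as noted above.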

puyuanOT commented 3 months ago

It appears that the latency during upsert operations is due to resizing. The resizing is triggered when we keep adding sparse vectors to a collection that contains both sparse and dense vectors, while the dense vector storage remains empty.

https://github.com/qdrant/qdrant-client/blob/8e3ea58f781e4110d11c0a6985b5e6bb66b85d33/qdrant_client/local/local_collection.py#L1136
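If each insert reallocates storage to the exact new size, the copying cost per insert grows with the collection, giving quadratic total work, which would match the observed per-batch slowdown. A minimal sketch (not Qdrant's code) contrasting exact resizing with geometric growth, counting copied elements:

```python
def grow_exact(n):
    """Reallocate to the exact new size on every insert: O(n^2) copying total."""
    store = []
    copied = 0
    for i in range(n):
        new = [None] * (i + 1)
        new[:i] = store      # full copy on every insert
        copied += i
        store = new
    return copied

def grow_geometric(n):
    """Double capacity when full: amortized O(1) copying per insert."""
    store = [None]
    size, copied = 0, 0
    for _ in range(n):
        if size == len(store):
            new = [None] * (len(store) * 2)
            new[:size] = store
            copied += size
            store = new
        size += 1
    return copied

print(grow_exact(1024), grow_geometric(1024))  # 523776 vs 1023
```

Exact resizing copies roughly n^2 / 2 elements for n inserts, while doubling copies fewer than n, so switching the growth strategy (or skipping resizes for empty dense storage) would remove this class of slowdown.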