puyuanOT opened this issue 3 months ago
Hello, do you use local mode or server mode?
Could you show your collection info?
Thank you for your reply! It's running in local mode. The for-loop adds elements to the sparse_embeddings vector of the collection, and the following is the client configuration.
```json
{
  "collections": {
    "child_documents": {
      "vectors": {
        "dense_embeddings": {
          "size": 768,
          "distance": "Cosine",
          "hnsw_config": null,
          "quantization_config": null,
          "on_disk": null
        }
      },
      "shard_number": null,
      "sharding_method": null,
      "replication_factor": null,
      "write_consistency_factor": null,
      "on_disk_payload": null,
      "hnsw_config": null,
      "wal_config": null,
      "optimizers_config": null,
      "init_from": null,
      "quantization_config": null,
      "sparse_vectors": {
        "sparse_embeddings": {
          "index": {
            "full_scan_threshold": null,
            "on_disk": null
          }
        }
      }
    }
  },
  "aliases": {}
}
```
How many vectors are you trying to insert?
I am trying to insert 20k batches, with 64 embeddings per batch. The speed drops from 0.7 seconds per batch to 3 seconds per batch after 500 iterations.
This only happens for sparse embedding and doesn't happen to the dense.
Local mode was not designed to handle millions of vectors; however, it should handle 30k just fine.
I would recommend switching to the Qdrant server (e.g. in Docker).
However, I would not expect it to drop from 0.7 to 3 seconds per batch; we'll try to find the bottleneck, thanks for pointing it out :)
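Switching from local mode to a server could look like this. The Docker invocation below is a deployment sketch using the official image and default ports; the storage path is an assumption:

```shell
# Start a Qdrant server, exposing the REST (6333) and gRPC (6334) ports
# and persisting data to a local directory.
docker run -p 6333:6333 -p 6334:6334 \
    -v "$(pwd)/qdrant_storage:/qdrant/storage" \
    qdrant/qdrant
```

The client then connects with `QdrantClient(url="http://localhost:6333")` instead of a local path; the rest of the code stays the same.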
Thank you. I observed that retrieval from the sparse collection was also much (~20x) slower than from the dense one in this case.
At the moment, sparse embeddings are generally slower than dense embeddings.
This is especially noticeable in local mode.
It appears that the latency during upsert operations is due to resizing. The resizing is triggered when we keep adding sparse vectors to a collection that contains both sparse and dense vectors while the dense vector storage stays empty.
I am running a loop to insert sparse embeddings into an on-disk client. The loop becomes progressively slower: it starts at under 1 second per iteration but slows to approximately 5 seconds after a few thousand iterations. This issue only occurs with sparse embeddings and does not affect dense embeddings.