qdrant / vector-db-benchmark

Framework for benchmarking vector search engines
https://qdrant.tech/benchmarks/
Apache License 2.0

How to do embedding in batches? #70

Closed DaiZack closed 9 months ago

DaiZack commented 1 year ago

I am trying to do text embedding with the pipeline. How can I improve the speed with a batch setting (or by parallelizing the data collection mapping)? The current pipe is much slower than feeding the sbert encoder a list of texts directly.


from sentence_transformers import SentenceTransformer
from towhee import pipe, ops

model = SentenceTransformer('all-MiniLM-L6-v2')
texts = ['this is a document', 'this is another document', 'this is a third document'] * 1000

# Baseline: sbert directly, encoding the whole list in batches of 128
embeddings = model.encode(texts, batch_size=128, show_progress_bar=True)

# Towhee pipe: maps the embedding op over each input row
text_embedding = (
    pipe.input('text')
        .map('text', 'embedding', ops.sentence_embedding.transformers(model_name='all-MiniLM-L6-v2'))
        .output('text', 'embedding')
)

res = text_embedding.batch(texts)

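For context, the speedup from `batch_size` in `model.encode` comes from grouping inputs so the model runs on many texts per forward pass, instead of one at a time. A minimal sketch of that chunking in plain Python (a hypothetical `batched` helper, not a Towhee or sentence-transformers API):

```python
def batched(seq, n):
    """Yield successive chunks of at most n items from seq."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

texts = ['doc'] * 10
chunks = list(batched(texts, 4))
print([len(c) for c in chunks])  # chunk sizes: [4, 4, 2]
```

In practice one would call the encoder once per chunk (e.g. `model.encode(chunk)`) rather than per item, which is what the `batch_size` argument does internally.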
KShivendu commented 9 months ago

Hi @DaiZack, I think this question is related to Towhee more than Qdrant. It's better to ask in their community/repos :)