I am trying to do text embedding with the pipeline. How can I improve the speed with a batch setting (or by parallelizing the data-collection mapping)?
The current pipe is much slower than feeding the SBERT encoder a list of texts directly.
from sentence_transformers import SentenceTransformer
from towhee import pipe, ops

model = SentenceTransformer('all-MiniLM-L6-v2')
texts = ['this is a document', 'this is another document', 'this is a third document'] * 1000

# With SBERT directly (fast):
embeddings = model.encode(texts, batch_size=128, show_progress_bar=True)

# With the Towhee pipe (much slower):
text_embedding = (
    pipe.input('text')
        .map('text', 'embedding',
             ops.sentence_embedding.transformers(model_name='all-MiniLM-L6-v2'))
        .output('text', 'embedding')
)
res = text_embedding.batch(texts)
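For context, the overhead gap between the two approaches can be sketched without any libraries: a per-row map pays the encoder's fixed per-call cost once per text, while chunking the input pays it only once per batch. This is a minimal sketch, not the Towhee API; chunked and encode below are hypothetical stand-ins (encode plays the role of model.encode, which does this chunking internally when given batch_size).

```python
# Minimal, library-free sketch of why batching helps: calling an encoder
# once per item pays its fixed per-call overhead N times, while chunking
# the input pays it only ceil(N / batch_size) times.

def chunked(seq, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

calls = 0

def encode(batch):
    """Hypothetical stand-in encoder: counts invocations and
    returns one fake 'vector' per input text."""
    global calls
    calls += 1
    return [[float(len(t))] for t in batch]

texts = ['this is a document', 'this is another document',
         'this is a third document'] * 1000

embeddings = []
for batch in chunked(texts, 128):
    embeddings.extend(encode(batch))

print(len(embeddings))  # 3000 vectors, one per text
print(calls)            # 24 batched calls instead of 3000 per-item calls
```

In the direct call, model.encode(texts, batch_size=128) applies exactly this pattern internally, which is why it is fast; a pipeline that invokes the embedding op row by row pays the per-call overhead for every single text.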