huggingface / text-embeddings-inference

A blazing fast inference solution for text embeddings models
https://huggingface.co/docs/text-embeddings-inference/quick_tour
Apache License 2.0

Serve LARGE embedding models like E5-mistral-7b-instruct #195

Open ai-jz opened 7 months ago

ai-jz commented 7 months ago

Feature request

Support the recent, larger embedding models of 7B or more parameters (roughly 20x larger than BERT-large)

Motivation

Embedding models have grown much larger in the past few months. For example, Mistral-7B and Mixtral-8x7B based embedding models now rank at the top of the leaderboard:

https://huggingface.co/spaces/mteb/leaderboard

Do you plan to support such large embedding models (20x larger than BERT-large) via this repo or the TGI repo?

Your contribution

N/A

OlivierDehaene commented 7 months ago

What's your use case for these models? Their throughput is so low and the costs so prohibitive that I don't see one.

functorism commented 6 months ago

Ignoring the fact that the quality delta between the top Mistral models and the top BERT models might be insignificant in many cases, I see a lot of value in a text embedding inference server outside of the rerank/search hot path. It can provide a ton of value by simplifying pipelines that rely on embeddings for clustering.
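A clustering pipeline like the one described above can be sketched as follows. This is a minimal, self-contained illustration, not TEI code: the `embed` function is a stand-in (a real deployment would POST to a TEI server's `/embed` endpoint as shown in the quick tour), and the sample texts and cluster count are made up.

```python
import numpy as np

def embed(texts):
    # Stand-in for a real call such as:
    #   requests.post("http://localhost:8080/embed", json={"inputs": texts})
    # Here we return random L2-normalized vectors so the sketch runs offline.
    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(len(texts), 8))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def kmeans(x, k, iters=20, seed=0):
    # Minimal k-means over the embedding vectors (illustrative, not tuned).
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest center by squared distance.
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned vectors.
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return labels

texts = ["refund request", "billing error", "love the product", "great support"]
labels = kmeans(embed(texts), k=2)
print(labels)  # one cluster id per document
```

The embedding server stays a simple stateless service; the clustering step lives entirely on the client side and can be swapped for any off-the-shelf implementation.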