huggingface / text-embeddings-inference

A blazing fast inference solution for text embeddings models
https://huggingface.co/docs/text-embeddings-inference/quick_tour
Apache License 2.0

Support for SFR-Embedding Mistral #205

Closed: prasannakrish97 closed this issue 4 months ago

prasannakrish97 commented 8 months ago

Model description

It would be awesome if TEI supported SFR-Embedding-Mistral, which sits at the top of the MTEB leaderboard: https://huggingface.co/Salesforce/SFR-Embedding-Mistral

Open source status

Provide useful links for the implementation

https://huggingface.co/Salesforce/SFR-Embedding-Mistral

OlivierDehaene commented 8 months ago

What's your use case for these models? Their throughput is so low and the costs so prohibitive that I don't see any.

prasannakrish97 commented 8 months ago

The use cases are:

Given the variety of these use cases, SFR-Embedding-Mistral seems like it could be a single solution thanks to its versatility and its no. 1 ranking on the Hugging Face MTEB leaderboard.

Assuming TEI supported this model, the aim would be to benchmark it on CPU and GPU to get an idea of inference time and compare it with other models.
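
A rough sketch of the kind of benchmark I have in mind, assuming a TEI-style server is already running locally and exposing its `/embed` route (the URL, batch contents, and repeat count below are placeholders):

```python
# Rough latency benchmark: repeatedly time a batched /embed request against a
# local TEI-style server. URL, batch size and repeat count are placeholders.
import statistics
import time

import requests

ENDPOINT = "http://localhost:8080/embed"  # assumed local deployment
BATCH = ["What is the capital of France?"] * 32  # toy batch of 32 sentences

latencies = []
for _ in range(20):
    start = time.perf_counter()
    response = requests.post(ENDPOINT, json={"inputs": BATCH})
    response.raise_for_status()
    latencies.append(time.perf_counter() - start)

print(f"mean latency: {statistics.mean(latencies) * 1000:.1f} ms")
print(f"max latency:  {max(latencies) * 1000:.1f} ms")
print(f"embedding dim: {len(response.json()[0])}")
```

Running the same script against a CPU-only deployment and a GPU deployment would give the comparison I'm after.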

OlivierDehaene commented 8 months ago

I can already tell you that inference times will be multiple orders of magnitude worse.

functorism commented 7 months ago

@OlivierDehaene The performance of a large model like Mistral is prohibitive for most embedding purposes (especially at scale). However, the quality of the embeddings makes it appealing for niche situations, especially when performance isn't as crucial (for example, when it's used in a pipeline doing clustering rather than fast RAG search).
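
To make that concrete, here's a minimal sketch of such an offline clustering pipeline, assuming an embedding server with a TEI-style `/embed` route is running locally; the endpoint URL and documents are made up, and scikit-learn's KMeans stands in for whatever clustering step you'd actually use:

```python
# Minimal sketch: embed a small corpus once via a local /embed endpoint,
# then cluster the vectors. Endpoint URL and documents are placeholders.
import requests
from sklearn.cluster import KMeans

ENDPOINT = "http://localhost:8080/embed"  # assumed local embedding server

documents = [
    "Reset my password",
    "I forgot my login credentials",
    "Invoice for March is missing",
    "Where can I download last month's invoice?",
]

# Embeddings are computed once, offline, so per-request latency matters far
# less here than it would for interactive RAG search.
embeddings = requests.post(ENDPOINT, json={"inputs": documents}).json()

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
for doc, label in zip(documents, kmeans.labels_):
    print(label, doc)
```

Even if a single batch takes seconds rather than milliseconds with a model this large, it doesn't matter much in this setting.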

Having an inference server that supports it makes things easier. When I've used these Mistral embedding models, I've relied on a quick-and-dirty fix in candle (https://github.com/huggingface/candle/pull/1636) to get things going.

Besides not seeing value in this use case, is there anything else you view as problematic about supporting this model architecture?

theobjectivedad commented 7 months ago

+1 for this feature request. My use case requires high-quality embeddings for agent memories that are periodically generated. Very similar to section 4.2 from this paper.

TEI is a good choice for my research because it is well supported from a client standpoint. Moreover, if I understand the performance trade-off correctly, it would be similar to normal inference with a model of that size and, as was mentioned, not prohibitive for some use cases.

prasannakrish97 commented 7 months ago

I completely agree with you @theobjectivedad & @functorism: TEI is a really good choice for evaluating the quality and performance trade-off of a model for your specific use case. It would be really awesome if TEI supported this model, or more broadly the Mistral family of embedding models.

Alternatively, I'm using infinity, which supports the SFR-Embedding-Mistral model very well :-) (cf. https://github.com/michaelfeil/infinity)

OlivierDehaene commented 7 months ago

OK, I'm coming back to this issue and will add it soon. @prasannakrish97, please don't mention other OSS projects here; that's bad etiquette.

zhangdanfeng888 commented 4 months ago

@OlivierDehaene Hello, does TEI support SFR-Embedding-Mistral (https://huggingface.co/Salesforce/SFR-Embedding-Mistral) now? I need this as well.