michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
https://michaelfeil.eu/infinity/
MIT License
977 stars · 72 forks

Question: Support for sparse embeddings? #146

Open Matheus-Garbelini opened 3 months ago

Matheus-Garbelini commented 3 months ago

Hi, I was wondering whether it would make sense to support models which, in addition to dense vectors, also produce sparse and ColBERT embeddings. For example, BGE-M3 works well under infinity for dense vector retrieval. However, it would require some changes to the inference process to additionally obtain sparse vectors, as shown here: https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/BGE_M3/modeling.py#L352-L355

I wonder if, for such a case, it's feasible to add extra config parameters to the CLI, or whether that would require too many changes to the core logic of the model during startup?
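For context, the linked FlagEmbedding code derives the sparse ("lexical") embedding from a per-token weight head and then collapses duplicate token ids. A minimal, framework-free sketch of that pooling step is below; the function name and arguments are illustrative, not part of infinity or FlagEmbedding, and it assumes the model has already produced one non-negative weight per token position (in FlagEmbedding this comes from a ReLU over a linear projection of the hidden states):

```python
# Hypothetical sketch of BGE-M3-style sparse-embedding pooling.
# Assumption: `token_weights` are already non-negative per-position scores
# produced by the model's sparse head.

def pool_sparse_weights(token_ids, token_weights, special_token_ids=frozenset()):
    """Collapse per-position weights into a {token_id: weight} sparse vector,
    keeping the max weight for repeated tokens and dropping special tokens."""
    sparse = {}
    for tid, w in zip(token_ids, token_weights):
        if tid in special_token_ids or w <= 0.0:
            continue  # special tokens (e.g. CLS/SEP) and zero weights are skipped
        if w > sparse.get(tid, 0.0):
            sparse[tid] = w  # repeated token: keep the larger weight
    return sparse

# Example: token id 7 appears twice; the larger weight wins.
print(pool_sparse_weights(
    [101, 7, 9, 7, 102],
    [0.0, 0.8, 0.3, 0.5, 0.0],
    special_token_ids={101, 102},
))
# → {7: 0.8, 9: 0.3}
```

Supporting this in infinity would mostly be a matter of running this extra pooling alongside the existing dense pooling and returning both in the response.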

michaelfeil commented 3 months ago

The most straightforward way to do this at the moment would be to:

michaelfeil commented 3 months ago

If you end up getting it done - I would love to feature it here! Also if you have further questions, let me know!

I personally think the results from the BGE-M3 paper are a bit too hasty - the performance is not good enough for a paradigm change; it's more of an experiment. Perhaps it's time for a BGE-M3-V2.

seetimee commented 2 weeks ago

same question now

Matheus-Garbelini commented 1 week ago

Hi @michaelfeil, sorry for the late reply. I actually ended up implementing a very basic and manual version of sparse embeddings for BGE-M3, but it was so slow and occupied so much GPU VRAM that I just switched to simple BM25 in Elasticsearch for lexical search instead haha.
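For anyone weighing the same trade-off: BM25 is a purely CPU-side statistical ranking function, which is why it sidesteps the VRAM cost entirely. A minimal sketch of the classic BM25 formula (the default similarity in Elasticsearch) is below; the function name, corpus, and the `k1`/`b` defaults are illustrative:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query with classic BM25:
    sum over query terms of idf(term) * saturated, length-normalized tf."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)      # document frequency
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

# Toy corpus of pre-tokenized documents (illustrative data).
corpus = [["sparse", "vectors", "rock"],
          ["dense", "vectors"],
          ["lexical", "search", "with", "bm25"]]
print(bm25_score(["sparse", "vectors"], corpus[0], corpus))
```

In practice Elasticsearch handles the tokenization, indexing, and scoring for you; the sketch only shows why no model inference (and hence no GPU) is involved.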