michaelfeil / infinity

Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali
https://michaelfeil.github.io/infinity/
MIT License
1.5k stars, 116 forks

support for dimensions field like in OpenAI text-embedding-3, thanks #476

Open ericg108 opened 6 days ago

ericg108 commented 6 days ago

Feature request

When initializing a model with SentenceTransformers, we can use the `truncate_dim` argument, as below: `model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1", truncate_dim=dimensions)`

And when calling OpenAI text-embedding-3, we can also pass a `dimensions` argument to get variable-length embeddings:

`dimensions` (integer, optional): The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.

See also: https://platform.openai.com/docs/api-reference/embeddings/create#embeddings-create-dimensions
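For illustration, here is a minimal sketch of the JSON body such a request would carry to an OpenAI-style embeddings endpoint. The helper name `build_embedding_request` is hypothetical and not part of any library; only the `input`, `model`, and `dimensions` fields come from the OpenAI API reference linked above.

```python
import json
from typing import Optional


def build_embedding_request(text: str, model: str, dimensions: Optional[int] = None) -> str:
    """Build the JSON body for an OpenAI-style embeddings request (hypothetical helper)."""
    body = {"input": text, "model": model}
    # `dimensions` is optional; leaving it out requests the model's full-size embedding.
    if dimensions is not None:
        if dimensions <= 0:
            raise ValueError("dimensions must be a positive integer")
        body["dimensions"] = dimensions
    return json.dumps(body)


print(build_embedding_request("hello world", "text-embedding-3-small", dimensions=256))
```

A server that does not yet understand `dimensions` would either ignore or reject the extra field, which is why this request asks for explicit support.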

Motivation

More and more embedding models support Matryoshka embeddings, which let users request embeddings of varying dimension, e.g. mxbai-embed-large-v1, jina-embeddings-v3, etc. This is very useful in scenarios with limited resources. I hope it can be supported. Thanks.
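As a rough sketch of what this means in practice: a Matryoshka-style embedding is shortened by keeping its leading components. The function name `truncate_embedding` is hypothetical, and re-normalizing to unit length after truncation is an assumption made here, not something this thread specifies.

```python
import math
from typing import List


def truncate_embedding(embedding: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components and re-normalize to unit L2 length."""
    if not 0 < dim <= len(embedding):
        raise ValueError(f"dim must be in 1..{len(embedding)}, got {dim}")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    # Guard against an all-zero prefix to avoid dividing by zero.
    return [x / norm for x in head] if norm > 0 else head


full = [3.0, 4.0, 12.0, 0.5]
print(truncate_embedding(full, 2))  # [0.6, 0.8]
```

Because the truncation is a plain slice, it can be applied per request after the model has produced the full-size vector.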

Your contribution

I guess it's not a big modification. I may be able to add this feature once I'm told where to make the changes. Thanks.

michaelfeil commented 6 days ago

Hey, you are welcome to work on this. I see you haven't contributed before, so here is a

Add a matryoshka_dim param:

This needs modifications in the batch handler for embed, audio, video.

There are also the engine and the sync engine; each needs the integration for all three of the above. https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/infinity_emb/engine.py
-> Add a test for each of them in test_engine.py

-> Add a test in tests/end-to-end/testdummyengine.py (will only be based on the numpy model)
-> Add a test for OpenAI client compatibility in tests/end-to-end/test_openai_compat.py

You also need to expose matryoshka_dim in the parameters / signature here: https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/infinity_emb/sync_engine.py

Support in the pydantic model:
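A minimal sketch of the kind of optional, validated request field this implies. It is not taken from the repository: the class name `EmbeddingRequest` and the field name `dimensions` are assumptions, and a stdlib dataclass stands in for the pydantic model so the snippet runs without dependencies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EmbeddingRequest:
    """Hypothetical request schema; a stand-in for the project's pydantic model."""

    input: List[str]
    model: str
    dimensions: Optional[int] = None  # None means "return the full embedding"

    def __post_init__(self) -> None:
        # Reject zero or negative sizes at request time, before any inference runs.
        if self.dimensions is not None and self.dimensions <= 0:
            raise ValueError("dimensions must be a positive integer")


req = EmbeddingRequest(input=["hello"], model="dummy", dimensions=64)
print(req.dimensions)  # 64
```

Defaulting the field to `None` keeps existing clients working unchanged, since requests that omit it behave as before.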

It should be a small code change (~50 LOC in 10 files), but it's the first time we are adding a request-time parameter, so it needs extensive testing. I think it's easy, and I would enjoy it if more people contributed! Let me know if this is helpful to get started.

ericg108 commented 2 days ago

Okay, let me work on this. I'm not familiar with git operations, though, so it may take me some time to file a merge request.

michaelfeil commented 2 days ago

It would make a lot of sense to first learn the basics of git and the feature-branch pattern, and to be familiar with inheritance and unit testing in Python.