RichaMax opened this issue 13 hours ago
Matryoshka embeddings are handled in another issue.
Because of dynamic batching, models such as Jina need to handle the prompt template at the instance level. However, this is currently done at the batch level (and a batch may span multiple requests/tenants). Therefore this request is not possible as things stand.
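One way around the batch-level constraint could be to split each dynamic batch by its per-request kwargs before calling `encode`, so every sub-batch shares a single kwargs set. A minimal sketch, assuming a queue of `(sentence, kwargs)` pairs — the queue format and helper name are illustrative, not the project's actual internals:

```python
from collections import defaultdict

def split_batch_by_kwargs(queued_items):
    """Group queued (sentence, kwargs) pairs so that each sub-batch
    shares one kwargs set and can be encoded in a single call.
    Hypothetical helper for illustration, not an existing API."""
    groups = defaultdict(list)
    for sentence, kwargs in queued_items:
        # Sorted item tuples give a hashable, order-independent identity
        key = tuple(sorted(kwargs.items()))
        groups[key].append(sentence)
    return [(sentences, dict(key)) for key, sentences in groups.items()]

batch = [
    ("Two cute cats.", {"task": "text-matching"}),
    ("A dog.", {"task": "retrieval.query"}),
    ("Another cat.", {"task": "text-matching"}),
]
sub_batches = split_batch_by_kwargs(batch)
# Two sub-batches: one per distinct `task` value
```

This trades some batching efficiency for correctness: requests with different kwargs no longer share a forward pass.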
Feature request
Models like https://huggingface.co/BAAI/bge-m3 and https://huggingface.co/jinaai/jina-embeddings-v3 can take extra kwargs as input to the `encode` function, such as `task=...` for Jina v3 or `return_dense=False/True` for bge-m3. It would be great if we could pass these kwargs either when using the async engine via the Python API

```python
engine.embed(sentences=[...], **kwargs)
```

or when sending requests to an endpoint created using your docker image

```python
r = requests.post(
    "http://0.0.0.0:7997/embeddings",
    json={"model": "test_model", "input": ["Two cute cats."], "task": "text-matching"},
)
```
Motivation
This could also be used to handle `truncate_dim` for Matryoshka embeddings. Might be linked to: #476
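For reference, `truncate_dim` for Matryoshka-trained models usually just means keeping the leading N components of the embedding and re-normalizing. A minimal sketch in plain Python (the helper name is illustrative, not an existing API):

```python
import math

def truncate_embedding(vec, truncate_dim):
    """Matryoshka-style truncation: keep the first `truncate_dim`
    components, then re-normalize so cosine similarity still behaves.
    Illustrative helper, not part of any existing API."""
    truncated = list(vec)[:truncate_dim]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated] if norm > 0 else truncated

emb = [0.1 * i for i in range(1024)]  # stand-in for a 1024-d embedding
small = truncate_embedding(emb, 256)  # 256-d, unit length
```

Since this is a pure post-processing step on the output vector, it could plausibly be applied per request even under dynamic batching, unlike `encode`-level kwargs.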
Your contribution
I could try to implement it in my free time, but I do not have much at the moment and I'm still navigating the code. Any pointers on where to start are welcome.