michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
https://michaelfeil.eu/infinity/
MIT License

model name is not consistent across endpoints #178

Closed: bufferoverflow closed this issue 3 months ago

bufferoverflow commented 3 months ago

Feature request

Add a --served-model-name option to control the model name.

Motivation

I ran:

docker run -p 8080:8080 michaelf34/infinity:latest --model-name-or-path BAAI/bge-m3 --port 8080

Query the models endpoint:

$ curl -s http://0.0.0.0:8080/models | jq
{
  "data": [
    {
      "id": "BAAI/bge-m3",
      "stats": {
        "queue_fraction": 0,
        "queue_absolute": 0,
        "results_pending": 0,
        "batch_size": 32
      },
      "object": "model",
      "owned_by": "infinity",
      "created": 1711612054,
      "backend": "torch"
    }
  ],
  "object": "list"
}

Query the embeddings endpoint:

$ curl -s -X 'POST'   'http://0.0.0.0:8080/embeddings'   -H 'accept: application/json'   -H 'Content-Type: application/json'   -d '{
  "input": [
    "string"
  ]}' | jq | grep model
  "model": "BAAIbge-m3",

Via the embeddings endpoint the model is BAAIbge-m3, while the models endpoint reports BAAI/bge-m3. It would be nice to be able to control the name.
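
For reference, a small Python check that reproduces the mismatch against a running server (assuming the docker command above, so the server listens on port 8080):

import requests

BASE = "http://0.0.0.0:8080"

# Name reported by the models endpoint
models_id = requests.get(f"{BASE}/models").json()["data"][0]["id"]

# Name reported by the embeddings endpoint
emb_model = requests.post(
    f"{BASE}/embeddings",
    json={"input": ["string"]},
).json()["model"]

# Currently prints: BAAI/bge-m3 BAAIbge-m3
print(models_id, emb_model)
assert models_id == emb_model, "model name differs across endpoints"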

vLLM does this, e.g. with its --served-model-name option.

Your contribution

I can create a PR for this.

michaelfeil commented 3 months ago

Sounds useful to me; it would be great to get a PR for it.

You can make it an EngineArg, since it's closely coupled with the model. You might name it model-display-name, defaulting to None. I'm hoping someone PRs #13, which should make this more compatible.
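
A minimal sketch of what that could look like (the EngineArgs shape and field names here are illustrative assumptions, not infinity's actual code):

from dataclasses import dataclass
from typing import Optional

@dataclass
class EngineArgs:
    # existing argument, matching the CLI flag used above
    model_name_or_path: str = "BAAI/bge-m3"
    # suggested new argument; defaults to None as proposed
    model_display_name: Optional[str] = None

    @property
    def served_model_name(self) -> str:
        # both endpoints would read the name from here, falling back
        # to the real model path when no display name is set
        return self.model_display_name or self.model_name_or_path

If both /models and /embeddings read the name from a single property like this, they would report the same, configurable value.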