michaelfeil / infinity

Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali
https://michaelfeil.github.io/infinity/
MIT License

prepend v1 to OpenAI compatible APIs #371

Open samos123 opened 1 month ago

samos123 commented 1 month ago

Feature request

prepend v1 to OpenAI compatible APIs

Motivation

This allows us to integrate Infinity into KubeAI the same way as other OpenAI-compatible API engines: https://github.com/substratusai/kubeai

PR: https://github.com/substratusai/kubeai/pull/197

Your contribution

Yeah, I could probably do it once I get the go-ahead from you.

samos123 commented 1 month ago

It seems the response doesn't exactly follow the OpenAI response format. This is from the OpenAI docs: [image]

And this is what Infinity returns:

{
  "object": "embedding",
  "data": [
    {
      "object": "embedding",
      "embedding": [

The "object" should be "list" but infinity returns "embedding". Not sure if this matters though but sharing observation here.
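
To make the mismatch concrete, here is a minimal sketch of a shape check a client could run on an embeddings response. The function name `check_embeddings_response` and the sample payload values are hypothetical, and the check only covers the "object" fields discussed above, not the full OpenAI schema.

```python
from typing import Any


def check_embeddings_response(payload: dict[str, Any]) -> list[str]:
    """Return a list of mismatches against the OpenAI embeddings list shape."""
    problems: list[str] = []

    # OpenAI wraps the results in a list object at the top level.
    if payload.get("object") != "list":
        problems.append(
            f'top-level "object" is {payload.get("object")!r}, expected "list"'
        )

    data = payload.get("data")
    if not isinstance(data, list):
        problems.append('"data" is missing or not a list')
        return problems

    # Each entry inside "data" is an individual embedding object.
    for i, item in enumerate(data):
        if not isinstance(item, dict) or item.get("object") != "embedding":
            problems.append(f'data[{i}] "object" is not "embedding"')

    return problems


# Hypothetical payload mirroring the shape reported above (values are made up).
infinity_like = {
    "object": "embedding",
    "data": [{"object": "embedding", "embedding": [0.1, 0.2], "index": 0}],
}
for problem in check_embeddings_response(infinity_like):
    print(problem)
```

With the payload above, only the top-level "object" is flagged; the per-item "object" values already match.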

samos123 commented 1 month ago

I confirmed that this is blocking integration with KubeAI:

INFO:     10.244.0.15:35798 - "POST /v1/embeddings HTTP/1.1" 404 Not Found
INFO:     10.244.0.15:35798 - "POST /v1/embeddings HTTP/1.1" 404 Not Found
INFO:     10.244.0.1:43268 - "GET /health HTTP/1.1" 200 OK
INFO:     10.244.0.15:35798 - "GET /metrics HTTP/1.1" 200 OK
INFO:     10.244.0.1:50798 - "GET /health HTTP/1.1" 200 OK
INFO:     10.244.0.1:50796 - "GET /health HTTP/1.1" 200 OK
INFO:     10.244.0.15:40274 - "GET /metrics HTTP/1.1" 200 OK
INFO:     10.244.0.15:40274 - "POST /v1/embeddings HTTP/1.1" 404 Not Found

samos123 commented 1 month ago

I may be able to make this work with url_prefix. Giving that a try.

On second thought, I still think there should be two endpoints by default for backwards compatibility:

/embeddings
/v1/embeddings

benoitdion commented 3 weeks ago

The "object" should be "list" but infinity returns "embedding". Not sure if this matters though but sharing observation here

@samos123 this is indeed an issue. OpenAI-compatible clients error out because the response JSON is incompatible.

wirthual commented 2 weeks ago

Hi @benoitdion, thanks for bringing this to our attention. This should be fixed on the main branch now. It would be great if you could let us know whether the changes resolve your issue.

benoitdion commented 2 weeks ago

@wirthual looks like it will