substratusai / lingo

Lightweight ML model proxy and autoscaler for Kubernetes
https://www.substratus.ai
Apache License 2.0

Batch support through Pub/Sub #86

Closed: samos123 closed this 3 months ago

samos123 commented 3 months ago

Add the ability for Lingo to listen to and publish to Pub/Sub topics directly.

Messages should use JSON.

Request Message format:

{
  metadata: dict # required, used to store additional info about the request and returned in the response
  body: dict # required, openai API compatible HTTP body
  options: dict # optional
}
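A minimal sketch of how a consumer might validate this proposed envelope before processing it, assuming messages arrive as raw JSON bytes. The `parse_request` helper and its error handling are illustrative assumptions, not Lingo's actual code.

```python
import json
from typing import Any


def parse_request(raw: bytes) -> dict[str, Any]:
    """Decode a request message and enforce the proposed field rules (hypothetical helper)."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")
    # metadata and body are required and must both be JSON objects.
    for name in ("metadata", "body"):
        if not isinstance(msg.get(name), dict):
            raise ValueError(f"missing or non-object field: {name}")
    # options is optional; default to an empty object when absent.
    options = msg.get("options", {})
    if not isinstance(options, dict):
        raise ValueError("options must be an object if present")
    return {"metadata": msg["metadata"], "body": msg["body"], "options": options}


req = parse_request(b'{"metadata": {"id": "a1"}, "body": {"model": "m", "prompt": "hi"}}')
print(req["options"])  # {}
```

Rejecting a message with no `metadata` up front is what enforces the "required" rule; otherwise the response could not be correlated later.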

Output format:

{
  metadata: dict
  response: {"status_code": 200, "body": '{"choices": [{"index": 0, "text": "completion of prompt"}], ..}'}
}

The output message also contains the request's metadata, so that end users can easily join each response they receive on a separate Pub/Sub topic back to the original request. Metadata is required because otherwise users could send requests with no way to correlate each request with its response.
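A minimal sketch of that correlation on the consumer side, assuming the metadata is echoed back unchanged and uniquely identifies each outstanding request. The `Correlator` class and its canonical-JSON keying are illustrative assumptions, not part of Lingo.

```python
import json
from typing import Any, Optional


def metadata_key(metadata: dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, compact separators) so equal dicts map to the same key.
    return json.dumps(metadata, sort_keys=True, separators=(",", ":"))


class Correlator:
    """Tracks outstanding requests and matches responses to them by echoed metadata."""

    def __init__(self) -> None:
        self.pending: dict[str, dict[str, Any]] = {}

    def record_request(self, request: dict[str, Any]) -> None:
        key = metadata_key(request["metadata"])
        if key in self.pending:
            raise ValueError("metadata must uniquely identify each outstanding request")
        self.pending[key] = request

    def match_response(self, response: dict[str, Any]) -> Optional[dict[str, Any]]:
        # Returns the original request, or None if the metadata is unknown or already matched.
        return self.pending.pop(metadata_key(response["metadata"]), None)


c = Correlator()
c.record_request({"metadata": {"user": "u1", "row": 7}, "body": {"prompt": "hi"}})
orig = c.match_response({"metadata": {"row": 7, "user": "u1"}, "response": {"status_code": 200}})
print(orig["body"])  # {'prompt': 'hi'}
```

Keying on the whole metadata object means a composite identifier (several fields together) works without any special handling.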

MVP error handling:

Things to consider:

nstogner commented 3 months ago

It might be feasible to keep the message body exactly the same as an OpenAI HTTP request body. Pub/Sub and other cloud offerings all have native support for message metadata.
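To make this alternative concrete, here is a hedged sketch: the payload is the unmodified OpenAI request body and the metadata travels as provider-native message attributes. It assumes attributes are a flat map of string keys to string values (as in Google Cloud Pub/Sub); the `NativeMessage` type and `to_native` helper are hypothetical and call no real client library.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class NativeMessage:
    # Hypothetical stand-in for a provider message: raw bytes plus string-only attributes.
    data: bytes
    attributes: dict[str, str] = field(default_factory=dict)


def to_native(openai_body: dict[str, Any], metadata: dict[str, Any]) -> NativeMessage:
    """Payload stays a plain OpenAI request; metadata moves into attributes."""
    # Attribute values must be strings, so non-string metadata values are JSON-encoded.
    attrs = {k: v if isinstance(v, str) else json.dumps(v) for k, v in metadata.items()}
    return NativeMessage(data=json.dumps(openai_body).encode(), attributes=attrs)


msg = to_native({"model": "my-model", "prompt": "hi"}, {"key": "val", "row": 7})
print(msg.attributes)  # {'key': 'val', 'row': '7'}
```

Note that `row` comes back as the string `'7'`: the consumer must know which attributes were encoded to recover the original types, which is the kind of provider-specific constraint a self-contained JSON envelope avoids.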

samos123 commented 3 months ago

I prefer to keep the format as generic and flexible as possible, without relying on the message metadata features of specific providers. Relying on them would limit us if we later need to add something to the message. It would also make it harder to add a new provider and support the same API if that provider implements metadata slightly differently or not at all. I don't see any benefit to using native message metadata support.

It's also important to enforce the id parameter so that people take a moment to add an ID field; otherwise it will be almost impossible for end users to tell which request triggered a given response.

I still prefer having a body field.

nstogner commented 3 months ago

I think .id [int] is not needed. The user can populate .metadata.* with their identifier(s). They might have a string ID, or a composite ID made up of multiple fields.

We do need a way of determining which HTTP path the user is trying to invoke. Generically, this could be a .path field (e.g. /v1/completions). Some considerations:

This makes me consider having the user specify a .type field that would be an enum: completion, embedding, chat.
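A sketch of how a `.type` enum could map onto OpenAI-compatible HTTP paths, while still accepting an explicit `.path`. The three paths are the standard OpenAI endpoints; the `resolve_path` helper and its precedence rule (an explicit path wins) are assumptions for illustration.

```python
from enum import Enum
from typing import Any


class RequestType(Enum):
    COMPLETION = "completion"
    EMBEDDING = "embedding"
    CHAT = "chat"


# OpenAI-compatible endpoint for each request type.
TYPE_TO_PATH = {
    RequestType.COMPLETION: "/v1/completions",
    RequestType.EMBEDDING: "/v1/embeddings",
    RequestType.CHAT: "/v1/chat/completions",
}


def resolve_path(msg: dict[str, Any]) -> str:
    """Prefer an explicit .path; otherwise derive it from .type (hypothetical precedence)."""
    if "path" in msg:
        return msg["path"]
    if "type" in msg:
        return TYPE_TO_PATH[RequestType(msg["type"])]  # raises ValueError on an unknown type
    raise ValueError("message needs either a path or a type field")


print(resolve_path({"type": "chat"}))  # /v1/chat/completions
```

The enum rejects unknown request kinds up front, whereas a free-form path can route to any endpoint the backend exposes, including non-OpenAI APIs.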

samos123 commented 3 months ago

+1 to letting the user decide whether they want an int id in the metadata.

I would vote for a path field for now. I do think keeping body is important so that we don't limit ourselves to the OpenAI API; we might want or need to support other APIs in the body later.

nstogner commented 3 months ago

The next thing to consider is what to do about non-retryable errors.

I think retryable errors should be nacked (redelivered later). Note: Pub/Sub can be configured with an exponential-backoff redelivery strategy so as not to clog the queue.

Some examples of non-retryable errors:

My thought for non-retryable errors: send an error response with the error details back on the same responses topic, and let the response consumer filter on it and act accordingly.
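A minimal sketch of this policy: retryable failures are nacked for redelivery, while non-retryable failures are acked and published as an error response on the responses topic. Which outcomes count as retryable (here: no status at all, 429, or any 5xx) is an assumption for illustration, as are the `Action` names; the thread does not settle this list.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NACK = "nack"                # leave for redelivery (backoff is configured on the subscription)
    ACK_AND_PUBLISH = "publish"  # publish the response (success or error), then ack the request


def is_retryable(status_code: Optional[int]) -> bool:
    # Assumption: no status (e.g. backend unreachable), 429, or any 5xx is worth retrying.
    return status_code is None or status_code == 429 or 500 <= status_code <= 599


def decide(status_code: Optional[int]) -> Action:
    return Action.NACK if is_retryable(status_code) else Action.ACK_AND_PUBLISH


for code in (200, 400, 429, 503, None):
    print(code, decide(code).value)
```

Successes and non-retryable errors take the same path, so the consumer sees an error as just another response to filter on.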

Other options:

samos123 commented 3 months ago

Right now, for an end user, we send the error back as part of the response. In the end it's just a valid response, IMO, that happens to contain an error.

To keep the MVP simple, my suggestion would be either no retries (send back whatever the response is) or treating all errors equally:

nstogner commented 3 months ago

The PR currently does no retries on failure; however, failures do propagate back on the response topic. We can either implement retries in this PR or follow up with another one.

Note: I have implemented the PR with a flat (un-nested) JSON structure for requests and responses:

# Request
{
  "path": "/v1/completions",
  "metadata": {"key": "val"},
  "body": {"model": "my-model", "prompt": "whats your favorite color?"}
}
# Response (on another topic)
{
  "status_code": 200,
  "metadata": {"key": "val"},
  "body": {"choices": [{"text": "My favorite color is blue"}]}
}
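The flat structure above can be sketched as a single handler that forwards `body` to `path` on a backend and echoes `metadata` into the response. The `send` callable stands in for the HTTP call to the model server and is hypothetical, as are `handle` and `fake_send`; matching the PR's current behavior, there are no retries and an error status is simply passed through.

```python
import json
from typing import Any, Callable

# Hypothetical backend call: (path, body) -> (status_code, response_body).
Send = Callable[[str, dict[str, Any]], tuple[int, dict[str, Any]]]


def handle(raw_request: bytes, send: Send) -> bytes:
    """Turn one flat request message into one flat response message (no retries)."""
    req = json.loads(raw_request)
    status_code, body = send(req["path"], req["body"])
    return json.dumps({
        "status_code": status_code,   # non-2xx codes are returned as-is, not retried
        "metadata": req["metadata"],  # echoed unchanged so the consumer can correlate
        "body": body,
    }).encode()


def fake_send(path: str, body: dict[str, Any]) -> tuple[int, dict[str, Any]]:
    # Stand-in backend that only knows the completions endpoint.
    if path != "/v1/completions":
        return 404, {"error": {"message": f"unknown path {path}"}}
    return 200, {"choices": [{"text": "My favorite color is blue"}]}


out = json.loads(handle(
    b'{"path": "/v1/completions", "metadata": {"key": "val"},'
    b' "body": {"model": "my-model", "prompt": "whats your favorite color?"}}',
    fake_send,
))
print(out["status_code"], out["metadata"])  # 200 {'key': 'val'}
```

An error from the backend yields a message with the same three fields, just with a non-200 `status_code`.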
nstogner commented 3 months ago

Note: the error structure currently follows the same convention: https://github.com/substratusai/lingo/pull/88/files#r1545874400