Batch support through Pub/Sub

samos123 commented 7 months ago

Ability for Lingo to directly listen and publish to a pub/sub topic

Messages should use JSON.

Request Message format:

{
  metadata: dict # required, used to store additional info about the request and returned in the response
  body: dict # required, openai API compatible HTTP body
  options: dict # optional
}

Output format:

{
  metadata: dict
  response: {"status_code": 200, "body": '{"choices": [{"index": 0, "text": "completion of prompt"}], ..}'}
}

The output message also contains the metadata of the request, so that end users can easily join the original request with the response they receive on another pub/sub topic. Metadata is required because otherwise users risk sending requests with no way to correlate each request with its response.
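A minimal sketch of how a client could use metadata to join a response back to its request. The `request_id` key and the helper names `build_request` and `correlate` are hypothetical and not part of the proposal; any user-chosen metadata would work the same way.

```python
import json
import uuid


def build_request(body: dict, metadata: dict) -> str:
    """Serialize a request message; metadata is required for correlation."""
    if not metadata:
        raise ValueError("metadata is required to correlate the response")
    return json.dumps({"metadata": metadata, "body": body})


def correlate(pending: dict, response_json: str):
    """Match a response to its pending request via the hypothetical 'request_id' key."""
    message = json.loads(response_json)
    request_id = message["metadata"]["request_id"]
    return pending.pop(request_id), message["response"]


# Client side: remember what was sent, keyed by a self-chosen identifier.
pending = {}
request_id = str(uuid.uuid4())
body = {"model": "my-model", "prompt": "hello"}
pending[request_id] = body
request_json = build_request(body, {"request_id": request_id})

# Simulated message as it might arrive on the response topic: the request
# metadata is echoed back unchanged next to the response.
response_json = json.dumps({
    "metadata": json.loads(request_json)["metadata"],
    "response": {"status_code": 200, "body": "{\"choices\": []}"},
})
original, response = correlate(pending, response_json)
```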

MVP error handling:

Treat all errors as equal and retry up to a maximum of x retries, no matter what kind of error it is. It may be a waste to retry a malformed request 3 times, but we can optimize later; not a big deal for MVP imo.
The following behavior applies to any kind of error:
- After x retries, send back a response using the same format {"metadata": dict, "response": dict}
- In case it's a valid HTTP response with a body and a status code, include the status code and body in the response dict
- In case we were not able to get a valid HTTP response at all, set status_code to -1 and set the "error" field in the response dict to a helpful, actionable text string for the end user.
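The behavior above could be sketched roughly as follows. This is an illustration, not Lingo's implementation: `send` is a hypothetical backend call that returns `(status_code, body)` or raises when no HTTP response could be obtained, and "x retries" is assumed to mean one initial attempt plus x retries.

```python
from typing import Callable, Tuple

# Hypothetical backend call: returns (status_code, body) or raises if no
# valid HTTP response was received.
Backend = Callable[[dict], Tuple[int, str]]


def handle_with_retries(request: dict, send: Backend, max_retries: int = 3) -> dict:
    """Treat all errors equally: retry up to max_retries times, then report
    the last outcome in the {"metadata": ..., "response": ...} format."""
    response: dict = {}
    for _ in range(1 + max_retries):  # assumption: initial attempt + retries
        try:
            status_code, body = send(request["body"])
        except Exception as exc:
            # No valid HTTP response at all: status_code -1 plus an error text.
            response = {"status_code": -1,
                        "error": f"no valid HTTP response from backend: {exc}"}
            continue
        response = {"status_code": status_code, "body": body}
        if status_code == 200:
            break  # success, stop retrying
    return {"metadata": request["metadata"], "response": response}
```

Note that a malformed request that always yields a 4XX is retried as well, which matches the "optimize later" trade-off described above.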

Things to consider:

- Min and max concurrent requests per Lingo instance, configurable to maximize resource utilization and prevent spiky scale-ups and scale-downs
- Multi-region support by having Lingo instances in multiple regions listen to the same pub/sub topic
- How to handle different kinds of errors with retries
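For the first consideration, one possible way to cap in-flight requests per instance is a semaphore around the message handler. This is only a sketch of the max-concurrency half of the idea; `process_all`, `handler`, and `max_concurrent` are hypothetical names, and a real subscriber would more likely rely on the flow-control settings of its Pub/Sub client.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def process_all(messages: Iterable,
                      handler: Callable[[object], Awaitable],
                      max_concurrent: int = 4) -> List:
    """Run handler over all messages with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(message):
        # Wait for a free slot before forwarding the message to the backend.
        async with semaphore:
            return await handler(message)

    # gather preserves the input order of the results.
    return await asyncio.gather(*(bounded(m) for m in messages))
```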

nstogner commented 7 months ago

It might be feasible to keep the message body exactly the same as an OpenAI HTTP request. PubSub and other cloud offerings all have native support for message metadata.

GCP PubSub (see attributes field): https://cloud.google.com/pubsub/docs/reference/rest/v1/PubsubMessage
AWS SQS: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-message-metadata.html

samos123 commented 7 months ago

I prefer to keep it as generic and flexible as possible without relying on message metadata features of specific providers. It would limit our ability if we later do need to add something to the message. It would also limit our ability to add a new provider and support the same API if that provider has a slightly different or no implementation of metadata. I guess I don't see any benefits of using the native message metadata support.

It's also important to enforce the id parameter so people take a moment to add an ID field because otherwise it will be almost impossible for the end-users to know which request triggered the response.

I still think having a body field is preferred.

nstogner commented 7 months ago

I think .id [int] is not needed. The user can populate .metadata.* with their identifier(s). They might have a string id, they might have a composite id made up of multiple fields.

We do need a way of determining what HTTP path the user is trying to invoke. Generically this could be a .path field (ex: /v1/completions). Some considerations:

- This does not make sense for all endpoints (it only makes sense for POST requests going to the completion, chat, or embeddings APIs). If we support other APIs that upload large binary data objects, we would likely need to pass the data by reference to a GCS bucket, thus changing the expectation of the .body field.
- The user might not be aware of what path they are hitting (they might never think about it because they don't interact with the REST API directly).

This makes me consider having the user specify a .type field that would be an enum: completion, embedding, chat.
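To make the two options concrete, here is a sketch in which a message may carry either an explicit .path or a .type enum that is mapped to a path. The mapping onto OpenAI-style paths and the names `RequestType`, `PATHS`, and `resolve_path` are assumptions for illustration, not a decided design.

```python
from enum import Enum


class RequestType(Enum):
    COMPLETION = "completion"
    EMBEDDING = "embedding"
    CHAT = "chat"


# Assumed mapping from the proposed .type values to OpenAI-compatible paths.
PATHS = {
    RequestType.COMPLETION: "/v1/completions",
    RequestType.EMBEDDING: "/v1/embeddings",
    RequestType.CHAT: "/v1/chat/completions",
}


def resolve_path(message: dict) -> str:
    """Prefer an explicit .path; otherwise derive the path from .type."""
    if "path" in message:
        return message["path"]
    # RequestType(...) raises ValueError for values outside the enum.
    return PATHS[RequestType(message["type"])]
```

With a .type field the user never needs to know the REST path, while a .path field stays open to endpoints that the enum does not cover.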

samos123 commented 7 months ago

+1 to letting the user decide if they want id as int in the metadata or not.

I would vote for a path field for now. I do think keeping body is important so we are not limiting ourselves to OpenAI API. We might want or need to support other apis in the body later.

nstogner commented 7 months ago

Next thing to consider is what to do about non-retry-able errors?

I think retry-able errors should be nack'ed (redelivered later). Note: PubSub can be configured with an expo-backoff redelivery strategy so as not to clog the queue.

Some examples of non-retry-able errors:

- Malformed requests - invalid JSON / missing fields
- Non-200 status code from the backend (especially 4XX's; most 5XX's are probably considered retry-able)
- Model not found (possibly retry-able - in a distributed environment there might be states in which backends for some models don't exist at a given point in time)

My thought for non-retry-able errors: send back an error response on the same responses topic with the error details, and allow the response consumer to filter and take action accordingly.

Other options:

- Add an errors-only topic (more user overhead)
- Log to STDERR (would not want to expect a user to need to sift through these logs)
- Log these errors to a visible place (a bucket?)
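The retry-able versus non-retry-able split discussed above could be sketched as a small classifier. The exact rules are assumptions drawn from the examples in this comment: malformed requests and 4XX's are not retried, 5XX's are, "model not found" (assumed here to surface as a 404) is configurable, and a missing HTTP response is assumed to be retry-able. `is_retryable` and `decide` are hypothetical names.

```python
from typing import Optional


def is_retryable(status_code: Optional[int], malformed: bool = False,
                 retry_not_found: bool = False) -> bool:
    """Classify an outcome; status_code is None when no HTTP response arrived."""
    if malformed:
        return False  # invalid JSON / missing fields will never succeed
    if status_code is None:
        return True  # assumption: transport failures are worth retrying
    if status_code == 404:
        return retry_not_found  # model backends may not exist yet
    if 400 <= status_code < 500:
        return False
    return status_code >= 500


def decide(status_code: Optional[int], malformed: bool = False,
           retry_not_found: bool = False) -> str:
    """Return "nack" to request redelivery, or "respond" to publish the
    result (success or error details) on the responses topic and ack."""
    if is_retryable(status_code, malformed, retry_not_found):
        return "nack"
    return "respond"
```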

samos123 commented 7 months ago

Right now what we do with an end-user is to send back the error as part of the response. In the end it's just a valid response imo that happens to have an error.

To keep the MVP simple, my suggestion would be either no retries (send back whatever the response is) or treat all errors equal:

Treat all errors as equal and retry up to a maximum of x retries, no matter what kind of error it is. It may be a waste to retry a malformed request 3 times, but we can optimize later; not a big deal for MVP imo.
The following behavior applies to any kind of error:
- After x retries, send back a response using the same format {"metadata": dict, "response": dict}
- In case it's a valid HTTP response with a body and a status code, include the status code and body in the response dict
- In case we were not able to get a valid HTTP response at all, set status_code to -1 and set the "error" field in the response dict to a helpful, actionable text string for the end user.

nstogner commented 7 months ago

PR is currently in a place where there are no retries on failure - however the failures do propagate back on the response topic. We can either implement retries in this PR or follow up with another one.

Note: I have implemented the PR with an un-nested json structure for requests and responses:

# Request
{
  "path": "/v1/completions",
  "metadata": {"key": "val"},
  "body": {"model": "my-model", "prompt": "whats your favorite color?"}
}

# Response (on another topic)
{
  "status_code": 200,
  "metadata": {"key": "val"},
  "body": {"choices": [{"text": "My favorite color is blue"}]}
}
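As an illustration of the un-nested structure, the sketch below turns a raw request message into a response message. It is not the code from the PR: `send` is a hypothetical backend call, and the 400 status and `{"error": {"message": ...}}` body used for malformed requests are assumptions rather than the PR's actual error structure.

```python
import json
from typing import Callable, Tuple, Union

REQUIRED_FIELDS = ("path", "metadata", "body")

# Hypothetical backend call: (path, body) -> (status_code, response body).
Backend = Callable[[str, dict], Tuple[int, dict]]


def error_response(status_code: int, metadata: dict, message: str) -> dict:
    """Build an error in the same flat shape as a normal response (assumed format)."""
    return {"status_code": status_code, "metadata": metadata,
            "body": {"error": {"message": message}}}


def handle_message(raw: Union[str, bytes], send: Backend) -> dict:
    """Parse a flat request, forward it, and wrap the result as a flat response."""
    try:
        request = json.loads(raw)
    except ValueError as exc:  # covers invalid JSON and undecodable bytes
        return error_response(400, {}, f"invalid JSON: {exc}")
    if not isinstance(request, dict):
        return error_response(400, {}, "request must be a JSON object")
    metadata = request.get("metadata", {})
    missing = [field for field in REQUIRED_FIELDS if field not in request]
    if missing:
        return error_response(400, metadata, "missing fields: " + ", ".join(missing))
    status_code, body = send(request["path"], request["body"])
    # Echo the request metadata so the consumer can correlate the response.
    return {"status_code": status_code, "metadata": metadata, "body": body}
```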

nstogner commented 7 months ago

Note the error structure follows the same convention right now: https://github.com/substratusai/lingo/pull/88/files#r1545874400

substratusai / kubeai

Batch support through Pub/Sub #86