mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

Experimental Feature In Progress: OpenAI OpenAPI Generation #335

Open dave-gray101 opened 1 year ago

dave-gray101 commented 1 year ago

As I've mentioned in Discord, I've been prototyping around the idea of generating request and response models automatically from the "real" OpenAI API specification that they make available at this repo. The primary complication that's immediately apparent with this idea is that our local models and backends support additional configuration parameters. Therefore, I propose the following: that we change our API signatures for LocalAI to match OpenAI exactly, with one extension - our requests will have an additional optional parameter of x-LocalAI-extensions. This will be a structure specific to each endpoint that contains the applicable configuration points.
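To make that extension concrete, here's a minimal sketch of what a generated request model plus the optional extension field could look like. Every type and field name below is mine for illustration - nothing here is actual generated output:

```go
// Illustrative only - these are not the real generated types.
package schema

// ChatCompletionRequest mirrors (a heavily abbreviated slice of) the fields
// generated from OpenAI's published spec.
type ChatCompletionRequest struct {
	Model       string        `json:"model"`
	Messages    []ChatMessage `json:"messages"`
	Temperature *float64      `json:"temperature,omitempty"`

	// The one LocalAI addition: optional, so a stock OpenAI client payload
	// still parses unchanged.
	LocalAIExtensions *ChatExtensions `json:"x-LocalAI-extensions,omitempty"`
}

type ChatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatExtensions groups the configuration points that only LocalAI
// understands for the chat endpoint.
type ChatExtensions struct {
	Backend     string `json:"backend,omitempty"`
	Threads     *int   `json:"threads,omitempty"`
	ContextSize *int   `json:"context_size,omitempty"`
}
```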

Some of my earlier experiments that I mentioned in Discord involved generating only the relevant data structures and extending them to create our own models. This was appealing as it had the fewest runtime dependencies, but I was dissatisfied with the JSON-handling experience - and oapi-codegen generates parsing code if you allow it to!
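For reference, that discarded approach looked roughly like this (all names invented for illustration): generate only the data structures, then embed them in our own types.

```go
// Rough sketch of the discarded approach - all names are illustrative.
package schema

// GeneratedChatRequest stands in for a type emitted straight from the spec.
type GeneratedChatRequest struct {
	Model       string   `json:"model"`
	Temperature *float64 `json:"temperature,omitempty"`
}

// LocalAIChatRequest embeds the generated type and adds our extra fields.
// encoding/json promotes the embedded fields, so both sets of parameters
// parse from the same request body - but you don't get oapi-codegen's
// generated parsing code this way, which is what made the JSON handling
// unpleasant.
type LocalAIChatRequest struct {
	GeneratedChatRequest
	Backend string `json:"backend,omitempty"` // LocalAI-only parameter
}
```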

Therefore, I tossed out that prototype and created a fresh one taking a cleaner approach: branch

The difference in this branch is that I've flipped the order of things: the custom LocalAI extension parameters are defined as a patch to the OpenAI specification before code generation so that our additional parameters are included. To avoid this becoming a maintenance nightmare, this is not a binary or text patch, but rather uses ytt to be a bit more aware of the YAML structure. I've got a starting point in the form of https://github.com/dave-gray101/LocalAI/blob/openai-openapi/openai-openapi/localai_model_patches.yaml, but this needs a lot of prettying up before this feature is done!

In addition to the parsing benefits, this also allows us to generate server stubs, to ensure we at least respond on all the endpoints, even if that response is a 501 Not Implemented.
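As a sense of what that fallback could look like - hand-written here with net/http purely for illustration, since the real stubs would be generated and LocalAI currently uses Fiber:

```go
package api

import (
	"encoding/json"
	"net/http"
)

// notImplemented answers an OpenAI endpoint we haven't wired up yet with a
// well-formed 501 instead of a bare 404.
func notImplemented(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusNotImplemented)
	_ = json.NewEncoder(w).Encode(map[string]string{
		"error": "this endpoint is not implemented by LocalAI yet",
	})
}
```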

I'm coming up for air here to start a discussion on this, as there are really three ways I see to proceed from here:

  1. I am very new to Go, and am not aware of the tradeoffs between the different HTTP server frameworks out there. Unfortunately, Fiber, the one used in this project, is not currently supported by oapi-codegen. If there's a particular reason we've selected that framework, it is possible to take the runtime dependency for improved parsing without the server codegen - but I haven't investigated this route too much yet.
  2. If we are flexible on the HTTP server, letting a code generator keep our endpoints in lockstep with OpenAI could have some benefits. If we do not want to change our underlying usage of OpenAIRequest within LocalAI itself, we could attempt to transform the endpoint-specific models into that structure at the API layer. Personally, I am not a big fan of this solution, but I want to list it for completeness - it's potentially less invasive than my preference below.
  3. Personally, I don't even like the OpenAIRequest struct - I would rather have LocalAI's code handle separate request models for the different endpoints, as they have such radically different parameters. We're well positioned to do this, as the predictions.go file is already a pretty good abstraction layer. The main issue to discuss with this option surrounds the config files - currently, they have a pile of loose properties plus an additional-parameters object, and I'd like to discuss options for more suitable structures. Because models are sometimes used for multiple endpoints (embeddings, or chat vs completion), I'm torn on whether the better structure is a specific config file for each combination (something like config/chat/gpt-3.5-turbo.yaml being a common configuration to simulate ChatGPT), or a single config file per model containing a mapping of supported endpoints to default request options (see the sketch after this list). In either case, I propose moving most options out of the "general" section of the config and into the aforementioned x-LocalAI-extensions struct, which then serves as the endpoint-specific default request options to use when the request JSON doesn't specify an override. The other main endpoint-specific detail that isn't part of the request at all is the template to use - but is there anything else in there worth specific consideration?
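Here's the sketch referenced in option 3: a rough shape for a single per-model config file that maps supported endpoints to their default request options. Every name here is hypothetical, not the current LocalAI config schema.

```go
// Hypothetical shape for a per-model config file - not the current schema.
package config

// ModelConfig would be loaded from one YAML file per model.
type ModelConfig struct {
	Name      string                      `yaml:"name"`      // e.g. "gpt-3.5-turbo"
	Templates map[string]string           `yaml:"templates"` // endpoint -> prompt template
	Endpoints map[string]EndpointDefaults `yaml:"endpoints"` // endpoint -> default request options
}

// EndpointDefaults mirrors the proposed x-LocalAI-extensions structure for a
// single endpoint; anything the request JSON specifies overrides these values.
type EndpointDefaults struct {
	Temperature *float64 `yaml:"temperature,omitempty"`
	Threads     *int     `yaml:"threads,omitempty"`
	ContextSize *int     `yaml:"context_size,omitempty"`
}
```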

I plan to keep working on this in the direction of option 3 for now. Opening this up for discussion before I get too much farther! Sorry this post became a bit rambly - I just wanted to track it somewhere other than Discord.

benwilcock commented 1 year ago

I have no opinion on how you do it, but I'm really appreciative of the fact that you'd like to add OpenAPI support. It's also something that I would very much like to see. Figuring out the API, particularly as it changes over time, isn't easy! Thank-you!