feat: Assistant API - Githubissues

mudler commented 7 months ago

OpenAI released the Assistant API. https://platform.openai.com/docs/assistants/overview

This thread is to collect thoughts on how to integrate this. It seems we already have all the needed pieces, but we still need to stitch them together to expose a similar API.

I haven't had a closer look at the API yet; it's not on my radar until #1126 is completed.

For reference, the OpenAI API spec: https://github.com/openai/openai-openapi

localai-bot commented 7 months ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler). Please beware that I might hallucinate sometimes!

But... I can also be funny or helpful :smilecat: and I can generally provide good tips or places to look in the documentation or in the code, based on what you wrote in the issue.

Don't engage in conversation with me, I don't support replying (yet)!

:warning::warning::warning::warning::warning:

To integrate the OpenAI Assistant API with LocalAGI, we can use the local LLMs that are already loaded in memory to generate completions and edits. We can also use the existing chat interface to interact with the user and receive instructions for generating completions and edits. Here's a possible approach:

- Add a new endpoint to LocalAGI to generate completions and edits using the local LLMs. This endpoint should take in the instruction, input text, and temperature parameters, and return the generated completion or edit as a JSON response.
- Modify the chat interface to include buttons for generating completions and edits, and send the appropriate request to the new endpoint when the user clicks on these buttons.
- Add a new command-line argument to LocalAGI to specify the OpenAI Assistant API key, if needed. This will allow users to use their own OpenAI API key to access the Assistant API directly from LocalAGI.
- Test and document the integration thoroughly to ensure it works as expected and is easy to use for users with different levels of technical expertise.

Overall, this integration should provide a seamless experience for users who want to use LocalAGI to generate completions and edits using OpenAI's LLMs, while also allowing them to access the full Assistant API directly from LocalAGI if needed.

muka commented 6 months ago

In an effort to contribute to this, I started collecting the requirements and functionalities offered by the Assistant API.

Assistant functionalities

[ ] Assistant API
[ ] Thread API
[ ] Run API
[x] File upload API
[ ] Tools support
[ ] RAG support (involving also a vector db, embeddings, feature extraction and search)
[ ] Code instruct support
[ ] Automatic truncation (or other approaches) of long context to fit in a Thread
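For the last item, one simple approach is to keep only the most recent messages that fit a token budget. The sketch below is only an illustration of that idea: the `Message` type, `count_tokens` (a crude whitespace word count standing in for a real tokenizer) and `truncate_thread` are hypothetical names, not part of LocalAI or of the OpenAI API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Hypothetical minimal message record for a thread."""
    role: str
    content: str


def count_tokens(text: str) -> int:
    # Assumption: a whitespace word count stands in for a real tokenizer.
    return len(text.split())


def truncate_thread(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep the newest messages whose combined token count fits the budget."""
    kept: List[Message] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for msg in reversed(messages):
        cost = count_tokens(msg.content)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    # Restore chronological order before returning.
    kept.reverse()
    return kept
```

Dropping the oldest messages first is the simplest policy; summarizing the dropped part instead would be one of the "other approaches" mentioned above.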

mudler commented 4 months ago

First PR in that direction adding File API: https://github.com/mudler/LocalAI/pull/1703

richiejp commented 4 months ago

I have a few ideas for vector search in order of my personal preference:

1. Embedded vector store and search
2. Host one of the many vector DBs in a backend
3. Connect to an external store

All could exist at once. I like the first one for the use case where someone has a limited number of documents and/or search volume, and as the default choice in LocalAI because it would not incur a lot of maintenance or bloat. It should be relatively simple because:

- Doing a brute-force vector search in memory is reasonably fast for up to, let's say, 1M small vectors. It is also an exact search, whereas the others are approximate. Both HNSW and brute-force search implementations are included in the link.
- Embeddings and document chunks could be saved to a flat file and loaded into memory when needed.
- Alternatively, BadgerDB can be embedded, which allows fast key iteration for comparing the vectors:
  - Even large 4096-dimension embeddings can be stored as keys, and the values can be document segments.
  - The keys can be prefixed with a file ID, so that only particular files are used in a query, matching the OpenAI API.
  - It should handle the use case where events are being streamed into the database in real time.
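As a rough illustration of the in-memory option, here is a minimal sketch of an exact brute-force cosine search that can be restricted to particular file IDs. It is not the linked experiment and does not use BadgerDB; `BruteForceStore`, its methods and the `(file_id, chunk_index)` key layout are assumptions made for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class BruteForceStore:
    """Hypothetical in-memory store: every query scans all stored embeddings."""

    def __init__(self):
        # Assumed layout: (file_id, chunk_index) -> (embedding, chunk_text)
        self._entries = {}

    def add(self, file_id, chunk_index, embedding, text):
        self._entries[(file_id, chunk_index)] = (list(embedding), text)

    def search(self, query, top_k=3, file_ids=None):
        """Return up to top_k (score, file_id, text) tuples, best match first."""
        scored = []
        for (file_id, _), (embedding, text) in self._entries.items():
            # Optionally restrict the scan to the given files.
            if file_ids is not None and file_id not in file_ids:
                continue
            scored.append((cosine_similarity(query, embedding), file_id, text))
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]
```

Because every vector is compared, the result is exact, and the cost grows linearly with the number of stored chunks. An embedded key-value store with file-ID key prefixes would replace the dictionary scan with a prefix iteration.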

So I went ahead and started an experiment external to LocalAI: https://github.com/richiejp/badger-cybertron-vector/blob/main/main.go

It's probably really slow due to copying the keys, the lack of parallel execution and so on, but it works. I expect these things can be optimized. The question is whether to go with BadgerDB or a pure in-memory implementation.

BTW Cybertron is pretty cool, that could be a new backend.

richiejp commented 3 months ago

Perhaps instead of, or in addition to, what I have done with the basic vector search, we could have a higher-level API backed by https://github.com/bclavie/ragatouille as a starting point.

The reason is that ColBERT v2 seems to be far superior to basic cosine similarity search, but it is difficult to unpack it and get it to work with an arbitrary vector database. It's possibly the wrong level of abstraction for LocalAI to be working at, even internally.

Opinions on implementing Ragatouille or ColBERT (https://github.com/stanford-futuredata/ColBERT) as a backend?
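To make the difference in abstraction concrete: a basic search compares one vector per query against one vector per document, while ColBERT-style late interaction compares every query token vector against every document token vector. The sketch below shows only that scoring idea, assuming already normalized token embeddings and non-empty token lists; `single_vector_score` and `maxsim_score` are hypothetical names and this is not the ColBERT or Ragatouille API.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def single_vector_score(query_vec, doc_vec):
    """Basic search: one vector per query and per document (cosine if normalized)."""
    return dot(query_vec, doc_vec)


def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: for each query token, take its best-matching document
    token, then sum those maxima. Assumes both token lists are non-empty."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)
```

Since each document is represented by many token vectors, the index no longer maps cleanly onto a one-vector-per-key store, which is one reason a dedicated backend may fit better than a generic vector database.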

richiejp commented 3 months ago

I'm mainly thinking of the indexing and retrieval Ragatouille APIs: https://github.com/bclavie/ragatouille?tab=readme-ov-file#%EF%B8%8F-indexing

christ66 commented 3 months ago

+1 for Ragatouille

mudler / LocalAI

feat: Assistant API #1273
