huggingface / chat-ui

Open source codebase powering the HuggingChat app
https://huggingface.co/chat
Apache License 2.0

Add support for prompt augmentation via external API #909

Open oxaronick opened 7 months ago

oxaronick commented 7 months ago

Problem

I love HF chat-ui, and I'd like to deploy it for more teams. However, the lack of RAG features prevents me from deploying it as widely as I'd like.

Some teams need RAG with PDFs as a data source. Many of these PDFs are oddly formatted, and different types of PDFs require different kinds of parsing/chunking/embedding/whatever. This is not chat-ui's concern - it's mine - and no feature of any LLM chat UI will ever solve this problem in the way I need it solved.

Possible solution

What I'd really like is a feature where I can tell chat-ui to take a prompt from the user, call a REST API to have the prompt translated/augmented/whatever, and then send the resulting prompt to the LLM. (Some way to hover and see what the actual, augmented prompt looked like would also be nice, in case something weird happens and the user wants to know why.)

I'll build the indexing system and expose the REST API that augments prompts; I just need a UI that will use it. I would even be happy to use an existing API as a reference or adopt a standard if one exists, but I haven't seen one. Maybe we'll set the standard here.
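To make this concrete, here's a rough sketch of the kind of contract I have in mind. The endpoint, type names, and payload shape are all placeholders I made up, not an existing standard:

```ts
// Hypothetical augmentation contract: names and shapes are placeholders.
interface AugmentRequest {
  prompt: string;            // the prompt exactly as the user typed it
  conversationId?: string;   // optional context for the augmentation service
}

interface AugmentResponse {
  augmentedPrompt: string;   // what actually gets sent to the LLM
  sources?: string[];        // optional provenance, for the hover/inspect UI
}

async function augmentPrompt(userPrompt: string): Promise<AugmentResponse> {
  const res = await fetch("https://augmenter.internal/v1/augment", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: userPrompt } satisfies AugmentRequest),
  });
  if (!res.ok) throw new Error(`augmenter returned ${res.status}`);
  return (await res.json()) as AugmentResponse;
}
```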

My plan

I was thinking of forking chat-ui to add hooks for "prompt translation" or "prompt augmentation" or whatever the best name is.

Questions

Any advice on where to start? Where in the code would this logic go if I added it?

Are maintainers open to the idea of merging a feature like this if it works?

(Because you never know) Is this already present in chat-ui and I just haven't noticed?

nsarrazin commented 7 months ago

So the main difference from a RAG API (if I understand correctly) is that you want your system to return different things based on the content of the user prompt, whereas RAG just fetches a static asset, be it a web page or a PDF, even if it does sentence similarity afterwards.

Seems to me that this could be accomplished using some kind of function calling API? I think this feature would be a nice next-step for chat-ui, but we should discuss what it would look like.

In the meantime, if you want to fork and add something more custom-built for your own use case, maybe have a look at:

src/lib/server/websearch/searchWeb.ts (the code we use to go from a search query to a list of URLs to parse; you might be able to use that to hook into your indexing system)

Let me know if you need any other help!

oxaronick commented 7 months ago

> So the main difference from a RAG API (if I understand correctly) is that you want your system to return different things based on the content of the user prompt, whereas RAG just fetches a static asset, be it a web page or a PDF, even if it does sentence similarity afterwards.

Yes, the other system would take the prompt and do a similarity search in a bunch of pre-indexed material (a library of sorts) and augment the prompt with the results of that search. That's a little different from some RAG flows (like chat-ui's web search or open-webui's PDF upload) where material is fetched and indexed as part of a conversation. But we're still augmenting the prompt with relevant results from a data store.

From chat-ui's perspective, though, it doesn't matter what I do in the library, which is good separation of concerns IMO. Someone else could implement what makes sense for them, as long as they follow the API spec.

I'll have a look at the web search code you mentioned and see what I can do. I'm sure I'll have questions. :)

oxaronick commented 6 months ago

I've managed to get the basic flow working, but I haven't surfaced anything in the UI yet.

I was picturing a "Library search" toggle at the bottom of the conversation next to the "Web search" toggle. If enabled by ENV vars, the toggle would be present and off by default.

If you turn it on, you would still see the prompt exactly as you typed it, but there could be updates in the UpdatePad showing how the prompt was being augmented. Alternatively, there could be something at the bottom of the message showing the augmented prompt, similar to the WebSearch sources.

Thoughts, @nsarrazin ?

nsarrazin commented 6 months ago

Would a custom search engine work for you as part of the web search instead of having a different feature?

The way it currently works is as follows (entry point here):

  1. We generate the query from the user conversation (generateQuery): `webSearch.searchQuery = await generateQuery(messages);`

  2. We pass the search query to a search provider, which returns a list of relevant URLs: `const results = await searchWeb(webSearch.searchQuery);`

  3. We fetch & parse the URLs from above (can be plain text or HTML), chunk them, then do sentence similarity to augment the prompt with the top 8 chunks (currently hardcoded, but we could make it configurable).

Seems to me like for your use case you could hook your feature in at step 2, replacing the Google search with a custom search engine that takes a search query and returns links to the relevant chunks (in plain text) hosted on your server. If you return 8 chunks or fewer, they will all get added to the prompt.

Currently all the search engines are hardcoded, but it should be very easy to add your own there.
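As a rough sketch (the function name, endpoint, and return shape here are illustrative, not the exact internal types), such an engine could take the generated query and return links to chunk files hosted on your server:

```ts
// Illustrative custom "search engine" that queries your own index instead
// of the web. It follows the query-in, URLs-out flow described above.
async function searchLibrary(query: string): Promise<{ link: string }[]> {
  const res = await fetch(
    "https://library.internal/search?q=" + encodeURIComponent(query)
  );
  if (!res.ok) throw new Error(`library search failed: ${res.status}`);
  // Assumes the indexing server returns URLs of plain-text chunks it hosts,
  // e.g. https://library.internal/chunks/1234.txt
  const { results } = (await res.json()) as { results: string[] };
  // Return at most 8 links so every chunk makes it into the prompt.
  return results.slice(0, 8).map((link) => ({ link }));
}
```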

secondtruth commented 6 months ago

Related:

oxaronick commented 6 months ago

Integrating it as another search engine would be easier and would have minimal impact on the UI. I'll try that route.

The only drawback I can think of is that it's not technically "web" search anymore... :D

oxaronick commented 6 months ago

I've got the basic flow working. The only place Web Search felt like a poor fit was that I needed to skip generateQuery: the "You are tasked with generating web search queries..." prompt doesn't make sense for this use case.

I will probably also look for a way to skip the chunking, since the remote server will be returning chunks rather than whole documents. Do you think some sort of size check makes sense here? Like, if the document is already under 200 characters, don't bother chunking?
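Something like this sketch is what I have in mind; the threshold value, function name, and the chunker signature are assumptions on my part, not existing chat-ui identifiers:

```ts
// Assumed threshold; not an existing chat-ui constant.
const MIN_CHUNK_LENGTH = 200;

// `chunk` stands in for whatever chunking the web search flow normally does.
function maybeChunk(text: string, chunk: (t: string) => string[]): string[] {
  // If the remote server already returned a chunk-sized document,
  // pass it through untouched instead of re-chunking it.
  if (text.length <= MIN_CHUNK_LENGTH) return [text];
  return chunk(text);
}
```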

spew commented 3 months ago

Another thing such a feature should support is forwarding the identity of the logged-in chat user (if configured with OpenID).