livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0
3.98k stars 408 forks

Modify context before LLM #1034

Open tpy37 opened 1 week ago

tpy37 commented 1 week ago

Thank you for the great package. I am looking for a method to change the context before sending it to the LLM in MultimodalAgent class. I think it exists in VoicePipelineAgent, and I am wondering how I could implement it with OpenAI realtime API.

tpy37 commented 1 week ago

I would be happy if you could implement the RAG part of MultimodalAgent, as in the documentation! :) `before_llm_cb=_enrich_with_rag`


davidzhao commented 1 week ago

it's a bit difficult with MultimodalAgent, because it goes from voice input directly to voice output.

the way to handle RAG is with function calling. If you define a function the LLM can call to look up information with the user's query, it should be straightforward to pick up the function call and return the RAG results that way
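The function-calling pattern described above can be sketched roughly as follows. This is a minimal, framework-agnostic sketch: the tool schema follows the general OpenAI function-tool shape, but `lookup_info`, `DOCS`, and the keyword scoring are stand-ins for a real retriever, not part of the livekit API.

```python
# Sketch of RAG via function calling: the model is offered a lookup tool,
# and the tool's result carries the retrieved passages back into the
# conversation. The knowledge base and matching below are stand-ins.

# Toy knowledge base; a real deployment would query a vector store.
DOCS = {
    "pricing": "Pro plan costs $20/month and includes 100 hours.",
    "support": "Support is available 24/7 via chat and email.",
}

# Tool definition in the general OpenAI function-tool shape
# (name / description / JSON-schema parameters).
LOOKUP_TOOL = {
    "type": "function",
    "name": "lookup_info",
    "description": "Look up reference information for the user's question.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def lookup_info(query: str) -> str:
    """Naive keyword retrieval; swap in your real RAG pipeline here."""
    hits = [text for key, text in DOCS.items() if key in query.lower()]
    return "\n".join(hits) if hits else "No relevant information found."
```

When the model emits a call to the tool, the agent runs `lookup_info` and returns the string as the function output; the model then answers the user's question with that context in hand.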

tpy37 commented 1 week ago

Thank you very much, David! I see... I was trying function calling, but setting the tool choice to "required" seemed to completely halt the process and break the conversation, so I stopped using it.

Since we have the transcribed text from the user's audio in `openai.realtime.RealtimeResponse`, I was thinking that we could analyze the transcribed text, use it to do RAG, and then send the results asynchronously to the OpenAI API as text:

```javascript
const event = {
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [
      { type: 'input_text', text: 'Hello!' }
    ]
  }
};
ws.send(JSON.stringify(event));
ws.send(JSON.stringify({ type: 'response.create' }));
```

https://platform.openai.com/docs/guides/realtime?text-generation-quickstart-example=text

Just some thoughts...
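In Python, the two-event injection shown in the JS snippet above could be built like this. The event types (`conversation.item.create`, `response.create`) come from the OpenAI Realtime docs linked above; the websocket itself is left out, so the `ws` usage at the bottom is only an illustrative comment.

```python
import json

def build_rag_injection_events(rag_text: str) -> list[str]:
    """Build the two Realtime API events that inject retrieved text as a
    user message and then ask the model to respond (same shape as the
    JS example above)."""
    item_event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": rag_text}],
        },
    }
    respond_event = {"type": "response.create"}
    return [json.dumps(item_event), json.dumps(respond_event)]

# Sending them over an open websocket `ws` would look roughly like:
#   for frame in build_rag_injection_events("Relevant context: ..."):
#       await ws.send(frame)
```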

prashantmetadome commented 1 week ago

@tpy37 can you guide me on how to change context in VoicePipelineAgent?

tpy37 commented 1 week ago

I am sorry, I haven't done it myself in VoicePipelineAgent, but an example is available at https://docs.livekit.io/agents/voice-agent/voice-pipeline/#modify-context-before-llm

I think there was also an example that sends the query to RAG using this `before_llm_cb` in one of the GitHub repositories: `examples/voice-pipeline-agent/simple-rag/assistant.py`

Hope it helps!
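The `before_llm_cb` enrichment pattern from the linked docs can be sketched like this. Note this is self-contained on purpose: `ChatContext`/`ChatMessage` are minimal stand-ins for the livekit `llm` types, and `retrieve` is a placeholder, so nothing here is the SDK's exact API.

```python
# Self-contained sketch of the before_llm_cb pattern from the linked
# simple-rag example. ChatContext/ChatMessage are minimal stand-ins for
# the livekit llm types so this runs without the SDK installed.
from dataclasses import dataclass, field

@dataclass
class ChatMessage:
    role: str
    content: str

@dataclass
class ChatContext:
    messages: list[ChatMessage] = field(default_factory=list)

def retrieve(query: str) -> str:
    """Stand-in retriever; replace with your vector-store lookup."""
    return f"[retrieved context for: {query!r}]"

def enrich_with_rag(agent, chat_ctx: ChatContext) -> None:
    """Callback run just before each LLM turn: read the latest user
    message, retrieve context for it, and append the result to the
    chat context the LLM will see."""
    user_msgs = [m for m in chat_ctx.messages if m.role == "user"]
    if not user_msgs:
        return
    context = retrieve(user_msgs[-1].content)
    chat_ctx.messages.append(
        ChatMessage(role="system", content=f"Relevant context:\n{context}")
    )

# Wired up roughly as: VoicePipelineAgent(..., before_llm_cb=enrich_with_rag)
```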

prashantmetadome commented 1 week ago

@tpy37 thank you very much, it does help. But I am still unsure... my use case is that I want to manipulate the prompt based on a tool call.

To go deeper into the specific requirement: the conversation has outgrown the current prompt and moved into different territory that needs to be handled by a different prompt.

I do not need to manipulate the prompt inside the tool call itself. If I could just extract some metadata from the tool call and access it in the callback function, that would solve the problem, but I am not sure how to do that. @davidzhao any help would be appreciated.
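One way to bridge a tool call and the callback is a small shared state object that both can see: the tool only records metadata (e.g. a topic), and the pre-LLM callback swaps the system prompt based on it. This is a hedged sketch with hypothetical names (`SessionState`, `switch_topic_tool`, the `PROMPTS` table); how you share the state (a closure, or data hung off the agent) is up to you.

```python
from dataclasses import dataclass

# Hypothetical prompt table: one system prompt per conversation territory.
PROMPTS = {
    "default": "You are a general-purpose assistant.",
    "billing": "You are a billing specialist. Be precise about charges.",
}

@dataclass
class SessionState:
    """Shared between the tool and the callback (e.g. closed over by
    both, or stored on the agent's user data)."""
    topic: str = "default"

def switch_topic_tool(state: SessionState, topic: str) -> str:
    """Tool the LLM calls when the conversation changes territory;
    it only records metadata -- the prompt swap happens later."""
    if topic in PROMPTS:
        state.topic = topic
    return f"Switched to topic: {state.topic}"

def before_llm_cb(state: SessionState, messages: list[dict]) -> list[dict]:
    """Replace the system prompt based on the metadata the tool stored,
    leaving the rest of the conversation untouched."""
    system = {"role": "system", "content": PROMPTS[state.topic]}
    rest = [m for m in messages if m["role"] != "system"]
    return [system] + rest
```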