livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0
1.04k stars 202 forks source link

Support for audit trails? #628

Open willsmanley opened 1 month ago

willsmanley commented 1 month ago

More of a question than an issue, but I wanted to see if you have a recommendation for creating audit trails or plans to support it more directly from a library interface.

I would like to have tools to record:

I am going to work on stitching all of this together and see how far I get, but I wanted to start a discussion here.

So far, unless you are using your own LLM plugin or forking an existing LLM plugin, it seems like you need to hook into the completion request within will_synthesize_assistant_reply. Capturing the response without forking seems trickier. And mp3, not sure about that yet.

willsmanley commented 1 month ago

@davidzhao fyi, thank you in advance

willsmanley commented 3 weeks ago

i'm making some limited progress here: 1) i realized that mp3 recordings ought to be managed by livekit/egress, duh. specifically as an autoegress upon room creation request. strangely, i could only get this to work with a raw twirp request rather than with an SDK, but that's ok. So far, I am autoegressing the user's track, but it's not publishing the agent's track. I will update here if I figure out how to do that. 2) i think LLM token usage is solved at a basic level from the PR linked above, but it could possibly be more of a first-class citizen of the API than this changeset suggests. That was just the minimal change required. 3) seems hacky to log completion requests via will_synthesize_assistant_reply and still haven't made a good solution for logging completion responses.

in general, have library authors looked at pipecat? they made some clearly different choices, but just wanted to bring this up in case there is anything to be learned from their implementation. i made a mutable.ai wiki for both repos since both are early in their development and documentation: https://wiki.mutable.ai/pipecat-ai/pipecat https://wiki.mutable.ai/livekit/agents

willsmanley commented 3 weeks ago

update: i figured out how to also egress the room composite in addition to the user's track, but still having issues with the agent-only track. created a separate issue here: https://github.com/livekit/agents/issues/656

willsmanley commented 3 weeks ago

created this PR for logging completion requests: https://github.com/livekit/agents/pull/658

keepingitneil commented 3 weeks ago

Yeah egress is a good way to record audio. For LLM completions you can use the VoiceAssistant user_speech_committed and agent_speech_committed events

https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/voice_assistant/voice_assistant.py#L25

davidzhao commented 3 weeks ago

I think in general this is a great idea. we'd want to capture metrics on usage and report them.

willsmanley commented 3 weeks ago

The problem with user_speech_committed and agent_speech_committed is that they only emit the most recent message (and not any RAG output from will_synthesize_agent_response or tool_calls either). They could be extended to emit other context as well. I made this PR which emits everything I would be interested in (and I assume others who are doing LLMops / monitoring): https://github.com/livekit/agents/pull/658

And usage tokens PR here: https://github.com/livekit/agents/pull/614

I'm happy to change either one if you want to let me know what you'd like to see differently.

On my fork with these changes (along with the egress part), I have pretty much e2e monitoring for how fast tokens are being spent, who is spending them, and what all of the llm requests/responses are. It even works if you need to log ChatImages, but requires you to serialize these into JSON.