livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0

User/Agent start/stop speaking? Voice interrupted? #406

Closed: ChrisFeldmeier closed this issue 3 months ago

ChrisFeldmeier commented 4 months ago

How can I call these functions from the playground app? On the agent side it works very well, but I can't access them from the agent's client. What can I do? I need it urgently :-/ ... or what do I have to change to get it done?

I want to fill the <AgentMultibandAudioVisualizer /> with the state

type VisualizerState = "listening" | "idle" | "speaking" | "thinking";

- user_started_speaking: the user started speaking
- user_stopped_speaking: the user stopped speaking
- agent_started_speaking: the agent started speaking
- agent_stopped_speaking: the agent stopped speaking
- user_speech_committed: the user speech was committed to the chat context
- agent_speech_committed: the agent speech was committed to the chat context
- agent_speech_interrupted: the agent speech was interrupted
- function_calls_collected: received the complete set of functions to be executed
- function_calls_finished: all function calls have been completed

https://github.com/livekit/agents/blob/39a59595c870d8822fdbf4e271b352b0521a573a/livekit-agents/livekit/agents/voice_assistant/assistant.py#L90
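For reference, a minimal sketch of how the events above could be mapped onto the VisualizerState values. This is an assumption rather than code from the linked assistant.py: the `.on(...)` registration mirrors the event-emitter style used elsewhere in livekit-agents, and the event-to-state mapping (e.g. treating the gap after `user_stopped_speaking` as "thinking") is illustrative.

```python
# Sketch only: wire the VoiceAssistant events listed above to a VisualizerState value.
# EVENT_TO_STATE and wire_events are illustrative names, not part of livekit-agents.

EVENT_TO_STATE = {
    "user_started_speaking": "listening",
    "user_stopped_speaking": "thinking",   # waiting for the LLM/TTS response
    "agent_started_speaking": "speaking",
    "agent_stopped_speaking": "idle",
}


def wire_events(assistant, on_state_change) -> None:
    """Call on_state_change(state) whenever one of the mapped events fires."""
    for event, state in EVENT_TO_STATE.items():
        # bind `state` per iteration; event handlers may pass args we ignore here
        assistant.on(event, lambda *args, s=state: on_state_change(s))
```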

ChrisFeldmeier commented 4 months ago

Does anyone have an idea?

theomonnom commented 3 months ago

Hey, for now you could use the publish_data API to forward these states to the client. See https://docs.livekit.io/realtime/client/data-messages/

ChrisFeldmeier commented 3 months ago

Okay, can you give me a short example? How can I send the user-speaking and agent-speaking events to the client from the agent? I understand the LocalParticipant.publishData method on the client, but I think it's the server-side agent that should receive the event... or should the event be sent from the agent to the client... hmm?

import asyncio
import logging
from dataclasses import dataclass

from livekit.agents import JobContext, JobRequest, WorkerOptions, cli, tokenize, tts
from livekit.agents.llm import (
    ChatContext,
    ChatMessage,
    ChatRole,
)
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, openai, silero, elevenlabs

@dataclass
class VoiceSettings:
    stability: float  # [0.0 - 1.0]
    similarity_boost: float  # [0.0 - 1.0]
    style: float | None = None  # [0.0 - 1.0]
    use_speaker_boost: bool | None = False

@dataclass
class Voice:
    id: str
    name: str
    category: str
    settings: VoiceSettings | None = None

async def entrypoint(ctx: JobContext):
    initial_ctx = ChatContext(
        messages=[
            ChatMessage(
                role=ChatRole.SYSTEM,
                text="",
            )
        ]
    )

    openai_tts = tts.StreamAdapter(
        # alloy -- for the language also adjust the tts as done locally in tts.py;
        # available voices: alloy, echo, fable, onyx, nova, and shimmer
        tts=openai.TTS(voice="shimmer", language="de"),
        sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
    )

    VOICE = Voice(
        id="9yD3PafDQ5YI0CMGS3cO",
        name="",
        category="custom",
        settings=VoiceSettings(
            stability=0.53, similarity_boost=0.71, style=0, use_speaker_boost=True
        ),
    )

    initPrompt = "Wie kann ich dir heute helfen?"  # "How can I help you today?"

    assistant = VoiceAssistant(
        vad=silero.VAD(),
        stt=deepgram.STT(language="de-DE"),
        llm=openai.LLM(model="gpt-4o"),
        tts=elevenlabs.TTS(voice=VOICE, language="de"),  # or openai_tts
        chat_ctx=initial_ctx,
    )
    assistant.start(ctx.room)

    await asyncio.sleep(1)
    await assistant.say(initPrompt, allow_interruptions=True)

async def request_fnc(req: JobRequest) -> None:
    logging.info("received request %s", req)
    await req.accept(entrypoint)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(request_fnc))