livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0

added idle time in vad metrics #1064

Closed jayeshp19 closed 1 week ago

jayeshp19 commented 1 week ago

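Example:

    # `agent`, `logger`, and `usage_collector` are assumed to come from the surrounding example script.
    from livekit.agents import metrics

    @agent.on("metrics_collected")
    def _on_metrics_collected(mtrcs: metrics.AgentMetrics):
        # Log the new idle_time field whenever VAD metrics arrive.
        if isinstance(mtrcs, metrics.PipelineVADMetrics):
            logger.info(f"VAD metrics: idle_time={mtrcs.idle_time:.2f}")
        metrics.log_metrics(mtrcs)
        usage_collector.collect(mtrcs)

Log output:
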
2024-11-10 07:00:08,775 - DEBUG livekit.agents.pipeline - speech playout finished {"speech_id": "5e2f1364bd6d", "interrupted": false}
2024-11-10 07:00:08,775 - DEBUG livekit.agents.pipeline - committed agent speech {"agent_transcript": " Hey, how can I help you today?", "interrupted": false, "speech_id": "5e2f1364bd6d"}
2024-11-10 07:00:09,195 - INFO voice-assistant - VAD metrics: idle_time=3.06 
2024-11-10 07:00:10,215 - INFO voice-assistant - VAD metrics: idle_time=4.08 
2024-11-10 07:00:11,234 - INFO voice-assistant - VAD metrics: idle_time=5.10 
2024-11-10 07:00:12,265 - INFO voice-assistant - VAD metrics: idle_time=6.13 
2024-11-10 07:00:13,285 - INFO voice-assistant - VAD metrics: idle_time=7.15 
2024-11-10 07:00:14,314 - INFO voice-assistant - VAD metrics: idle_time=8.18 
2024-11-10 07:00:15,335 - INFO voice-assistant - VAD metrics: idle_time=0.48 
2024-11-10 07:00:15,685 - DEBUG livekit.agents.pipeline - received user transcript {"user_transcript": "Hi."}
2024-11-10 07:00:16,143 - DEBUG livekit.agents.pipeline - validated agent reply {"speech_id": "930dc55c54bc"}
2024-11-10 07:00:16,144 - INFO livekit.agents - Pipeline EOU metrics: sequence_id=930dc55c54bc, end_of_utterance_delay=0.91, transcription_delay=0.45
2024-11-10 07:00:16,354 - INFO voice-assistant - VAD metrics: idle_time=0.57 
2024-11-10 07:00:16,814 - INFO livekit.agents - Pipeline STT metrics: duration=10.68, audio_duration=2.10
2024-11-10 07:00:16,935 - INFO livekit.agents - Pipeline LLM metrics: sequence_id=930dc55c54bc, ttft=0.65, input_tokens=63, output_tokens=10, tokens_per_second=12.67
2024-11-10 07:00:17,384 - INFO voice-assistant - VAD metrics: idle_time=1.60
2024-11-10 07:00:18,403 - INFO voice-assistant - VAD metrics: idle_time=2.62
2024-11-10 07:00:18,535 - DEBUG livekit.agents.pipeline - speech playout started {"speech_id": "930dc55c54bc"}
2024-11-10 07:00:18,774 - INFO livekit.agents - Pipeline TTS metrics: sequence_id=930dc55c54bc, ttfb=1.5990837999997893, audio_duration=2.26
2024-11-10 07:00:19,436 - INFO voice-assistant - VAD metrics: idle_time=3.65
2024-11-10 07:00:20,454 - INFO voice-assistant - VAD metrics: idle_time=4.67
2024-11-10 07:00:20,796 - DEBUG livekit.agents.pipeline - speech playout finished {"speech_id": "930dc55c54bc", "interrupted": false}
2024-11-10 07:00:20,797 - DEBUG livekit.agents.pipeline - committed agent speech {"agent_transcript": " Hello! What can I help you with today?", "interrupted": false, "speech_id": "930dc55c54bc"}
2024-11-10 07:00:21,475 - INFO voice-assistant - VAD metrics: idle_time=5.69
2024-11-10 07:00:22,505 - INFO voice-assistant - VAD metrics: idle_time=6.72
2024-11-10 07:00:23,525 - INFO voice-assistant - VAD metrics: idle_time=7.74
2024-11-10 07:00:24,557 - INFO voice-assistant - VAD metrics: idle_time=8.77
2024-11-10 07:00:25,575 - INFO voice-assistant - VAD metrics: idle_time=9.79 
2024-11-10 07:00:26,596 - INFO voice-assistant - VAD metrics: idle_time=10.81 

PTAL @davidzhao @theomonnom

changeset-bot[bot] commented 1 week ago

🦋 Changeset detected

Latest commit: 3963a265edf8f117568b6ba539a066891500fb12

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name           | Type  |
| -------------- | ----- |
| livekit-agents | Patch |


davidzhao commented 1 week ago

Does this get emitted when the next non-silence is detected?

jayeshp19 commented 1 week ago

@davidzhao Basically, PipelineVADMetrics is emitted every second, on VAD events of type INFERENCE_DONE.

I've added a condition to reset the idle time whenever the VAD event type is START_OF_SPEECH or END_OF_SPEECH.
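
The reset rule described here can be sketched roughly as follows. This is a standalone illustration, not the code from this PR: `VADEventType` is a hypothetical stand-in enum that only mirrors the event names mentioned above, and `IdleTimeTracker` and its injectable `clock` parameter are names invented for the example.

```python
import enum
import time
from typing import Callable, Optional


class VADEventType(enum.Enum):
    # Hypothetical stand-in for the VAD event types named in this thread.
    START_OF_SPEECH = "start_of_speech"
    INFERENCE_DONE = "inference_done"
    END_OF_SPEECH = "end_of_speech"


class IdleTimeTracker:
    """Tracks the seconds elapsed since the last speech boundary event."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._last_speech = clock()

    def on_event(self, event_type: VADEventType) -> Optional[float]:
        now = self._clock()
        if event_type in (VADEventType.START_OF_SPEECH, VADEventType.END_OF_SPEECH):
            # Speech started or ended: restart the idle timer, report nothing.
            self._last_speech = now
            return None
        # INFERENCE_DONE: report how long it has been since the last speech boundary.
        return now - self._last_speech
```

Under these assumptions, each INFERENCE_DONE event yields a growing idle time until the next speech boundary resets it, which matches the pattern in the log output above (the value climbs to 8.18 and then drops to 0.48 once the user speaks).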

theomonnom commented 1 week ago

Is the intent of this metric to know when the last user speech occurred?

theomonnom commented 1 week ago

nit: use time.perf_counter() instead of time.time()
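
For context on this nit: `time.time()` follows the wall clock, which can jump when the system clock is adjusted, while `time.perf_counter()` is monotonic and intended for measuring intervals. Its reference point is undefined, so only differences between two readings are meaningful. A minimal sketch of interval measurement, where the helper name `elapsed_since` is made up for illustration:

```python
import time


def elapsed_since(start: float) -> float:
    """Return seconds elapsed since `start`, a value taken from time.perf_counter()."""
    return time.perf_counter() - start


# Take a reading, wait briefly, then measure the interval.
start = time.perf_counter()
time.sleep(0.01)
idle = elapsed_since(start)
print(f"idle_time={idle:.2f}")
```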

jayeshp19 commented 1 week ago

@theomonnom Yes, that's correct! The metric is meant to track the time since the last user speech.

theomonnom commented 1 week ago

Nice lgtm!