pipecat-ai / pipecat

Open Source framework for voice and multimodal conversational AI
BSD 2-Clause "Simplified" License
Question: Storytelling example: Why toggle setLocalAudio() every turn? #180

Open TomTom101 opened 4 months ago

TomTom101 commented 4 months ago

First up, brilliant project! I'm still trying to wrap my head around the concept, but I'm taking some baby steps.

My question is: Why do we have to toggle recording when at the same time the "interruptible" feature is demoed in other examples?

if (e.data?.cue === "user_turn") {
  // Delay enabling local mic input to avoid feedback from LLM
  setTimeout(() => daily.setLocalAudio(true), 500);
  setStoryState("user");
} else {
  daily.setLocalAudio(false);
  setStoryState("assistant");
}
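For readers who want to experiment with the handler above outside the app, here is a minimal sketch of the same turn logic as a standalone function. It assumes only that the call object exposes `setLocalAudio(boolean)`, as `daily` does in the snippet. The names `createTurnController`, `onStateChange`, `schedule`, and `cancel` are hypothetical and not part of the example or of Daily's API. The one behavioural change is that a pending delayed unmute is cancelled when the next cue arrives, so the mic cannot switch on after the assistant's turn has already started.

```javascript
// Sketch only: turn-based mic gating with a cancellable unmute delay.
// `callObject` is anything with a setLocalAudio(boolean) method.
// `options.schedule` / `options.cancel` are hypothetical injection points
// that default to setTimeout / clearTimeout.
function createTurnController(callObject, onStateChange, options = {}) {
  const delayMs = options.delayMs ?? 500;
  const schedule = options.schedule ?? setTimeout;
  const cancel = options.cancel ?? clearTimeout;
  let pendingUnmute = null;

  return function handleCue(cue) {
    // Drop any unmute that is still waiting from a previous user turn.
    if (pendingUnmute !== null) {
      cancel(pendingUnmute);
      pendingUnmute = null;
    }

    if (cue === "user_turn") {
      // Delay enabling the mic, as in the original snippet.
      pendingUnmute = schedule(() => {
        pendingUnmute = null;
        callObject.setLocalAudio(true);
      }, delayMs);
      onStateChange("user");
    } else {
      // Any other cue is treated as the assistant's turn: mute immediately.
      callObject.setLocalAudio(false);
      onStateChange("assistant");
    }
  };
}
```

In the app this would be wired up roughly as `const handleCue = createTurnController(daily, setStoryState);` followed by `handleCue(e.data?.cue)` inside the existing message handler.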

Is the "always listening" feature unique to the Daily default video call screen, so that it does not apply to custom apps? Or would an always-on mic record the audio of the TTS output (a typical problem in every speech-to-speech system I have written myself)?

Thanks for clarification, looking forward to v0.0.24 :)

aconchillo commented 4 months ago

I think this was just an example to show how you would use cues, but Storytelling could be adjusted to use interruptions.

chadbailey59 commented 4 months ago

This example is a bit different than the others, because it's very explicitly "turn-based." The way the app is designed is to ask you for story input at very specific times, and then generate several pages of "story" without being interrupted. We built an earlier version that left the mic unmuted all the time and just tried to ignore transcriptions when the user wasn't supposed to be talking, but that gets really complex; it turns out it's way easier just to mute the mic. :)
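To make the trade-off above concrete, here is a rough sketch of what an always-unmuted variant might look like on the client: the mic stays on and transcriptions are dropped unless it is the user's turn. This is not the code from the earlier version mentioned above. `createTranscriptGate`, `setTurn`, and `onTranscript` are hypothetical names, and how transcriptions actually reach the client is left out.

```javascript
// Sketch only: keep the mic open and filter transcriptions by turn.
// All names here are hypothetical and exist only for illustration.
function createTranscriptGate() {
  let turn = "assistant"; // whose turn it currently is
  const accepted = [];    // transcriptions that were let through

  return {
    // Call this when a cue arrives, e.g. setTurn("user").
    setTurn(next) {
      turn = next;
    },
    // Call this for every incoming transcription.
    // Returns true if the text was kept, false if it was ignored.
    onTranscript(text) {
      if (turn !== "user") {
        return false;
      }
      accepted.push(text);
      return true;
    },
    accepted,
  };
}
```

Even this naive gate shows where the extra complexity comes from: a transcription that arrives just after the turn flips back to the assistant is silently dropped, so handling it properly would need extra state that muting the mic avoids.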

This example could definitely work with interruptions, but it would probably take some re-working of the prompts to tell the bot what to do when the user interrupts.

TomTom101 commented 4 months ago

"but it would probably take some re-working of the prompts to tell the bot what to do when the user interrupts."

Can you point me to an example that shows what needs to be done in the prompt to support it?