pipecat-ai / pipecat

Open Source framework for voice and multimodal conversational AI
BSD 2-Clause "Simplified" License
introduce transport audio mixers #687

Closed by aconchillo 3 weeks ago

aconchillo commented 3 weeks ago

This is a different implementation of background sounds that works better than https://github.com/pipecat-ai/pipecat/pull/682. In fact, this one works, while the other one doesn't fully work. Instead of a processor, we implement background sounds as a transport audio mixer: every time we are about to play a chunk of audio, we mix it if a mixer is available. It's as simple as that.
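To illustrate the idea, here is a minimal sketch of a mixer that the output path calls right before playing each chunk. It is not the code in this PR: the names `LoopingBackgroundMixer`, `mix`, and `write_audio` are hypothetical, and it assumes 16-bit signed mono PCM in native byte order, with the background sound already at the output sample rate.

```python
import array
from typing import Optional


class LoopingBackgroundMixer:
    """Hypothetical mixer: loops a background sound and adds it to each output chunk."""

    def __init__(self, background: bytes, volume: float = 0.4):
        # Assumption: 16-bit signed mono PCM, same sample rate as the output.
        self._background = array.array("h", background)
        if len(self._background) == 0:
            raise ValueError("background sound must contain at least one sample")
        self._volume = volume
        self._pos = 0  # Read position inside the background loop.

    def mix(self, chunk: bytes) -> bytes:
        samples = array.array("h", chunk)
        out = array.array("h", bytes(len(chunk)))
        n = len(self._background)
        for i, sample in enumerate(samples):
            bg = self._background[(self._pos + i) % n]
            mixed = sample + int(bg * self._volume)
            # Clip to the 16-bit range to avoid wrap-around distortion.
            out[i] = max(-32768, min(32767, mixed))
        # Advance by exactly the number of samples played, so the background
        # is paced by the output itself and needs no sleeps.
        self._pos = (self._pos + len(samples)) % n
        return out.tobytes()


def write_audio(chunk: bytes, mixer: Optional[LoopingBackgroundMixer] = None) -> bytes:
    """Hypothetical output hook: mix the chunk only if a mixer is available."""
    return mixer.mix(chunk) if mixer is not None else chunk
```

Because the mixer advances only when a chunk is about to be played, the background sound follows the output clock without any timing logic of its own.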

The reason for this change is that it is hard to add a constant audio source reliably from a processor at the right speed without a hardware source. The other PR uses sleeps, but that's not reliable at all. It might be possible to use presentation timestamps: we would queue the whole file in small chunks, each with its own timestamp, and when the bot speaks we would also use presentation timestamps. I feel this would be much harder to implement and for people to get right. And while Pipecat is kind of a multimedia framework, its final goal is conversational AI, not being a fully featured multimedia framework; for that there are better options.
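For reference, the presentation timestamp alternative mentioned above could look roughly like the sketch below. This is only an illustration of the chunk-and-stamp step, not anything implemented in this PR: `chunks_with_pts` and its parameters are hypothetical, and it assumes raw PCM input with timestamps in nanoseconds.

```python
from typing import Iterator, Tuple


def chunks_with_pts(
    audio: bytes,
    sample_rate: int,
    chunk_ms: int = 20,
    sample_width: int = 2,
    channels: int = 1,
    start_pts_ns: int = 0,
) -> Iterator[Tuple[int, bytes]]:
    """Hypothetical helper: split raw PCM into chunks, each with its own timestamp."""
    bytes_per_frame = sample_width * channels
    frames_per_chunk = sample_rate * chunk_ms // 1000
    if frames_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    chunk_bytes = frames_per_chunk * bytes_per_frame

    frames_done = 0
    for offset in range(0, len(audio), chunk_bytes):
        chunk = audio[offset:offset + chunk_bytes]
        # Derive the timestamp from the frame count, not from wall-clock
        # time, so rounding errors do not accumulate across chunks.
        pts_ns = start_pts_ns + frames_done * 1_000_000_000 // sample_rate
        yield pts_ns, chunk
        frames_done += len(chunk) // bytes_per_frame
```

The stamping itself is the easy part. The harder part is that the bot's speech would need compatible timestamps too, and the output would have to schedule both streams against the same clock.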

aconchillo commented 3 weeks ago

I think this is equivalent to https://github.com/pipecat-ai/pipecat/pull/611 but on the output transport.

aconchillo commented 3 weeks ago

Fixes https://github.com/pipecat-ai/pipecat/issues/456.