Please describe the changes in your PR. If it is addressing an issue, please reference that as well.
This is a different implementation of background sounds that works better than https://github.com/pipecat-ai/pipecat/pull/682 (which never fully worked). Instead of a processor, we implement background sounds as a transport audio mixer: every time the transport is about to play a chunk of audio, it mixes in the background sound if a mixer is available. As simple as that.
The reason for this change is that it is hard for a processor to feed a constant audio source reliably at the right speed without a hardware clock. The other PR uses sleeps, which are not reliable at all. It might be possible to use presentation timestamps instead: we would queue the whole file in little chunks, each with its own timestamp, and when the bot speaks we would use presentation timestamps as well. But I feel that would be much harder to implement and for people to get right. And while Pipecat is somewhat of a multimedia framework, its final goal is conversational AI, not being a fully featured multimedia framework; there are better options for that.
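To illustrate the mixer idea, here is a minimal sketch of mixing a looped background sound into each outgoing audio chunk. The class and method names (`BackgroundSoundMixer`, `mix`) are illustrative assumptions, not Pipecat's actual API; it assumes 16-bit mono PCM at the transport's sample rate.

```python
# Hypothetical sketch of a transport audio mixer. Names are illustrative,
# not Pipecat's real API.
import numpy as np


class BackgroundSoundMixer:
    """Loops over a pre-loaded background sound, mixing a slice of it
    into every audio chunk the transport is about to play."""

    def __init__(self, background: bytes, volume: float = 0.4):
        # Background sound as raw 16-bit PCM samples.
        self._background = np.frombuffer(background, dtype=np.int16)
        self._volume = volume
        self._pos = 0  # current read position in the background loop

    def mix(self, chunk: bytes) -> bytes:
        audio = np.frombuffer(chunk, dtype=np.int16).astype(np.int32)
        # Take the same number of background samples, wrapping around.
        idx = (self._pos + np.arange(len(audio))) % len(self._background)
        bg = (self._background[idx] * self._volume).astype(np.int32)
        self._pos = (self._pos + len(audio)) % len(self._background)
        # Sum and clip to avoid int16 overflow.
        mixed = np.clip(audio + bg, -32768, 32767).astype(np.int16)
        return mixed.tobytes()
```

Because `mix()` is driven by the transport's own playout of each chunk, the background advances exactly at playback speed, with no sleeps or timestamps needed.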