Open 3choff opened 2 weeks ago
Thank you for trying it out. I've been trying to figure out a way to do that, but I can't quite get it right. The only approach I can think of is using an external service to identify the correct voice, which is not something I want to do just yet. Let me know if you have any potential fixes or suggestions.
I fixed the self-interruption issue using WebRTC's echo cancellation in JavaScript. I could not make it work in Python. The chatbot I'm working on uses FastAPI and templates for the frontend, so I switched to JS libraries for recording and playing back TTS responses instead of Python. I used navigator.mediaDevices.getUserMedia with these settings:
const stream = await navigator.mediaDevices.getUserMedia({ audio: { echoCancellation: true, noiseSuppression: true, autoGainControl: true, channelCount: 1, sampleRate: 48000 } });
It seems to be working well.
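One caveat worth checking: browsers treat these audio constraints as preferences and may silently ignore some of them, so it can help to compare what was requested against what `track.getSettings()` reports. Below is a minimal sketch of that check, not part of the project itself. The helper names `findUnappliedConstraints` and `openMicWithEchoCancellation` are hypothetical and made up for illustration; only `getUserMedia`, `getAudioTracks`, and `getSettings` are standard browser APIs.

```javascript
// Constraints taken from the snippet above.
const audioConstraints = {
  echoCancellation: true,
  noiseSuppression: true,
  autoGainControl: true,
  channelCount: 1,
  sampleRate: 48000,
};

// Hypothetical helper: returns the names of requested constraints whose
// applied value differs from (or is missing in) the reported settings.
function findUnappliedConstraints(requested, applied) {
  return Object.keys(requested).filter((key) => applied[key] !== requested[key]);
}

// Hypothetical helper (browser only): opens the microphone and warns if
// the browser did not honor some of the requested constraints.
async function openMicWithEchoCancellation() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: audioConstraints });
  const [track] = stream.getAudioTracks();
  const unapplied = findUnappliedConstraints(audioConstraints, track.getSettings());
  if (unapplied.length > 0) {
    console.warn("Constraints not applied by the browser:", unapplied);
  }
  return stream;
}
```

If `echoCancellation` shows up in the warning, the self-interruption is likely to come back on that browser or device, and earphones remain the fallback.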
Nice project! I’m having an issue where the VAD picks up the TTS playback as if it were user speech, so the chatbot ends up interrupting itself. I worked around it by using earphones instead of the laptop speakers, but I wonder if there’s a way to add echo cancellation, voice fingerprinting, or another mechanism to prevent this self-interruption.