[Bug]: Incomplete bot transcript issue

pankajkumar commented 4 months ago

Brief Description

Hi Team,

I recently upgraded to the latest version of vocode, but I am facing an issue where bot transcriptions are incomplete: the transcript shows only the first word or two of each bot sentence. Human transcripts, on the other hand, seem to be fine.

I am using Deepgram as the transcriber, gpt-4o for the agent, and the ElevenLabs websocket synthesizer for TTS.

for example:

BOT: Hello
HUMAN: Hello sir I am xxx person from yyy company
BOT: Hello xxx, you are talking from yyy, Can I know -
HUMAN: Sir for the whole day hello hello
BOT: yes
HUMAN: Hello. Sir is there any requirements for cargo and courier services ?
BOT: Right now
HUMAN: Sorry?
BOT: Any
HUMAN: No sir, thank you.
BOT: okay

The actual conversation, obtained by transcribing the recording with Deepgram using the same model, is given below; differences are highlighted in bold. I have replaced the actual names with xxx, yyy and zzz.

BOT: Hello
HUMAN: Hello sir I am xxx person from (yyy company)
BOT: Hello (xxx person), you are talking from (yyy company), Can I know -
HUMAN: Sir for the whole day hello hello
BOT: yes, tell me xxx, what you want to talk about
HUMAN: Hello. Sir is there any requirements for cargo and courier services ?
BOT: Right now (zzz person) is busy, do you want to leave a message
HUMAN: Sorry?
BOT: Any message you want to give to (zzz person)
HUMAN: No sir, thank you.
BOT: okay, thank you
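For anyone who wants to check their own logs for the same symptom, here is a rough sketch that compares the bot turns from the logged transcript with the bot turns from the re-transcribed recording and flags the ones that were cut short. The turn lists are copied from the two transcripts above; the helper name `find_truncated_turns` and the "proper prefix" rule are my own assumptions for illustration, not anything provided by vocode.

```python
from typing import List


def find_truncated_turns(logged: List[str], actual: List[str]) -> List[int]:
    """Return indices of turns whose logged text is a proper prefix of the actual text.

    Hypothetical helper: a turn counts as truncated only when the logged text
    matches the start of the actual text and the actual text is longer.
    """
    truncated = []
    for index, (logged_turn, actual_turn) in enumerate(zip(logged, actual)):
        logged_norm = logged_turn.strip().lower()
        actual_norm = actual_turn.strip().lower()
        if actual_norm.startswith(logged_norm) and len(actual_norm) > len(logged_norm):
            truncated.append(index)
    return truncated


# Bot turns as they appear in the logged transcript.
logged_bot_turns = [
    "Hello",
    "Hello xxx, you are talking from yyy, Can I know -",
    "yes",
    "Right now",
    "Any",
    "okay",
]

# Bot turns recovered by re-transcribing the recording.
actual_bot_turns = [
    "Hello",
    "Hello (xxx person), you are talking from (yyy company), Can I know -",
    "yes, tell me xxx, what you want to talk about",
    "Right now (zzz person) is busy, do you want to leave a message",
    "Any message you want to give to (zzz person)",
    "okay, thank you",
]

truncated_indices = find_truncated_turns(logged_bot_turns, actual_bot_turns)
print(truncated_indices)
```

The second turn differs only in how the names were redacted, so the prefix rule does not flag it; the last four bot turns are the ones that stop after the first chunk.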

LLM

ChatGPT

Transcription Services

Deepgram

I used the hi-IN language with Deepgram's nova-2 general model.

Synthesis Services

Eleven Labs

Telephony Services

None

Conversation Type and Platform

No response

Steps to Reproduce

The issue occurs whenever the caller interrupts the bot during the conversation: the bot transcript for that turn is incomplete.

Expected Behavior

The bot transcript should be complete.

Screenshots

c4922829-9461-4ebd-bc77-a9a80d92622b_r.zip

Please find the attached conversation recording for your reference.

pankajkumar commented 4 months ago

I have tried to debug this a bit further. I think it's because of this code: https://github.com/vocodedev/vocode-core/blob/main/vocode/streaming/streaming_conversation.py#L494

maybe_synthesis_result is None if it is not the first token, hence only the first token gets added to the transcript.
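To illustrate the suspected failure mode, here is a toy sketch. It is not the vocode code: `ToySynthesisResult`, `record_tokens_buggy` and `record_tokens_fixed` are hypothetical names, and the assumption that only the first token carries a synthesis result is just the hypothesis described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySynthesisResult:
    """Hypothetical stand-in for a synthesis result; not vocode's class."""
    text: str


def record_tokens_buggy(tokens: List[str]) -> str:
    """Append to the transcript only when a synthesis result is present."""
    transcript_text = ""
    for index, token in enumerate(tokens):
        # Assumption: only the first token of a streamed response gets a result.
        maybe_synthesis_result: Optional[ToySynthesisResult] = (
            ToySynthesisResult(token) if index == 0 else None
        )
        if maybe_synthesis_result is not None:
            transcript_text += maybe_synthesis_result.text
    return transcript_text


def record_tokens_fixed(tokens: List[str]) -> str:
    """Append every token to the transcript, whether or not it has a result."""
    transcript_text = ""
    for token in tokens:
        transcript_text += token
    return transcript_text


tokens = ["yes", ", tell me xxx", ", what you want to talk about"]
print(record_tokens_buggy(tokens))
print(record_tokens_fixed(tokens))
```

With these toy tokens the buggy variant keeps only the first chunk, which matches the shape of the truncated bot turns in the report.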

ajar98 commented 4 months ago

thanks for the report! is this on latest master or on vocode 0.1.113?

pankajkumar commented 4 months ago

I am using vocode 0.1.113

pankajkumar commented 4 months ago

Found the root cause. This issue seems to be fixed in the following commit: https://github.com/vocodedev/vocode-core/commit/60d21879d76e21a85bd25f9e607f3322eb3d3c45#diff-b086df98bf635ee6a71718985b64311e895fb6a750e1b83f633b6c39baed34c9L958

ajar98 commented 4 months ago

Awesome! Yes - the ElevenLabs websocket synthesizer had some problems with interruptions / transcripts which recently got fixed in the output device refactor. We're looking to push this to 0.1.114 soon, but in the interim we have it in prerelease: https://pypi.org/project/vocode/0.1.114a0/
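The prerelease can be pinned explicitly with `pip install vocode==0.1.114a0`. To confirm which version is actually installed in a given environment, a small check along these lines may help; `installed_version` is a hypothetical helper name, and it uses only the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None if vocode is not installed in the current environment.
    print("vocode version:", installed_version("vocode"))
```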
