voxos-ai / bolna

End-to-end platform for building voice-first multimodal agents
MIT License

Getting stuck on loop "Last transmitted timestamp is simply 0 and hence continuing" when calling /call API #183

Closed saksham003 closed 3 months ago

saksham003 commented 4 months ago
bolna-app-1   | future: <Task finished name='Task-33' coro=<write_request_logs() done, defined at /usr/local/lib/python3.10/site-packages/bolna/helpers/utils.py:414> exception=TypeError("object of type 'NoneType' has no len()") created at /usr/local/lib/python3.10/asyncio/tasks.py:337>
bolna-app-1   | source_traceback: Object created at (most recent call last):
bolna-app-1   |   File "/usr/local/bin/uvicorn", line 8, in <module>
bolna-app-1   |     sys.exit(main())
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
bolna-app-1   |     return self.main(*args, **kwargs)
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
bolna-app-1   |     rv = self.invoke(ctx)
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
bolna-app-1   |     return ctx.invoke(self.callback, **ctx.params)
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
bolna-app-1   |     return __callback(*args, **kwargs)
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/uvicorn/main.py", line 410, in main
bolna-app-1   |     run(
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/uvicorn/main.py", line 578, in run
bolna-app-1   |     server.run()
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 61, in run
bolna-app-1   |     return asyncio.run(self.serve(sockets=sockets))
bolna-app-1   |   File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
bolna-app-1   |     return loop.run_until_complete(main)
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/bolna/agent_manager/task_manager.py", line 1341, in __first_message
bolna-app-1   |     await self._synthesize(create_ws_data_packet(self.kwargs.get('agent_welcome_message', None), meta_info= meta_info))
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/bolna/agent_manager/task_manager.py", line 1172, in _synthesize
bolna-app-1   |     self.__convert_to_request_log(message = text, meta_info= meta_info, component="synthesizer", direction="response", model = self.synthesizer_provider, is_cached= True, engine=self.tools['synthesizer'].get_engine())
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/bolna/agent_manager/task_manager.py", line 626, in __convert_to_request_log
bolna-app-1   |     asyncio.create_task(write_request_logs(log, self.run_id))
bolna-app-1   |   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 337, in create_task
bolna-app-1   |     task = loop.create_task(coro)
bolna-app-1   | Traceback (most recent call last):
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/bolna/helpers/utils.py", line 423, in write_request_logs
bolna-app-1   |     component_details = [message['data'], None, None,len(message['data']), message['cached'], None, message['engine']]
bolna-app-1   | TypeError: object of type 'NoneType' has no len()
bolna-app-1   | 2024-05-19 07:00:31.360 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
bolna-app-1   | 2024-05-19 07:00:33.361 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
bolna-app-1   | 2024-05-19 07:00:35.362 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
bolna-app-1   | 2024-05-19 07:00:37.363 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
bolna-app-1   | 2024-05-19 07:00:38.771 INFO {telephony} [_listen] call stopping
bolna-app-1   | 2024-05-19 07:00:39.363 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
bolna-app-1   | 2024-05-19 07:00:39.364 ERROR {deepgram_transcriber} [transcribe] Error in transcribe:
bolna-app-1   | 2024-05-19 07:00:41.362 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
Gracefully stopping... (press Ctrl+C again to force)
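The traceback above fails inside write_request_logs at len(message['data']) because 'data' is None (the agent_welcome_message was never populated). As a rough illustration of the failure mode, not bolna's actual code, a None-tolerant version of that log-row construction might look like this; the helper name build_component_details is hypothetical:

```python
def build_component_details(message: dict) -> list:
    """Hypothetical None-safe mirror of the failing line in write_request_logs."""
    data = message.get("data")  # may be None when agent_welcome_message is unset
    data_len = len(data) if data is not None else 0  # len(None) raised the TypeError
    return [
        data,
        None,
        None,
        data_len,
        message.get("cached"),
        None,
        message.get("engine"),
    ]
```

With this guard the background task would log a zero-length entry instead of dying with `TypeError: object of type 'NoneType' has no len()`.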
prateeksachan commented 4 months ago

@saksham003 thanks for reporting this. Can you paste the agent config you tried? Also, did you pull the latest changes?

saksham003 commented 4 months ago

@prateeksachan yes, I did pull the latest changes. Here's the agent config I used:

{
  "agent_config": {
    "agent_name": "test",
    "agent_type": "bolna_test",
    "agent_welcome_message": "Hi, my name is Rachel.",
    "tasks": [
      {
        "task_type": "conversation",
        "call_cancellation_prompt": null,
        "hangup_after_LLMCall": false,
        "hangup_after_silence": 10,
        "incremental_delay": 100,
        "interruption_backoff_period": 100,
        "number_of_words_for_interruption": 3,
        "optimize_latency": true,
        "task_config": {
          "hangup_after_silence": 10
        },
        "tools_config": {
          "llm_agent": {
            "max_tokens": 100,
            "family": "openai",
            "model": "gpt-3.5-turbo",
            "agent_flow_type": "streaming",
            "use_fallback": true,
            "temperature": 0.2,
            "base_url": null
          },
          "synthesizer": {
            "audio_format": "wav",
            "buffer_size": 100,
            "provider": "elevenlabs",
            "provider_config": {
                "voice": "Rachel",
                "model": "eleven_multilingual_v2",
                "voice_id": "21m00Tcm4TlvDq8ikWAM"
            },
            "stream": true
          },
          "transcriber": {
            "model": "deepgram",
            "stream": true,
            "language": "en",
            "endpointing": 400,
            "keywords": ""
          },
          "input": {
            "provider": "twilio",
            "format": "wav"
          },
          "output": {
            "provider": "twilio",
            "format": "wav"
          }
        },
        "toolchain": {
          "execution": "parallel",
          "pipelines": [["transcriber", "llm", "synthesizer"]]
        }
      }
    ]
  },
  "agent_prompts": {
    "task_1": {
      "system_prompt": "You will always be very concise and brief with your responses. You will NEVER speak longer than 2 sentences. \nYou are Rachel, an order representative for Pizza Square. You will be very helpful and kind. Customers will call you and give their order. You will follow this script. You will ONLY ask 1 question at a time. Once you get an answer you will move on to the next one. \n 1) You will start your conversation by saying. \"Hi! I am Rachel from Pizza Square. Am I speaking to Michael from 345 Palm Square?\" All your responses should be as long as this one.) \n 2) You will ask the customer what they would like to order. Only ask the next question once you have a clear answer to this question. If the customer wants assistance, you will ask them about their preferences. \n 3) You will ask the customer for the required size - regular, medium, or large. If the customer has follow up questions, you will clarify, that the small is for 1 person and 4 dollars, the medium is for 2 people and 7 dollars and the large is for 4 people and 11 dollars. Only ask the next question once you have a clear answer to this question. \n 4) Once confirmed, you will ask whether the customer wants to pay the amount of for the chosen pizza via card or cash. \n 5) You will then tell the customer that their pizza is in the oven and will reach within 30 minutes. \n You have options of 3 pizzas. You will only give description if the user asks for it.  \n 1) The BBQ Chicken - Which has Chicken, Mushrooms, Olives in a tomato base 2) The Pesto Dominator - Which has Chicken, Bacon and Ham in a green base. This is the best-selling pizza. 3) The Veg Ultimate - Which has tofu, mushrooms, olives, tomatoes in a tomato base. "
    }
  }
}
prateeksachan commented 4 months ago

@saksham003 can you make sure the docker image you're building is being recreated from the latest master? The logs you've posted seem to be from an earlier version. Can you delete the local docker image and try again? docker images -a | grep "bolna"

In the meantime, I'll add a check to recreate the images via docker whenever there are changes.

prateeksachan commented 4 months ago

@saksham003 I've just pushed changes in 97e9daa20b31efcbb7f0f89f7b79139e618640a6 related to the docker commands. Can you please check and let us know?

https://github.com/bolna-ai/bolna/blob/58cbfb907c697046cfdce5e013f9dc450717807d/README.md?plain=1#L55-L57

saksham003 commented 4 months ago

@prateeksachan, I tried this but I'm still getting the same issue. Here are all the logs:

2024-05-19 14:43:34 2024-05-19 09:13:34.663 INFO {quickstart_server} [websocket_endpoint] Connected to ws
2024-05-19 14:43:34 INFO:     ('172.18.0.2', 54736) - "WebSocket /chat/v1/b8803ad0-c6e6-4ad9-8a86-e882bed6c5d5" [accepted]
2024-05-19 14:43:34 INFO:     connection open
2024-05-19 14:43:34 2024-05-19 09:13:34.665 INFO {quickstart_server} [websocket_endpoint] Retrieved agent config: {"agent_name": "bolna_test", "agent_type": "test", "tasks": [{"tools_config": {"llm_agent": {"model": "gpt-3.5-turbo", "max_tokens": 100, "agent_flow_type": "streaming", "family": "openai", "temperature": 0.2, "request_json": false, "stop": null, "top_k": 0, "top_p": 0.9, "min_p": 0.1, "frequency_penalty": 0.0, "presence_penalty": 0.0, "provider": "openai", "base_url": null, "routes": null, "extraction_details": null, "summarization_details": null}, "synthesizer": {"provider": "elevenlabs", "provider_config": {"voice": "Rachel", "voice_id": "21m00Tcm4TlvDq8ikWAM", "model": "eleven_multilingual_v2", "temperature": 0.5, "similarity_boost": 0.5}, "stream": true, "buffer_size": 100, "audio_format": "wav", "caching": true}, "transcriber": {"model": "deepgram", "language": "en", "stream": true, "sampling_rate": 16000, "encoding": "linear16", "endpointing": 400, "keywords": ""}, "input": {"provider": "twilio", "format": "wav"}, "output": {"provider": "twilio", "format": "wav"}, "api_tools": null}, "toolchain": {"execution": "parallel", "pipelines": [["transcriber", "llm", "synthesizer"]]}, "task_type": "conversation", "task_config": {"optimize_latency": true, "hangup_after_silence": 10, "incremental_delay": 100, "number_of_words_for_interruption": 1, "interruption_backoff_period": 100, "hangup_after_LLMCall": false, "call_cancellation_prompt": null, "backchanneling": false, "backchanneling_message_gap": 5, "backchanneling_start_delay": 5}}], "assistant_status": "seeding"}
2024-05-19 14:43:34 2024-05-19 09:13:34.666 INFO {assistant_manager} [run] Running task 0 {'tools_config': {'llm_agent': {'model': 'gpt-3.5-turbo', 'max_tokens': 100, 'agent_flow_type': 'streaming', 'family': 'openai', 'temperature': 0.2, 'request_json': False, 'stop': None, 'top_k': 0, 'top_p': 0.9, 'min_p': 0.1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'provider': 'openai', 'base_url': None, 'routes': None, 'extraction_details': None, 'summarization_details': None}, 'synthesizer': {'provider': 'elevenlabs', 'provider_config': {'voice': 'Rachel', 'voice_id': '21m00Tcm4TlvDq8ikWAM', 'model': 'eleven_multilingual_v2', 'temperature': 0.5, 'similarity_boost': 0.5}, 'stream': True, 'buffer_size': 100, 'audio_format': 'wav', 'caching': True}, 'transcriber': {'model': 'deepgram', 'language': 'en', 'stream': True, 'sampling_rate': 16000, 'encoding': 'linear16', 'endpointing': 400, 'keywords': ''}, 'input': {'provider': 'twilio', 'format': 'wav'}, 'output': {'provider': 'twilio', 'format': 'wav'}, 'api_tools': None}, 'toolchain': {'execution': 'parallel', 'pipelines': [['transcriber', 'llm', 'synthesizer']]}, 'task_type': 'conversation', 'task_config': {'optimize_latency': True, 'hangup_after_silence': 10, 'incremental_delay': 100, 'number_of_words_for_interruption': 1, 'interruption_backoff_period': 100, 'hangup_after_LLMCall': False, 'call_cancellation_prompt': None, 'backchanneling': False, 'backchanneling_message_gap': 5, 'backchanneling_start_delay': 5}} and sending kwargs {}
2024-05-19 14:43:34 2024-05-19 09:13:34.666 INFO {task_manager} [__init__] doing task {'tools_config': {'llm_agent': {'model': 'gpt-3.5-turbo', 'max_tokens': 100, 'agent_flow_type': 'streaming', 'family': 'openai', 'temperature': 0.2, 'request_json': False, 'stop': None, 'top_k': 0, 'top_p': 0.9, 'min_p': 0.1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'provider': 'openai', 'base_url': None, 'routes': None, 'extraction_details': None, 'summarization_details': None}, 'synthesizer': {'provider': 'elevenlabs', 'provider_config': {'voice': 'Rachel', 'voice_id': '21m00Tcm4TlvDq8ikWAM', 'model': 'eleven_multilingual_v2', 'temperature': 0.5, 'similarity_boost': 0.5}, 'stream': True, 'buffer_size': 100, 'audio_format': 'wav', 'caching': True}, 'transcriber': {'model': 'deepgram', 'language': 'en', 'stream': True, 'sampling_rate': 16000, 'encoding': 'linear16', 'endpointing': 400, 'keywords': ''}, 'input': {'provider': 'twilio', 'format': 'wav'}, 'output': {'provider': 'twilio', 'format': 'wav'}, 'api_tools': None}, 'toolchain': {'execution': 'parallel', 'pipelines': [['transcriber', 'llm', 'synthesizer']]}, 'task_type': 'conversation', 'task_config': {'optimize_latency': True, 'hangup_after_silence': 10, 'incremental_delay': 100, 'number_of_words_for_interruption': 1, 'interruption_backoff_period': 100, 'hangup_after_LLMCall': False, 'call_cancellation_prompt': None, 'backchanneling': False, 'backchanneling_message_gap': 5, 'backchanneling_start_delay': 5}}
2024-05-19 14:43:34 2024-05-19 09:13:34.666 INFO {task_manager} [__init__] Connected via websocket
2024-05-19 14:43:34 2024-05-19 09:13:34.666 INFO {task_manager} [__setup_input_handlers] Connected through dashboard None
2024-05-19 14:43:34 2024-05-19 09:13:34.666 INFO {task_manager} [__setup_output_handlers] Making sure that the sampling rate for output handler is 8000
2024-05-19 14:43:34 2024-05-19 09:13:34.667 INFO {task_manager} [__init__] History []
2024-05-19 14:43:34 2024-05-19 09:13:34.667 INFO {task_manager} [__init__] TASK CONFIG {'llm_agent': {'model': 'gpt-3.5-turbo', 'max_tokens': 100, 'agent_flow_type': 'streaming', 'family': 'openai', 'temperature': 0.2, 'request_json': False, 'stop': None, 'top_k': 0, 'top_p': 0.9, 'min_p': 0.1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'provider': 'openai', 'base_url': None, 'routes': None, 'extraction_details': None, 'summarization_details': None}, 'synthesizer': {'provider': 'elevenlabs', 'provider_config': {'voice': 'Rachel', 'voice_id': '21m00Tcm4TlvDq8ikWAM', 'model': 'eleven_multilingual_v2', 'temperature': 0.5, 'similarity_boost': 0.5, 'sampling_rate': 8000}, 'stream': True, 'buffer_size': 100, 'audio_format': 'pcm', 'caching': True}, 'transcriber': {'model': 'deepgram', 'language': 'en', 'stream': True, 'sampling_rate': 16000, 'encoding': 'linear16', 'endpointing': 400, 'keywords': ''}, 'input': {'provider': 'twilio', 'format': 'wav'}, 'output': {'provider': 'twilio', 'format': 'wav'}, 'api_tools': None}
2024-05-19 14:43:34 2024-05-19 09:13:34.667 INFO {task_manager} [__init__] task initialization completed
2024-05-19 14:43:34 2024-05-19 09:13:34.667 INFO {task_manager} [__setup_transcriber] Setting up transcriber
2024-05-19 14:43:34 2024-05-19 09:13:34.667 INFO {deepgram_transcriber} [__init__] Initializing transcriber {}
2024-05-19 14:43:34 2024-05-19 09:13:34.667 INFO {deepgram_transcriber} [__init__] self.stream: True
2024-05-19 14:43:34 2024-05-19 09:13:34.667 INFO {task_manager} [__setup_synthesizer] Synthesizer config: {'provider': 'elevenlabs', 'provider_config': {'voice': 'Rachel', 'voice_id': '21m00Tcm4TlvDq8ikWAM', 'model': 'eleven_multilingual_v2', 'temperature': 0.5, 'similarity_boost': 0.5, 'sampling_rate': 8000}, 'stream': True, 'buffer_size': 100, 'audio_format': 'pcm', 'caching': True}
2024-05-19 14:43:34 2024-05-19 09:13:34.667 INFO {elevenlabs_synthesizer} [__init__] Using turbo or not eleven_turbo_v2
2024-05-19 14:43:34 2024-05-19 09:13:34.668 INFO {task_manager} [__setup_llm] ### PROVIDER openai
2024-05-19 14:43:34 2024-05-19 09:13:34.668 INFO {task_manager} [__setup_llm] LLM CONFIG {'model': 'gpt-3.5-turbo', 'max_tokens': 100, 'buffer_size': 100}
2024-05-19 14:43:34 2024-05-19 09:13:34.668 INFO {openai_llm} [__init__] Initializing OpenAI LLM with model: gpt-3.5-turbo and maxc tokens 100
2024-05-19 14:43:34 2024-05-19 09:13:34.697 INFO {openai_llm} [__init__] Initializing OpenAI LLM with model: gpt-3.5-turbo and maxc tokens 100
2024-05-19 14:43:34 2024-05-19 09:13:34.728 INFO {task_manager} [__setup_tasks] prompt and config setup completed
2024-05-19 14:43:34 2024-05-19 09:13:34.728 INFO {task_manager} [__init__] Conversation config {'optimize_latency': True, 'hangup_after_silence': 10, 'incremental_delay': 100, 'number_of_words_for_interruption': 1, 'interruption_backoff_period': 100, 'hangup_after_LLMCall': False, 'call_cancellation_prompt': None, 'backchanneling': False, 'backchanneling_message_gap': 5, 'backchanneling_start_delay': 5}
2024-05-19 14:43:34 2024-05-19 09:13:34.728 INFO {task_manager} [__init__] Processing interim results true
2024-05-19 14:43:34 2024-05-19 09:13:34.728 INFO {task_manager} [__init__] minimum wait duration 400
2024-05-19 14:43:34 2024-05-19 09:13:34.728 INFO {task_manager} [__init__] incremental_delay - 100
2024-05-19 14:43:34 2024-05-19 09:13:34.728 INFO {task_manager} [__init__] hangup_after_silence 10
2024-05-19 14:43:34 2024-05-19 09:13:34.728 INFO {task_manager} [__init__] Not setting up backchanneling
2024-05-19 14:43:34 2024-05-19 09:13:34.728 INFO {task_manager} [load_prompt] prompt and config setup started
2024-05-19 14:43:34 2024-05-19 09:13:34.728 INFO {utils} [get_prompt_responses] Loading up the conversation details from the local file
2024-05-19 14:43:34 2024-05-19 09:13:34.731 INFO {task_manager} [run] starting task_id 0
2024-05-19 14:43:34 2024-05-19 09:13:34.733 INFO {task_manager} [run] Starting synthesizer task
2024-05-19 14:43:34 2024-05-19 09:13:34.733 INFO {task_manager} [run] Starting the first message task
2024-05-19 14:43:34 2024-05-19 09:13:34.735 INFO {task_manager} [__handle_initial_silence] Checking for initial silence 15
2024-05-19 14:43:34 2024-05-19 09:13:34.735 INFO {task_manager} [_listen_transcriber] Starting transcriber task
2024-05-19 14:43:34 2024-05-19 09:13:34.737 INFO {task_manager} [__listen_synthesizer] Opening websocket connection to synthesizer
2024-05-19 14:43:34 2024-05-19 09:13:34.739 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping 
2024-05-19 14:43:34 2024-05-19 09:13:34.739 INFO {task_manager} [__first_message] Executing the first message task
2024-05-19 14:43:34 2024-05-19 09:13:34.740 INFO {task_manager} [__first_message] Stream id is still None, so not passing it
2024-05-19 14:43:34 2024-05-19 09:13:34.742 INFO {deepgram_transcriber} [transcribe] STARTED TRANSCRIBING
2024-05-19 14:43:34 2024-05-19 09:13:34.742 INFO {deepgram_transcriber} [get_deepgram_ws_url] Deepgram websocket url: wss://api.deepgram.com/v1/listen?model=nova-2&filler_words=true&diarize=true&language=en&vad_events=true&encoding=mulaw&sample_rate=8000&channels=1&interim_results=true&utterance_end_ms=1000
2024-05-19 14:43:34 2024-05-19 09:13:34.844 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-05-19 14:43:35 2024-05-19 09:13:35.183 INFO {elevenlabs_synthesizer} [open_connection] Connected to the server
2024-05-19 14:43:35 2024-05-19 09:13:35.183 INFO {task_manager} [__listen_synthesizer] Listening to synthesizer
2024-05-19 14:43:35 2024-05-19 09:13:35.241 INFO {task_manager} [__first_message] Got stream sid and hence sending the first message MZ8d496c7d2030e339f330ef4ef35d0bad
2024-05-19 14:43:35 2024-05-19 09:13:35.242 INFO {task_manager} [__first_message] Generating None
2024-05-19 14:43:35 2024-05-19 09:13:35.242 INFO {task_manager} [_synthesize] ##### sending text to elevenlabs for generation: None 
2024-05-19 14:43:35 2024-05-19 09:13:35.242 INFO {task_manager} [_synthesize] Cached response and hence sending preprocessed text
2024-05-19 14:43:35 Traceback (most recent call last):
2024-05-19 14:43:35   File "/usr/local/lib/python3.10/site-packages/bolna/agent_manager/task_manager.py", line 1192, in _synthesize
2024-05-19 14:43:35     await self.__send_preprocessed_audio(meta_info, get_md5_hash(text))
2024-05-19 14:43:35   File "/usr/local/lib/python3.10/site-packages/bolna/helpers/utils.py", line 211, in get_md5_hash
2024-05-19 14:43:35     return hashlib.md5(text.encode()).hexdigest()
2024-05-19 14:43:35 AttributeError: 'NoneType' object has no attribute 'encode'
2024-05-19 14:43:35 2024-05-19 09:13:35.243 ERROR {task_manager} [_synthesize] Error in synthesizer: 'NoneType' object has no attribute 'encode'
2024-05-19 14:43:35 2024-05-19 09:13:35.243 INFO {utils} [write_request_logs] Message {'direction': 'request', 'data': None, 'leg_id': 'c7cbddf9-40bd-4267-93da-194410af2c4a', 'time': '2024-05-19 09:13:35', 'component': 'synthesizer', 'sequence_id': -1, 'model': 'elevenlabs', 'cached': False, 'is_final': False, 'engine': 'eleven_turbo_v2'}
2024-05-19 14:43:35 2024-05-19 09:13:35.244 INFO {utils} [write_request_logs] Message {'direction': 'response', 'data': None, 'leg_id': 'c7cbddf9-40bd-4267-93da-194410af2c4a', 'time': '2024-05-19 09:13:35', 'component': 'synthesizer', 'sequence_id': -1, 'model': 'elevenlabs', 'cached': True, 'is_final': False, 'engine': 'eleven_turbo_v2'}
2024-05-19 14:43:36 2024-05-19 09:13:36.741 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:43:38 2024-05-19 09:13:38.741 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:43:40 2024-05-19 09:13:40.742 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:43:42 2024-05-19 09:13:42.742 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:43:44 2024-05-19 09:13:44.743 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:43:44 2024-05-19 09:13:44.744 ERROR {deepgram_transcriber} [transcribe] Error in transcribe: 
2024-05-19 14:43:46 2024-05-19 09:13:46.744 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:43:48 2024-05-19 09:13:48.743 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:43:49 2024-05-19 09:13:49.734 INFO {task_manager} [__handle_initial_silence] Woke up from my slumber True, [{'role': 'system', 'content': 'You will always be very concise and brief with your responses. You will NEVER speak longer than 2 sentences. \nYou are Rachel, an order representative for Pizza Square. You will be very helpful and kind. Customers will call you and give their order. You will follow this script. You will ONLY ask 1 question at a time. Once you get an answer you will move on to the next one. \n 1) You will start your conversation by saying. "Hi! I am Rachel from Pizza Square. Am I speaking to Michael from 345 Palm Square?" All your responses should be as long as this one.) \n 2) You will ask the customer what they would like to order. Only ask the next question once you have a clear answer to this question. If the customer wants assistance, you will ask them about their preferences. \n 3) You will ask the customer for the required size - regular, medium, or large. If the customer has follow up questions, you will clarify, that the small is for 1 person and 4 dollars, the medium is for 2 people and 7 dollars and the large is for 4 people and 11 dollars. Only ask the next question once you have a clear answer to this question. \n 4) Once confirmed, you will ask whether the customer wants to pay the amount of for the chosen pizza via card or cash. \n 5) You will then tell the customer that their pizza is in the oven and will reach within 30 minutes. \n You have options of 3 pizzas. You will only give description if the user asks for it.  \n 1) The BBQ Chicken - Which has Chicken, Mushrooms, Olives in a tomato base 2) The Pesto Dominator - Which has Chicken, Bacon and Ham in a green base. This is the best-selling pizza. 3) The Veg Ultimate - Which has tofu, mushrooms, olives, tomatoes in a tomato base. '}], [{'role': 'system', 'content': 'You will always be very concise and brief with your responses. 
You will NEVER speak longer than 2 sentences. \nYou are Rachel, an order representative for Pizza Square. You will be very helpful and kind. Customers will call you and give their order. You will follow this script. You will ONLY ask 1 question at a time. Once you get an answer you will move on to the next one. \n 1) You will start your conversation by saying. "Hi! I am Rachel from Pizza Square. Am I speaking to Michael from 345 Palm Square?" All your responses should be as long as this one.) \n 2) You will ask the customer what they would like to order. Only ask the next question once you have a clear answer to this question. If the customer wants assistance, you will ask them about their preferences. \n 3) You will ask the customer for the required size - regular, medium, or large. If the customer has follow up questions, you will clarify, that the small is for 1 person and 4 dollars, the medium is for 2 people and 7 dollars and the large is for 4 people and 11 dollars. Only ask the next question once you have a clear answer to this question. \n 4) Once confirmed, you will ask whether the customer wants to pay the amount of for the chosen pizza via card or cash. \n 5) You will then tell the customer that their pizza is in the oven and will reach within 30 minutes. \n You have options of 3 pizzas. You will only give description if the user asks for it.  \n 1) The BBQ Chicken - Which has Chicken, Mushrooms, Olives in a tomato base 2) The Pesto Dominator - Which has Chicken, Bacon and Ham in a green base. This is the best-selling pizza. 3) The Veg Ultimate - Which has tofu, mushrooms, olives, tomatoes in a tomato base. '}]
2024-05-19 14:43:49 2024-05-19 09:13:49.734 INFO {task_manager} [__handle_initial_silence] Calee was silent and hence speaking Hello on callee's behalf
2024-05-19 14:43:50 2024-05-19 09:13:50.743 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:43:50 2024-05-19 09:13:50.948 INFO {telephony} [_listen] call stopping
2024-05-19 14:43:52 2024-05-19 09:13:52.743 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:43:54 2024-05-19 09:13:54.744 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:43:55 INFO:     Shutting down
2024-05-19 14:43:55 INFO:     Waiting for background tasks to complete. (CTRL+C to force quit)
2024-05-19 14:43:56 2024-05-19 09:13:56.745 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:43:58 2024-05-19 09:13:58.746 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:44:00 2024-05-19 09:14:00.747 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:44:02 2024-05-19 09:14:02.747 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-19 14:44:04 2024-05-19 09:14:04.747 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
prateeksachan commented 4 months ago

@saksham003 Thanks for the logs. Let me check this; it seems to be a different issue, related to the transcriber.

prateeksachan commented 4 months ago

@saksham003 I'm not able to reproduce the transcriber issue from your logs. Can you make sure you've added Deepgram's token DEEPGRAM_AUTH_TOKEN to your environment file?

The other synthesizer error (Error in synthesizer: 'NoneType' object has no attribute 'encode') can be ignored for now; I'm working on a fix for that in parallel.

If you're still unable to get the transcriber working, I'd like to get on a call and debug it. Can you please pick a slot on https://calendly.com/bolna/30min?
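The synthesizer crash being worked on comes from get_md5_hash calling .encode() on a None welcome message (visible in the second log: "sending text to elevenlabs for generation: None"). A minimal sketch of the kind of guard that would avoid it, assuming the hash is used to look up cached audio (safe_md5_hash is a hypothetical name, not bolna's actual fix):

```python
import hashlib
from typing import Optional

def safe_md5_hash(text: Optional[str]) -> Optional[str]:
    """Return an MD5 hex digest of text, or None when there is nothing to hash.

    Hypothetical guard: the real get_md5_hash crashed with
    "'NoneType' object has no attribute 'encode'" when text was None.
    """
    if text is None:
        return None  # caller can then skip the cached-audio lookup entirely
    return hashlib.md5(text.encode()).hexdigest()
```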

saksham003 commented 4 months ago

@prateeksachan Yes, I've added DEEPGRAM_AUTH_TOKEN to the .env file. Everything was working correctly for me a few days ago, but I've now started encountering this error unexpectedly.
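Since the transcriber error above carries no detail, one quick sanity check is whether the token actually reaches the container's environment (a .env entry can be present on disk but not exported into the running process). A small illustrative helper, using only the variable name from this thread:

```python
import os
from typing import Optional

def check_deepgram_token(env: Optional[dict] = None) -> bool:
    """Return True if DEEPGRAM_AUTH_TOKEN is set and non-empty.

    Illustrative check only; pass a dict for testing, or omit `env`
    to inspect the real process environment (os.environ).
    """
    source = env if env is not None else os.environ
    token = source.get("DEEPGRAM_AUTH_TOKEN", "").strip()
    return bool(token)
```

Running this inside the bolna-app container (e.g. via docker exec with python) would distinguish a missing/empty token from a genuine Deepgram connectivity problem.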

saksham003 commented 3 months ago

@prateeksachan any update on this issue?

prateeksachan commented 3 months ago

@saksham003 I've removed a few things and introduced agent_welcome_message, which should fix the synthesizer exceptions, in 5daac5b660b66c3d2367537be8468fde9651c645. Can you check once?

Also, can you re-verify that you're able to connect to Deepgram and that there's no considerable latency?

I've tried out the agent_config that you'd posted.

{
  "agent_config": {
    "agent_name": "test",
    "agent_type": "bolna_test",
    "agent_welcome_message": "Hi, my name is Rachel.",
    "tasks": [
      {
        "task_type": "conversation",
        "call_cancellation_prompt": null,
        "hangup_after_LLMCall": false,
        "hangup_after_silence": 10,
        "incremental_delay": 100,
        "interruption_backoff_period": 100,
        "number_of_words_for_interruption": 3,
        "optimize_latency": true,
        "task_config": {
          "hangup_after_silence": 10
        },
        "tools_config": {
          "llm_agent": {
            "max_tokens": 100,
            "family": "openai",
            "model": "gpt-3.5-turbo",
            "agent_flow_type": "streaming",
            "use_fallback": true,
            "temperature": 0.2,
            "base_url": null
          },
          "synthesizer": {
            "audio_format": "wav",
            "buffer_size": 100,
            "provider": "elevenlabs",
            "provider_config": {
                "voice": "Rachel",
                "model": "eleven_multilingual_v2",
                "voice_id": "21m00Tcm4TlvDq8ikWAM"
            },
            "stream": true
          },
          "transcriber": {
            "model": "deepgram",
            "stream": true,
            "language": "en",
            "endpointing": 400,
            "keywords": ""
          },
          "input": {
            "provider": "twilio",
            "format": "wav"
          },
          "output": {
            "provider": "twilio",
            "format": "wav"
          }
        },
        "toolchain": {
          "execution": "parallel",
          "pipelines": [["transcriber", "llm", "synthesizer"]]
        }
      }
    ]
  },
  "agent_prompts": {
    "task_1": {
      "system_prompt": "You will always be very concise and brief with your responses. You will NEVER speak longer than 2 sentences. \nYou are Rachel, an order representative for Pizza Square. You will be very helpful and kind. Customers will call you and give their order. You will follow this script. You will ONLY ask 1 question at a time. Once you get an answer you will move on to the next one. \n 1) You will start your conversation by saying. \"Hi! I am Rachel from Pizza Square. Am I speaking to Michael from 345 Palm Square?\" All your responses should be as long as this one.) \n 2) You will ask the customer what they would like to order. Only ask the next question once you have a clear answer to this question. If the customer wants assistance, you will ask them about their preferences. \n 3) You will ask the customer for the required size - regular, medium, or large. If the customer has follow up questions, you will clarify, that the small is for 1 person and 4 dollars, the medium is for 2 people and 7 dollars and the large is for 4 people and 11 dollars. Only ask the next question once you have a clear answer to this question. \n 4) Once confirmed, you will ask whether the customer wants to pay the amount of for the chosen pizza via card or cash. \n 5) You will then tell the customer that their pizza is in the oven and will reach within 30 minutes. \n You have options of 3 pizzas. You will only give description if the user asks for it.  \n 1) The BBQ Chicken - Which has Chicken, Mushrooms, Olives in a tomato base 2) The Pesto Dominator - Which has Chicken, Bacon and Ham in a green base. This is the best-selling pizza. 3) The Veg Ultimate - Which has tofu, mushrooms, olives, tomatoes in a tomato base. "
    }
  }
}
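
For reference, the TypeError("object of type 'NoneType' has no len()") in the traceback above suggests write_request_logs is calling len() on a payload that can be None. A minimal guard sketch under that assumption (hypothetical helper name, not bolna's actual code):

```python
def safe_len(value):
    # len(None) raises TypeError("object of type 'NoneType' has no len()"),
    # so treat a missing payload as zero-length instead of crashing the task
    return 0 if value is None else len(value)

# usage against a log record shaped like the ones in the logs below
record = {"direction": "request", "data": None, "component": "synthesizer"}
chars = safe_len(record["data"])  # 0 instead of a failed asyncio task
```

With a guard like this in place the logging task would no longer die on empty messages, though the underlying question of why the payload is None would still need investigating.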
saksham003 commented 3 months ago

@prateeksachan I'm getting the same issue with Deepgram. This error is being logged, but with no further details: ERROR {deepgram_transcriber} [transcribe] Error in transcribe:
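
For anyone else debugging this: the bare "Error in transcribe:" line suggests the handler formats the exception without its message or traceback. A minimal sketch of surfacing the full detail with the stdlib logging module (hypothetical stand-in function, not bolna's real transcribe signature):

```python
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger("deepgram_transcriber")

def transcribe(payload):
    # hypothetical stand-in for the real transcribe loop
    try:
        return len(payload)  # raises TypeError when payload is None
    except Exception:
        # logger.exception records the message AND the traceback,
        # unlike logger.error(f"Error in transcribe: {e}") when str(e) is empty
        logger.exception("Error in transcribe")
        return None

transcribe(None)  # logs the full TypeError traceback instead of a bare line
```

Swapping the except block to logger.exception (or passing exc_info=True to logger.error) would show what is actually failing inside the transcriber.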

Here are all the logs:

2024-05-20 19:22:59 2024-05-20 13:52:59.933 INFO {quickstart_server} [websocket_endpoint] Connected to ws
2024-05-20 19:22:59 INFO:     ('172.22.0.3', 60860) - "WebSocket /chat/v1/677d079a-37f3-4164-8143-1786150a2c7e" [accepted]
2024-05-20 19:22:59 INFO:     connection open
2024-05-20 19:22:59 2024-05-20 13:52:59.936 INFO {quickstart_server} [websocket_endpoint] Retrieved agent config: {"agent_name": "bolna_test", "agent_type": "test", "tasks": [{"tools_config": {"llm_agent": {"model": "gpt-3.5-turbo", "max_tokens": 100, "agent_flow_type": "streaming", "family": "openai", "temperature": 0.2, "request_json": false, "stop": null, "top_k": 0, "top_p": 0.9, "min_p": 0.1, "frequency_penalty": 0.0, "presence_penalty": 0.0, "provider": "openai", "base_url": null, "routes": null, "extraction_details": null, "summarization_details": null}, "synthesizer": {"provider": "elevenlabs", "provider_config": {"voice": "Rachel", "voice_id": "21m00Tcm4TlvDq8ikWAM", "model": "eleven_multilingual_v2", "temperature": 0.5, "similarity_boost": 0.5}, "stream": true, "buffer_size": 100, "audio_format": "wav", "caching": true}, "transcriber": {"model": "deepgram", "language": "en", "stream": true, "sampling_rate": 16000, "encoding": "linear16", "endpointing": 400, "keywords": null}, "input": {"provider": "twilio", "format": "wav"}, "output": {"provider": "twilio", "format": "wav"}, "api_tools": null}, "toolchain": {"execution": "parallel", "pipelines": [["transcriber", "llm", "synthesizer"]]}, "task_type": "conversation", "task_config": {"optimize_latency": true, "hangup_after_silence": 10, "incremental_delay": 100, "number_of_words_for_interruption": 1, "interruption_backoff_period": 100, "hangup_after_LLMCall": false, "call_cancellation_prompt": null, "backchanneling": false, "backchanneling_message_gap": 5, "backchanneling_start_delay": 5}}], "assistant_status": "seeding"}
2024-05-20 19:22:59 2024-05-20 13:52:59.936 INFO {assistant_manager} [run] Running task 0 {'tools_config': {'llm_agent': {'model': 'gpt-3.5-turbo', 'max_tokens': 100, 'agent_flow_type': 'streaming', 'family': 'openai', 'temperature': 0.2, 'request_json': False, 'stop': None, 'top_k': 0, 'top_p': 0.9, 'min_p': 0.1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'provider': 'openai', 'base_url': None, 'routes': None, 'extraction_details': None, 'summarization_details': None}, 'synthesizer': {'provider': 'elevenlabs', 'provider_config': {'voice': 'Rachel', 'voice_id': '21m00Tcm4TlvDq8ikWAM', 'model': 'eleven_multilingual_v2', 'temperature': 0.5, 'similarity_boost': 0.5}, 'stream': True, 'buffer_size': 100, 'audio_format': 'wav', 'caching': True}, 'transcriber': {'model': 'deepgram', 'language': 'en', 'stream': True, 'sampling_rate': 16000, 'encoding': 'linear16', 'endpointing': 400, 'keywords': None}, 'input': {'provider': 'twilio', 'format': 'wav'}, 'output': {'provider': 'twilio', 'format': 'wav'}, 'api_tools': None}, 'toolchain': {'execution': 'parallel', 'pipelines': [['transcriber', 'llm', 'synthesizer']]}, 'task_type': 'conversation', 'task_config': {'optimize_latency': True, 'hangup_after_silence': 10, 'incremental_delay': 100, 'number_of_words_for_interruption': 1, 'interruption_backoff_period': 100, 'hangup_after_LLMCall': False, 'call_cancellation_prompt': None, 'backchanneling': False, 'backchanneling_message_gap': 5, 'backchanneling_start_delay': 5}} and sending kwargs {'agent_welcome_message': 'This call is being recorded for quality assurance and training. Please speak now.'}
2024-05-20 19:22:59 2024-05-20 13:52:59.937 INFO {task_manager} [__init__] doing task {'tools_config': {'llm_agent': {'model': 'gpt-3.5-turbo', 'max_tokens': 100, 'agent_flow_type': 'streaming', 'family': 'openai', 'temperature': 0.2, 'request_json': False, 'stop': None, 'top_k': 0, 'top_p': 0.9, 'min_p': 0.1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'provider': 'openai', 'base_url': None, 'routes': None, 'extraction_details': None, 'summarization_details': None}, 'synthesizer': {'provider': 'elevenlabs', 'provider_config': {'voice': 'Rachel', 'voice_id': '21m00Tcm4TlvDq8ikWAM', 'model': 'eleven_multilingual_v2', 'temperature': 0.5, 'similarity_boost': 0.5}, 'stream': True, 'buffer_size': 100, 'audio_format': 'wav', 'caching': True}, 'transcriber': {'model': 'deepgram', 'language': 'en', 'stream': True, 'sampling_rate': 16000, 'encoding': 'linear16', 'endpointing': 400, 'keywords': None}, 'input': {'provider': 'twilio', 'format': 'wav'}, 'output': {'provider': 'twilio', 'format': 'wav'}, 'api_tools': None}, 'toolchain': {'execution': 'parallel', 'pipelines': [['transcriber', 'llm', 'synthesizer']]}, 'task_type': 'conversation', 'task_config': {'optimize_latency': True, 'hangup_after_silence': 10, 'incremental_delay': 100, 'number_of_words_for_interruption': 1, 'interruption_backoff_period': 100, 'hangup_after_LLMCall': False, 'call_cancellation_prompt': None, 'backchanneling': False, 'backchanneling_message_gap': 5, 'backchanneling_start_delay': 5}}
2024-05-20 19:22:59 2024-05-20 13:52:59.937 INFO {task_manager} [__init__] Connected via websocket
2024-05-20 19:22:59 2024-05-20 13:52:59.937 INFO {task_manager} [__setup_input_handlers] Connected through dashboard None
2024-05-20 19:22:59 2024-05-20 13:52:59.937 INFO {task_manager} [__setup_output_handlers] Making sure that the sampling rate for output handler is 8000
2024-05-20 19:22:59 2024-05-20 13:52:59.938 INFO {task_manager} [__init__] History []
2024-05-20 19:22:59 2024-05-20 13:52:59.938 INFO {task_manager} [__init__] TASK CONFIG {'llm_agent': {'model': 'gpt-3.5-turbo', 'max_tokens': 100, 'agent_flow_type': 'streaming', 'family': 'openai', 'temperature': 0.2, 'request_json': False, 'stop': None, 'top_k': 0, 'top_p': 0.9, 'min_p': 0.1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'provider': 'openai', 'base_url': None, 'routes': None, 'extraction_details': None, 'summarization_details': None}, 'synthesizer': {'provider': 'elevenlabs', 'provider_config': {'voice': 'Rachel', 'voice_id': '21m00Tcm4TlvDq8ikWAM', 'model': 'eleven_multilingual_v2', 'temperature': 0.5, 'similarity_boost': 0.5, 'sampling_rate': 8000}, 'stream': True, 'buffer_size': 100, 'audio_format': 'pcm', 'caching': True}, 'transcriber': {'model': 'deepgram', 'language': 'en', 'stream': True, 'sampling_rate': 16000, 'encoding': 'linear16', 'endpointing': 400, 'keywords': None}, 'input': {'provider': 'twilio', 'format': 'wav'}, 'output': {'provider': 'twilio', 'format': 'wav'}, 'api_tools': None}
2024-05-20 19:22:59 2024-05-20 13:52:59.939 INFO {task_manager} [__init__] task initialization completed
2024-05-20 19:22:59 2024-05-20 13:52:59.939 INFO {task_manager} [__setup_transcriber] Setting up transcriber
2024-05-20 19:22:59 2024-05-20 13:52:59.939 INFO {deepgram_transcriber} [__init__] Initializing transcriber {'agent_welcome_message': 'This call is being recorded for quality assurance and training. Please speak now.'}
2024-05-20 19:22:59 2024-05-20 13:52:59.940 INFO {deepgram_transcriber} [__init__] self.stream: True
2024-05-20 19:22:59 2024-05-20 13:52:59.940 INFO {task_manager} [__setup_synthesizer] Synthesizer config: {'provider': 'elevenlabs', 'provider_config': {'voice': 'Rachel', 'voice_id': '21m00Tcm4TlvDq8ikWAM', 'model': 'eleven_multilingual_v2', 'temperature': 0.5, 'similarity_boost': 0.5, 'sampling_rate': 8000}, 'stream': True, 'buffer_size': 100, 'audio_format': 'pcm', 'caching': True}
2024-05-20 19:22:59 2024-05-20 13:52:59.940 INFO {elevenlabs_synthesizer} [__init__] Using turbo or not eleven_turbo_v2
2024-05-20 19:22:59 2024-05-20 13:52:59.940 INFO {task_manager} [__setup_llm] ### PROVIDER openai
2024-05-20 19:22:59 2024-05-20 13:52:59.941 INFO {task_manager} [__setup_llm] LLM CONFIG {'model': 'gpt-3.5-turbo', 'max_tokens': 100, 'buffer_size': 100}
2024-05-20 19:22:59 2024-05-20 13:52:59.941 INFO {openai_llm} [__init__] Initializing OpenAI LLM with model: gpt-3.5-turbo and maxc tokens 100
2024-05-20 19:22:59 2024-05-20 13:52:59.978 INFO {openai_llm} [__init__] Initializing OpenAI LLM with model: gpt-3.5-turbo and maxc tokens 100
2024-05-20 19:23:00 2024-05-20 13:53:00.008 INFO {task_manager} [__setup_tasks] prompt and config setup completed
2024-05-20 19:23:00 2024-05-20 13:53:00.008 INFO {task_manager} [__init__] Conversation config {'optimize_latency': True, 'hangup_after_silence': 10, 'incremental_delay': 100, 'number_of_words_for_interruption': 1, 'interruption_backoff_period': 100, 'hangup_after_LLMCall': False, 'call_cancellation_prompt': None, 'backchanneling': False, 'backchanneling_message_gap': 5, 'backchanneling_start_delay': 5}
2024-05-20 19:23:00 2024-05-20 13:53:00.008 INFO {task_manager} [__init__] Processing interim results true
2024-05-20 19:23:00 2024-05-20 13:53:00.008 INFO {task_manager} [__init__] minimum wait duration 400
2024-05-20 19:23:00 2024-05-20 13:53:00.008 INFO {task_manager} [__init__] incremental_delay - 100
2024-05-20 19:23:00 2024-05-20 13:53:00.008 INFO {task_manager} [__init__] hangup_after_silence 10
2024-05-20 19:23:00 2024-05-20 13:53:00.008 INFO {task_manager} [__init__] Not setting up backchanneling
2024-05-20 19:23:00 2024-05-20 13:53:00.008 INFO {task_manager} [__init__] Agent welcome message present This call is being recorded for quality assurance and training. Please speak now.
2024-05-20 19:23:00 2024-05-20 13:53:00.008 INFO {task_manager} [load_prompt] prompt and config setup started
2024-05-20 19:23:00 2024-05-20 13:53:00.008 INFO {utils} [get_prompt_responses] Loading up the conversation details from the local file
2024-05-20 19:23:00 2024-05-20 13:53:00.013 INFO {task_manager} [run] starting task_id 0
2024-05-20 19:23:00 2024-05-20 13:53:00.040 INFO {task_manager} [run] Starting synthesizer task
2024-05-20 19:23:00 2024-05-20 13:53:00.047 INFO {task_manager} [run] Starting the first message task
2024-05-20 19:23:00 2024-05-20 13:53:00.067 WARNING {runners} [run] Executing <Task pending name='Task-32' coro=<WebSocketProtocol.run_asgi() running at /usr/local/lib/python3.10/site-packages/uvicorn/protocols/websockets/websockets_impl.py:254> wait_for=<_GatheringFuture pending cb=[Task.task_wakeup()] created at /usr/local/lib/python3.10/asyncio/tasks.py:665> cb=[WebSocketProtocol.on_task_complete()] created at /usr/local/lib/python3.10/site-packages/uvicorn/protocols/websockets/websockets_impl.py:207> took 0.131 seconds
2024-05-20 19:23:00 2024-05-20 13:53:00.070 INFO {task_manager} [__handle_initial_silence] Checking for initial silence 15
2024-05-20 19:23:00 2024-05-20 13:53:00.076 INFO {task_manager} [_listen_transcriber] Starting transcriber task
2024-05-20 19:23:00 2024-05-20 13:53:00.081 INFO {task_manager} [__listen_synthesizer] Opening websocket connection to synthesizer
2024-05-20 19:23:00 2024-05-20 13:53:00.098 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping 
2024-05-20 19:23:00 2024-05-20 13:53:00.103 INFO {task_manager} [__first_message] Executing the first message task
2024-05-20 19:23:00 2024-05-20 13:53:00.103 INFO {task_manager} [__first_message] Stream id is still None, so not passing it
2024-05-20 19:23:00 2024-05-20 13:53:00.115 INFO {deepgram_transcriber} [transcribe] STARTED TRANSCRIBING
2024-05-20 19:23:00 2024-05-20 13:53:00.116 INFO {deepgram_transcriber} [get_deepgram_ws_url] Deepgram websocket url: wss://api.deepgram.com/v1/listen?model=nova-2&filler_words=true&diarize=true&language=en&vad_events=true&encoding=mulaw&sample_rate=8000&channels=1&interim_results=true&utterance_end_ms=1000
2024-05-20 19:23:00 2024-05-20 13:53:00.203 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-05-20 19:23:00 2024-05-20 13:53:00.515 INFO {elevenlabs_synthesizer} [open_connection] Connected to the server
2024-05-20 19:23:00 2024-05-20 13:53:00.515 INFO {task_manager} [__listen_synthesizer] Listening to synthesizer
2024-05-20 19:23:00 2024-05-20 13:53:00.612 INFO {task_manager} [__first_message] Got stream sid and hence sending the first message MZ65f3c951ffa392263681382d83e5bed2
2024-05-20 19:23:00 2024-05-20 13:53:00.612 INFO {task_manager} [__first_message] Generating This call is being recorded for quality assurance and training. Please speak now.
2024-05-20 19:23:00 2024-05-20 13:53:00.615 INFO {task_manager} [_synthesize] ##### sending text to elevenlabs for generation: This call is being recorded for quality assurance and training. Please speak now. 
2024-05-20 19:23:00 2024-05-20 13:53:00.615 INFO {task_manager} [_synthesize] Cached response and hence sending preprocessed text
2024-05-20 19:23:00 2024-05-20 13:53:00.619 INFO {task_manager} [__send_preprocessed_audio] File doesn't exist in S3. Hence we're synthesizing it from synthesizer
2024-05-20 19:23:00 2024-05-20 13:53:00.622 INFO {task_manager} [_synthesize] ##### sending text to elevenlabs for generation: This call is being recorded for quality assurance and training. Please speak now. 
2024-05-20 19:23:00 2024-05-20 13:53:00.622 INFO {elevenlabs_synthesizer} [push] Pushed message to internal queue {'data': 'This call is being recorded for quality assurance and training. Please speak now.', 'meta_info': {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'pcm', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855}}
2024-05-20 19:23:00 2024-05-20 13:53:00.625 INFO {utils} [write_request_logs] Message {'direction': 'request', 'data': 'This call is being recorded for quality assurance and training. Please speak now.', 'leg_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'time': '2024-05-20 13:53:00', 'component': 'synthesizer', 'sequence_id': -1, 'model': 'elevenlabs', 'cached': False, 'is_final': False, 'engine': 'eleven_turbo_v2'}
2024-05-20 19:23:00 2024-05-20 13:53:00.628 INFO {utils} [write_request_logs] Message {'direction': 'response', 'data': 'This call is being recorded for quality assurance and training. Please speak now.', 'leg_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'time': '2024-05-20 13:53:00', 'component': 'synthesizer', 'sequence_id': -1, 'model': 'elevenlabs', 'cached': True, 'is_final': False, 'engine': 'eleven_turbo_v2'}
2024-05-20 19:23:00 2024-05-20 13:53:00.632 INFO {utils} [write_request_logs] Message {'direction': 'request', 'data': 'This call is being recorded for quality assurance and training. Please speak now.', 'leg_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'time': '2024-05-20 13:53:00', 'component': 'synthesizer', 'sequence_id': -1, 'model': 'elevenlabs', 'cached': False, 'is_final': False, 'engine': 'eleven_turbo_v2'}
2024-05-20 19:23:00 2024-05-20 13:53:00.636 INFO {elevenlabs_synthesizer} [generate] Generating TTS response for message: {'data': 'This call is being recorded for quality assurance and training. Please speak now.', 'meta_info': {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'pcm', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855}}
2024-05-20 19:23:00 2024-05-20 13:53:00.636 INFO {inmemory_scalar_cache} [get] Cache miss for key This call is being recorded for quality assurance and training. Please speak now.
2024-05-20 19:23:00 2024-05-20 13:53:00.636 INFO {elevenlabs_synthesizer} [generate] Not a cache hit [] and hence increasing characters by 81
2024-05-20 19:23:00 2024-05-20 13:53:00.636 INFO {elevenlabs_synthesizer} [__generate_http] text This call is being recorded for quality assurance and training. Please speak now.
2024-05-20 19:23:01 2024-05-20 13:53:01.901 INFO {utils} [convert_audio_to_wav] CONVERTING AUDIO TO WAV mp3
2024-05-20 19:23:02 2024-05-20 13:53:02.113 INFO {utils} [convert_audio_to_wav] GOT audio wav <pydub.audio_segment.AudioSegment object at 0x7efef43fb190>
2024-05-20 19:23:02 2024-05-20 13:53:02.113 INFO {utils} [convert_audio_to_wav] SENDING BACK WAV
2024-05-20 19:23:02 2024-05-20 13:53:02.113 INFO {elevenlabs_synthesizer} [generate] self.sampling_rate 8000
2024-05-20 19:23:02 /usr/local/lib/python3.10/site-packages/torchaudio/backend/sox_io_backend.py:235: UserWarning: File-like object support in sox_io backend is deprecated, and will be removed in v2.1. See https://github.com/pytorch/audio/issues/2950 for the detail.Please migrate to the new dispatcher, or use soundfile backend.
2024-05-20 19:23:02   warnings.warn(_deprecation_message)
2024-05-20 19:23:02 2024-05-20 13:53:02.118 INFO {utils} [resample] Resampling from 44100 to 8000
2024-05-20 19:23:02 /usr/local/lib/python3.10/site-packages/torchaudio/backend/sox_io_backend.py:416: UserWarning: File-like object support in sox_io backend is deprecated, and will be removed in v2.1. See https://github.com/pytorch/audio/issues/2950 for the detail.Please migrate to the new dispatcher, or use soundfile backend.
2024-05-20 19:23:02   warnings.warn(_deprecation_message)
2024-05-20 19:23:02 2024-05-20 13:53:02.123 INFO {task_manager} [__listen_synthesizer] -1 is in sequence ids  set() and hence removing the sequence ids 
2024-05-20 19:23:02 2024-05-20 13:53:02.124 INFO {task_manager} [__enqueue_chunk] Meta_info of chunk {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} 0 8.41845703125
2024-05-20 19:23:02 2024-05-20 13:53:02.124 INFO {task_manager} [__enqueue_chunk] ##### Sending first chunk
2024-05-20 19:23:02 2024-05-20 13:53:02.125 INFO {task_manager} [__enqueue_chunk] Meta_info of chunk {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} 1 8.41845703125
2024-05-20 19:23:02 2024-05-20 13:53:02.126 INFO {task_manager} [__enqueue_chunk] Meta_info of chunk {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} 2 8.41845703125
2024-05-20 19:23:02 2024-05-20 13:53:02.126 INFO {task_manager} [__enqueue_chunk] Meta_info of chunk {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} 3 8.41845703125
2024-05-20 19:23:02 2024-05-20 13:53:02.126 INFO {task_manager} [__enqueue_chunk] Meta_info of chunk {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} 4 8.41845703125
2024-05-20 19:23:02 2024-05-20 13:53:02.126 INFO {task_manager} [__enqueue_chunk] Meta_info of chunk {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} 5 8.41845703125
2024-05-20 19:23:02 2024-05-20 13:53:02.126 INFO {task_manager} [__enqueue_chunk] Meta_info of chunk {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} 6 8.41845703125
2024-05-20 19:23:02 2024-05-20 13:53:02.126 INFO {task_manager} [__enqueue_chunk] Meta_info of chunk {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} 7 8.41845703125
2024-05-20 19:23:02 2024-05-20 13:53:02.126 INFO {task_manager} [__enqueue_chunk] Meta_info of chunk {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} 8 8.41845703125
2024-05-20 19:23:02 2024-05-20 13:53:02.126 INFO {task_manager} [__listen_synthesizer] Sleeping for 100 ms
2024-05-20 19:23:02 2024-05-20 13:53:02.131 WARNING {runners} [run] Executing <Task pending name='Task-40' coro=<TaskManager.__listen_synthesizer() running at /app/bolna/agent_manager/task_manager.py:1122> wait_for=<Future pending cb=[Task.task_wakeup()] created at /usr/local/lib/python3.10/asyncio/tasks.py:600> created at /usr/local/lib/python3.10/asyncio/tasks.py:337> took 0.230 seconds
2024-05-20 19:23:02 2024-05-20 13:53:02.132 INFO {utils} [write_request_logs] Message {'direction': 'response', 'data': 'This call is being recorded for quality assurance and training. Please speak now.', 'leg_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'time': '2024-05-20 13:53:02', 'component': 'synthesizer', 'sequence_id': -1, 'model': 'elevenlabs', 'cached': False, 'is_final': False, 'engine': 'eleven_turbo_v2'}
2024-05-20 19:23:02 2024-05-20 13:53:02.137 INFO {task_manager} [__process_output_loop] ##### Start response is True and hence starting to speak {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873, 'is_first_chunk_of_entire_response': True} Current sequence ids
2024-05-20 19:23:02 2024-05-20 13:53:02.137 INFO {telephony} [handle] Sending Message None and MZ65f3c951ffa392263681382d83e5bed2 and  {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873, 'is_first_chunk_of_entire_response': True}
2024-05-20 19:23:02 2024-05-20 13:53:02.137 INFO {telephony} [handle] Sending message 8192 wav
2024-05-20 19:23:02 2024-05-20 13:53:02.137 INFO {telephony} [handle] Got media message MZ65f3c951ffa392263681382d83e5bed2
2024-05-20 19:23:02 2024-05-20 13:53:02.138 INFO {task_manager} [__process_output_loop] Duration of the byte 0.512
2024-05-20 19:23:02 2024-05-20 13:53:02.138 INFO {task_manager} [__process_output_loop] First chunk stuff
2024-05-20 19:23:02 2024-05-20 13:53:02.138 INFO {task_manager} [__process_output_loop] First chunk is none
2024-05-20 19:23:02 2024-05-20 13:53:02.138 INFO {task_manager} [__process_output_loop] LATENCY METRICS FOR c533f85f-1d8f-444d-ba2b-3b604a59b9e7 are {'transcriber': {'utterance_end': None, 'latency': 0}, 'llm': {'first_llm_buffer_latency': 0}, 'synthesizer': {'synthesizer_first_chunk_latency': 0}, 'overall_first_byte_latency': 0}
2024-05-20 19:23:02 2024-05-20 13:53:02.144 INFO {task_manager} [__check_for_completion] Last transmitted timestamp is simply 0 and hence continuing
2024-05-20 19:23:02 2024-05-20 13:53:02.756 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1716213182.7559776
2024-05-20 19:23:02 2024-05-20 13:53:02.756 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping 
2024-05-20 19:23:02 2024-05-20 13:53:02.861 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-05-20 19:23:02 2024-05-20 13:53:02.861 INFO {task_manager} [__process_output_loop] ##### Start response is True and hence starting to speak {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} Current sequence ids
2024-05-20 19:23:02 2024-05-20 13:53:02.861 INFO {telephony} [handle] Sending Message None and MZ65f3c951ffa392263681382d83e5bed2 and  {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873}
2024-05-20 19:23:02 2024-05-20 13:53:02.861 INFO {telephony} [handle] Sending message 8192 wav
2024-05-20 19:23:02 2024-05-20 13:53:02.861 INFO {telephony} [handle] Got media message MZ65f3c951ffa392263681382d83e5bed2
2024-05-20 19:23:02 2024-05-20 13:53:02.861 INFO {task_manager} [__process_output_loop] Duration of the byte 0.512
2024-05-20 19:23:02 2024-05-20 13:53:02.862 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.512 to maintain quueue on our side 8000
2024-05-20 19:23:03 2024-05-20 13:53:03.352 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1716213183.3524723
2024-05-20 19:23:03 2024-05-20 13:53:03.352 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping 
2024-05-20 19:23:03 2024-05-20 13:53:03.460 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-05-20 19:23:03 2024-05-20 13:53:03.460 INFO {task_manager} [__process_output_loop] ##### Start response is True and hence starting to speak {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} Current sequence ids
2024-05-20 19:23:03 2024-05-20 13:53:03.460 INFO {telephony} [handle] Sending Message None and MZ65f3c951ffa392263681382d83e5bed2 and  {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873}
2024-05-20 19:23:03 2024-05-20 13:53:03.460 INFO {telephony} [handle] Sending message 8192 wav
2024-05-20 19:23:03 2024-05-20 13:53:03.460 INFO {telephony} [handle] Got media message MZ65f3c951ffa392263681382d83e5bed2
2024-05-20 19:23:03 2024-05-20 13:53:03.461 INFO {task_manager} [__process_output_loop] Duration of the byte 0.512
2024-05-20 19:23:03 2024-05-20 13:53:03.461 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.512 to maintain quueue on our side 8000
2024-05-20 19:23:03 2024-05-20 13:53:03.950 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1716213183.950947
2024-05-20 19:23:03 2024-05-20 13:53:03.951 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping 
2024-05-20 19:23:04 2024-05-20 13:53:04.059 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-05-20 19:23:04 2024-05-20 13:53:04.060 INFO {task_manager} [__process_output_loop] ##### Start response is True and hence starting to speak {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} Current sequence ids
2024-05-20 19:23:04 2024-05-20 13:53:04.060 INFO {telephony} [handle] Sending Message None and MZ65f3c951ffa392263681382d83e5bed2 and  {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873}
2024-05-20 19:23:04 2024-05-20 13:53:04.060 INFO {telephony} [handle] Sending message 8192 wav
2024-05-20 19:23:04 2024-05-20 13:53:04.060 INFO {telephony} [handle] Got media message MZ65f3c951ffa392263681382d83e5bed2
2024-05-20 19:23:04 2024-05-20 13:53:04.060 INFO {task_manager} [__process_output_loop] Duration of the byte 0.512
2024-05-20 19:23:04 2024-05-20 13:53:04.061 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.512 to maintain quueue on our side 8000
2024-05-20 19:23:04 2024-05-20 13:53:04.154 INFO {task_manager} [__check_for_completion] Only 0.20322799682617188 seconds since last spoken time stamp and hence not cutting the phone call
2024-05-20 19:23:04 2024-05-20 13:53:04.550 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1716213184.550501
2024-05-20 19:23:04 2024-05-20 13:53:04.550 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping 
2024-05-20 19:23:04 2024-05-20 13:53:04.660 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-05-20 19:23:04 2024-05-20 13:53:04.660 INFO {task_manager} [__process_output_loop] ##### Start response is True and hence starting to speak {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} Current sequence ids
2024-05-20 19:23:04 2024-05-20 13:53:04.660 INFO {telephony} [handle] Sending Message None and MZ65f3c951ffa392263681382d83e5bed2 and  {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873}
2024-05-20 19:23:04 2024-05-20 13:53:04.660 INFO {telephony} [handle] Sending message 8192 wav
2024-05-20 19:23:04 2024-05-20 13:53:04.660 INFO {telephony} [handle] Got media message MZ65f3c951ffa392263681382d83e5bed2
2024-05-20 19:23:04 2024-05-20 13:53:04.660 INFO {task_manager} [__process_output_loop] Duration of the byte 0.512
2024-05-20 19:23:04 2024-05-20 13:53:04.661 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.512 to maintain quueue on our side 8000
2024-05-20 19:23:05 2024-05-20 13:53:05.152 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1716213185.1525748
2024-05-20 19:23:05 2024-05-20 13:53:05.152 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping 
2024-05-20 19:23:05 2024-05-20 13:53:05.259 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-05-20 19:23:05 2024-05-20 13:53:05.259 INFO {task_manager} [__process_output_loop] ##### Start response is True and hence starting to speak {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} Current sequence ids
2024-05-20 19:23:05 2024-05-20 13:53:05.259 INFO {telephony} [handle] Sending Message None and MZ65f3c951ffa392263681382d83e5bed2 and  {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873}
2024-05-20 19:23:05 2024-05-20 13:53:05.259 INFO {telephony} [handle] Sending message 8192 wav
2024-05-20 19:23:05 2024-05-20 13:53:05.259 INFO {telephony} [handle] Got media message MZ65f3c951ffa392263681382d83e5bed2
2024-05-20 19:23:05 2024-05-20 13:53:05.259 INFO {task_manager} [__process_output_loop] Duration of the byte 0.512
2024-05-20 19:23:05 2024-05-20 13:53:05.259 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.512 to maintain quueue on our side 8000
2024-05-20 19:23:05 2024-05-20 13:53:05.746 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1716213185.7468314
2024-05-20 19:23:05 2024-05-20 13:53:05.746 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping 
2024-05-20 19:23:05 2024-05-20 13:53:05.851 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-05-20 19:23:05 2024-05-20 13:53:05.851 INFO {task_manager} [__process_output_loop] ##### Start response is True and hence starting to speak {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} Current sequence ids
2024-05-20 19:23:05 2024-05-20 13:53:05.851 INFO {telephony} [handle] Sending Message None and MZ65f3c951ffa392263681382d83e5bed2 and  {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873}
2024-05-20 19:23:05 2024-05-20 13:53:05.851 INFO {telephony} [handle] Sending message 8192 wav
2024-05-20 19:23:05 2024-05-20 13:53:05.851 INFO {telephony} [handle] Got media message MZ65f3c951ffa392263681382d83e5bed2
2024-05-20 19:23:05 2024-05-20 13:53:05.852 INFO {task_manager} [__process_output_loop] Duration of the byte 0.512
2024-05-20 19:23:05 2024-05-20 13:53:05.852 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.512 to maintain quueue on our side 8000
2024-05-20 19:23:06 2024-05-20 13:53:06.159 INFO {task_manager} [__check_for_completion] Only 0.41292810440063477 seconds since last spoken time stamp and hence not cutting the phone call
2024-05-20 19:23:06 2024-05-20 13:53:06.340 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1716213186.340925
2024-05-20 19:23:06 2024-05-20 13:53:06.341 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping 
2024-05-20 19:23:06 2024-05-20 13:53:06.446 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-05-20 19:23:06 2024-05-20 13:53:06.446 INFO {task_manager} [__process_output_loop] ##### Start response is True and hence starting to speak {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} Current sequence ids
2024-05-20 19:23:06 2024-05-20 13:53:06.446 INFO {telephony} [handle] Sending Message None and MZ65f3c951ffa392263681382d83e5bed2 and  {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873}
2024-05-20 19:23:06 2024-05-20 13:53:06.446 INFO {telephony} [handle] Sending message 8192 wav
2024-05-20 19:23:06 2024-05-20 13:53:06.446 INFO {telephony} [handle] Got media message MZ65f3c951ffa392263681382d83e5bed2
2024-05-20 19:23:06 2024-05-20 13:53:06.447 INFO {task_manager} [__process_output_loop] Duration of the byte 0.512
2024-05-20 19:23:06 2024-05-20 13:53:06.447 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.512 to maintain quueue on our side 8000
2024-05-20 19:23:06 2024-05-20 13:53:06.936 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1716213186.936595
2024-05-20 19:23:06 2024-05-20 13:53:06.936 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping 
2024-05-20 19:23:07 2024-05-20 13:53:07.041 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-05-20 19:23:07 2024-05-20 13:53:07.042 INFO {task_manager} [__process_output_loop] ##### Start response is True and hence starting to speak {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873} Current sequence ids
2024-05-20 19:23:07 2024-05-20 13:53:07.042 INFO {telephony} [handle] Sending Message None and MZ65f3c951ffa392263681382d83e5bed2 and  {'io': 'twilio', 'is_first_message': True, 'stream_sid': 'MZ65f3c951ffa392263681382d83e5bed2', 'request_id': 'c533f85f-1d8f-444d-ba2b-3b604a59b9e7', 'cached': False, 'sequence_id': -1, 'format': 'wav', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1716213180.6193855, 'is_cached': False, 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_first_chunk': True, 'synthesizer_first_chunk_latency': 1.5047004222869873}
2024-05-20 19:23:07 2024-05-20 13:53:07.042 INFO {telephony} [handle] Sending message 3428 wav
2024-05-20 19:23:07 2024-05-20 13:53:07.042 INFO {telephony} [handle] Got media message MZ65f3c951ffa392263681382d83e5bed2
2024-05-20 19:23:07 2024-05-20 13:53:07.042 INFO {task_manager} [__process_output_loop] Duration of the byte 0.21425
2024-05-20 19:23:07 2024-05-20 13:53:07.042 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.21425 to maintain quueue on our side 8000
2024-05-20 19:23:07 2024-05-20 13:53:07.235 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1716213187.235612
2024-05-20 19:23:07 2024-05-20 13:53:07.235 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping 
2024-05-20 19:23:07 2024-05-20 13:53:07.341 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-05-20 19:23:08 2024-05-20 13:53:08.169 INFO {task_manager} [__check_for_completion] Only 0.9337761402130127 seconds since last spoken time stamp and hence not cutting the phone call
2024-05-20 19:23:10 2024-05-20 13:53:10.127 ERROR {deepgram_transcriber} [transcribe] Error in transcribe: 
2024-05-20 19:23:10 2024-05-20 13:53:10.174 INFO {task_manager} [__check_for_completion] Only 2.938368797302246 seconds since last spoken time stamp and hence not cutting the phone call
2024-05-20 19:23:12 2024-05-20 13:53:12.182 INFO {task_manager} [__check_for_completion] Only 4.946526765823364 seconds since last spoken time stamp and hence not cutting the phone call
2024-05-20 19:23:14 2024-05-20 13:53:14.187 INFO {task_manager} [__check_for_completion] Only 6.951576232910156 seconds since last spoken time stamp and hence not cutting the phone call
2024-05-20 19:23:15 2024-05-20 13:53:15.078 INFO {task_manager} [__handle_initial_silence] Woke up from my slumber True, [{'role': 'system', 'content': 'You will always be very concise and brief with your responses. You will NEVER speak longer than 2 sentences. \nYou are Rachel, an order representative for Pizza Square. You will be very helpful and kind. Customers will call you and give their order. You will follow this script. You will ONLY ask 1 question at a time. Once you get an answer you will move on to the next one. \n 1) You will start your conversation by saying. "Hi! I am Rachel from Pizza Square. Am I speaking to Michael from 345 Palm Square?" All your responses should be as long as this one.) \n 2) You will ask the customer what they would like to order. Only ask the next question once you have a clear answer to this question. If the customer wants assistance, you will ask them about their preferences. \n 3) You will ask the customer for the required size - regular, medium, or large. If the customer has follow up questions, you will clarify, that the small is for 1 person and 4 dollars, the medium is for 2 people and 7 dollars and the large is for 4 people and 11 dollars. Only ask the next question once you have a clear answer to this question. \n 4) Once confirmed, you will ask whether the customer wants to pay the amount of for the chosen pizza via card or cash. \n 5) You will then tell the customer that their pizza is in the oven and will reach within 30 minutes. \n You have options of 3 pizzas. You will only give description if the user asks for it.  \n 1) The BBQ Chicken - Which has Chicken, Mushrooms, Olives in a tomato base 2) The Pesto Dominator - Which has Chicken, Bacon and Ham in a green base. This is the best-selling pizza. 3) The Veg Ultimate - Which has tofu, mushrooms, olives, tomatoes in a tomato base. '}], [{'role': 'system', 'content': 'You will always be very concise and brief with your responses. 
You will NEVER speak longer than 2 sentences. \nYou are Rachel, an order representative for Pizza Square. You will be very helpful and kind. Customers will call you and give their order. You will follow this script. You will ONLY ask 1 question at a time. Once you get an answer you will move on to the next one. \n 1) You will start your conversation by saying. "Hi! I am Rachel from Pizza Square. Am I speaking to Michael from 345 Palm Square?" All your responses should be as long as this one.) \n 2) You will ask the customer what they would like to order. Only ask the next question once you have a clear answer to this question. If the customer wants assistance, you will ask them about their preferences. \n 3) You will ask the customer for the required size - regular, medium, or large. If the customer has follow up questions, you will clarify, that the small is for 1 person and 4 dollars, the medium is for 2 people and 7 dollars and the large is for 4 people and 11 dollars. Only ask the next question once you have a clear answer to this question. \n 4) Once confirmed, you will ask whether the customer wants to pay the amount of for the chosen pizza via card or cash. \n 5) You will then tell the customer that their pizza is in the oven and will reach within 30 minutes. \n You have options of 3 pizzas. You will only give description if the user asks for it.  \n 1) The BBQ Chicken - Which has Chicken, Mushrooms, Olives in a tomato base 2) The Pesto Dominator - Which has Chicken, Bacon and Ham in a green base. This is the best-selling pizza. 3) The Veg Ultimate - Which has tofu, mushrooms, olives, tomatoes in a tomato base. '}]
2024-05-20 19:23:15 2024-05-20 13:53:15.078 INFO {task_manager} [__handle_initial_silence] Calee was silent and hence speaking Hello on callee's behalf
2024-05-20 19:23:16 2024-05-20 13:53:16.195 INFO {task_manager} [__check_for_completion] Only 8.959940671920776 seconds since last spoken time stamp and hence not cutting the phone call
2024-05-20 19:23:18 2024-05-20 13:53:18.203 INFO {task_manager} [__check_for_completion] 10.967700958251953 seconds since last spoken time stamp and hence cutting the phone call and last transmitted timestampt ws 1716213187.235612 and time since last spoken human word 0
2024-05-20 19:23:18 2024-05-20 13:53:18.203 INFO {task_manager} [__process_end_of_conversation] Got end of conversation. I'm stopping now
2024-05-20 19:23:18 2024-05-20 13:53:18.203 INFO {telephony} [stop_handler] stopping handler
2024-05-20 19:23:18 2024-05-20 13:53:18.203 INFO {telephony} [stop_handler] sleeping for 5 seconds so that whatever needs to pass is passed
2024-05-20 19:23:23 Traceback (most recent call last):
2024-05-20 19:23:23   File "/app/bolna/input_handlers/telephony.py", line 54, in _listen
2024-05-20 19:23:23     message = await self.websocket.receive_text()
2024-05-20 19:23:23   File "/usr/local/lib/python3.10/site-packages/starlette/websockets.py", line 109, in receive_text
2024-05-20 19:23:23     raise RuntimeError(
2024-05-20 19:23:23 RuntimeError: WebSocket is not connected. Need to call "accept" first.
2024-05-20 19:23:23 2024-05-20 13:53:23.237 ERROR {telephony} [_listen] Exception in twilio_receiver reading events: WebSocket is not connected. Need to call "accept" first.
2024-05-20 19:23:24 2024-05-20 13:53:24.508 INFO {telephony} [stop_handler] WebSocket connection closed
2024-05-20 19:23:24 2024-05-20 13:53:24.508 INFO {task_manager} [__process_end_of_conversation] Stopped input handler
2024-05-20 19:23:24 2024-05-20 13:53:24.508 INFO {task_manager} [__process_end_of_conversation] Stopping transcriber
2024-05-20 19:23:24 INFO:     connection closed
prateeksachan commented 3 months ago

@saksham003 can you try running this remotely once on gitpod.io? I suspect there's significant latency on your machine, and the call is being cut because nothing is getting transcribed.

saksham003 commented 3 months ago

@prateeksachan yes, you're right: it's working fine on gitpod.io. What do you think the issue is on my local machine, and why isn't it working there?

prateeksachan commented 3 months ago

@saksham003 can you check whether the requests from your laptop are actually reaching Deepgram's servers? Try to rule out any IP blacklisting or similar network-level blocking that might be happening.
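One quick, hypothetical way to sanity-check this from the affected machine is a plain TCP reachability probe against Deepgram's API host (`api.deepgram.com` and port 443 are assumptions here, not something taken from the logs); if even this fails, the problem is at the network level rather than in Bolna:

```python
import socket

def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Deepgram's streaming API is served over TLS/WebSocket on port 443.
    print("Deepgram reachable:", can_reach("api.deepgram.com"))
```

If the TCP connection succeeds but transcription still fails, the next things to rule out would be the API key and any proxy or firewall that interferes with WebSocket upgrades.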

saksham003 commented 3 months ago

Okay, I'll check that. Thanks @prateeksachan