voxos-ai / bolna

End-to-end platform for building voice first multimodal agents
MIT License
393 stars, 112 forks

Error with ffmpeg #317

Open iowathe3rd opened 3 months ago

iowathe3rd commented 3 months ago

When using ElevenLabs as the synthesizer, the following error appears:

bolna-app-1   | 2024-07-05 11:54:48.037 INFO {task_manager} [_synthesize] ##### sending text to elevenlabs for generation: Welcome!
bolna-app-1   | 2024-07-05 11:54:48.037 INFO {elevenlabs_synthesizer} [push] Pushed message to internal queue {'data': 'Welcome!', 'meta_info': {'io': 'twilio', 'message_category': 'agent_welcome_message', 'stream_sid': 'MZd6a2a5714938c29b35741cfa4863301a', 'request_id': '5c32a901-a2ea-4ce0-9e16-5f7a5e6e8cf4', 'cached': False, 'sequence_id': -1, 'format': 'pcm', 'text': 'Welcome!', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1720180488.0376647}}
bolna-app-1   | Traceback (most recent call last):
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/bolna/agent_manager/task_manager.py", line 1345, in __send_preprocessed_audio
bolna-app-1   |     number_of_chunks = math.ceil(len(audio_chunk)/self.output_chunk_size)
bolna-app-1   | TypeError: object of type 'NoneType' has no len()
bolna-app-1   | 2024-07-05 11:54:48.038 ERROR {task_manager} [__send_preprocessed_audio] Something went wrong object of type 'NoneType' has no len()
bolna-app-1   | 2024-07-05 11:54:48.038 INFO {utils} [write_request_logs] Message {'direction': 'request', 'data': 'Welcome!', 'leg_id': '5c32a901-a2ea-4ce0-9e16-5f7a5e6e8cf4', 'time': '2024-07-05 11:54:48', 'component': 'synthesizer', 'sequence_id': -1, 'model': 'elevenlabs', 'cached': False, 'latency': None, 'is_final': False, 'engine': 'eleven_turbo_v2'}
bolna-app-1   | 2024-07-05 11:54:48.041 INFO {utils} [write_request_logs] Message {'direction': 'response', 'data': 'Welcome!', 'leg_id': '5c32a901-a2ea-4ce0-9e16-5f7a5e6e8cf4', 'time': '2024-07-05 11:54:48', 'component': 'synthesizer', 'sequence_id': -1, 'model': 'elevenlabs', 'cached': True, 'latency': None, 'is_final': False, 'engine': 'eleven_turbo_v2'}
bolna-app-1   | 2024-07-05 11:54:48.041 INFO {utils} [write_request_logs] Message {'direction': 'request', 'data': 'Welcome!', 'leg_id': '5c32a901-a2ea-4ce0-9e16-5f7a5e6e8cf4', 'time': '2024-07-05 11:54:48', 'component': 'synthesizer', 'sequence_id': -1, 'model': 'elevenlabs', 'cached': False, 'latency': None, 'is_final': False, 'engine': 'eleven_turbo_v2'}
bolna-app-1   | 2024-07-05 11:54:48.042 INFO {elevenlabs_synthesizer} [generate] Generating TTS response for message: {'data': 'Welcome!', 'meta_info': {'io': 'twilio', 'message_category': 'agent_welcome_message', 'stream_sid': 'MZd6a2a5714938c29b35741cfa4863301a', 'request_id': '5c32a901-a2ea-4ce0-9e16-5f7a5e6e8cf4', 'cached': False, 'sequence_id': -1, 'format': 'pcm', 'text': 'Welcome!', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1720180488.0376647}}, using mulaw False
bolna-app-1   | 2024-07-05 11:54:48.042 INFO {inmemory_scalar_cache} [get] Cache miss for key Welcome!
bolna-app-1   | 2024-07-05 11:54:48.042 INFO {elevenlabs_synthesizer} [generate] Not a cache hit [] and hence increasing characters by 8
bolna-app-1   | 2024-07-05 11:54:48.042 INFO {elevenlabs_synthesizer} [__generate_http] text Welcome!
bolna-app-1   | 2024-07-05 11:54:48.847 ERROR {elevenlabs_synthesizer} [__send_payload] Error: 400 - {"detail":{"status":"voice_not_found","message":"A voice for the voice_id TTa58Hl9lmhnQEvhp1WM was not found."}}
bolna-app-1   | 2024-07-05 11:54:48.847 INFO {utils} [convert_audio_to_wav] CONVERTING AUDIO TO WAV mp3
bolna-app-1   | Traceback (most recent call last):
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/bolna/synthesizer/elevenlabs_synthesizer.py", line 228, in generate
bolna-app-1   |     wav_bytes = convert_audio_to_wav(audio, source_format="mp3")
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/bolna/helpers/utils.py", line 354, in convert_audio_to_wav
bolna-app-1   |     audio = AudioSegment.from_file(io.BytesIO(audio_bytes), format=source_format)
bolna-app-1   |   File "/usr/local/lib/python3.10/site-packages/pydub/audio_segment.py", line 773, in from_file
bolna-app-1   |     raise CouldntDecodeError(
bolna-app-1   | pydub.exceptions.CouldntDecodeError: Decoding failed. ffmpeg returned error code: 1
bolna-app-1   |
bolna-app-1   | Output from ffmpeg/avlib:
bolna-app-1   |
bolna-app-1   | ffmpeg version 5.1.5-0+deb12u1 Copyright (c) 2000-2024 the FFmpeg developers
bolna-app-1   |   built with gcc 12 (Debian 12.2.0-14)
bolna-app-1   |   configuration: --prefix=/usr --extra-version=0+deb12u1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
bolna-app-1   |   libavutil      57. 28.100 / 57. 28.100
bolna-app-1   |   libavcodec     59. 37.100 / 59. 37.100
bolna-app-1   |   libavformat    59. 27.100 / 59. 27.100
bolna-app-1   |   libavdevice    59.  7.100 / 59.  7.100
bolna-app-1   |   libavfilter     8. 44.100 /  8. 44.100
bolna-app-1   |   libswscale      6.  7.100 /  6.  7.100
bolna-app-1   |   libswresample   4.  7.100 /  4.  7.100
bolna-app-1   |   libpostproc    56.  6.100 / 56.  6.100
bolna-app-1   | [cache @ 0x55d68c0e1240] Inner protocol failed to seekback end : -38
bolna-app-1   |     Last message repeated 1 times
bolna-app-1   | [mp3 @ 0x55d68c0e0a40] Failed to read frame size: Could not seek to 1026.
bolna-app-1   | [cache @ 0x55d68c0e1240] Statistics, cache hits:0 cache misses:0
bolna-app-1   | cache:pipe:0: Invalid argument
bolna-app-1   |
bolna-app-1   | 2024-07-05 11:54:49.203 ERROR {elevenlabs_synthesizer} [generate] Error in eleven labs generate Decoding failed. ffmpeg returned error code: 1

jhui323444 commented 2 months ago

Hey, I also had this problem. FFmpeg wasn't the issue.

I realized that the message "A voice for the voice_id TTa58Hl9lmhnQEvhp1WM was not found." means that the voice ID and voice name provided in the example agent payload do not exist. Replacing those two with a voice you can find in your ElevenLabs account makes it work.
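One way to catch this before starting a call is to check that the configured voice ID is actually available to your account. The following is a minimal sketch, not part of bolna: it assumes the ElevenLabs `GET /v1/voices` endpoint with an `xi-api-key` header and a response shaped like `{"voices": [{"voice_id": ..., "name": ...}]}`. The helper names `fetch_voices` and `find_voice` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint for listing the voices available to an API key.
ELEVENLABS_VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def fetch_voices(api_key):
    """Download the voice list for this API key (hypothetical helper)."""
    request = urllib.request.Request(
        ELEVENLABS_VOICES_URL,
        headers={"xi-api-key": api_key},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def find_voice(voices_payload, voice_id):
    """Return the voice entry matching voice_id, or None if it is absent."""
    for voice in voices_payload.get("voices", []):
        if voice.get("voice_id") == voice_id:
            return voice
    return None
```

For example, `find_voice(fetch_voices(api_key), "TTa58Hl9lmhnQEvhp1WM")` returning `None` would point to the same `voice_not_found` condition shown in the log above. In that case the 400 response body is not MP3 audio, which would explain why the later ffmpeg decode step fails.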