lucataco / cog-whisperspeech

Cog wrapper for collabora/WhisperSpeech
https://replicate.com/lucataco/whisperspeech-small

Failed to load audio #1

Closed zeke closed 10 months ago

zeke commented 10 months ago

Is M4A a supported format for the speaker input?

I tried it with https://upcdn.io/FW25b4F/raw/zeke-scraps/code-snippets.m4a and I got this error:

Prediction failed.

Failed to load audio from https://upcdn.io/FW25b4F/raw/zeke-scraps/code-snippets.m4a

https://replicate.com/p/2p4hbztb3xtdd2vo6j6np3s3kq

Logs:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/cog/server/worker.py", line 217, in _predict
    result = predict(**payload)
             ^^^^^^^^^^^^^^^^^^
  File "/src/predict.py", line 31, in predict
    self.pipe.generate_to_file(output_path, prompt, lang=language, speaker=speaker)
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/whisperspeech/pipeline.py", line 90, in generate_to_file
    self.vocoder.decode_to_file(fname, self.generate_atoks(text, speaker, lang=lang, cps=cps, step_callback=None))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/whisperspeech/pipeline.py", line 80, in generate_atoks
    elif isinstance(speaker, (str, Path)): speaker = self.extract_spk_emb(speaker)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/whisperspeech/pipeline.py", line 73, in extract_spk_emb
    samples, sr = torchaudio.load(fname)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torchaudio/backend/sox_io_backend.py", line 256, in load
    return _fallback_load(filepath, frame_offset, num_frames, normalize, channels_first, format)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torchaudio/backend/sox_io_backend.py", line 30, in _fail_load
    raise RuntimeError("Failed to load audio from {}".format(filepath))
RuntimeError: Failed to load audio from https://upcdn.io/FW25b4F/raw/zeke-scraps/code-snippets.m4a

cc @lucataco

platform-kit commented 10 months ago

I'm getting the same error with statically hosted MP3 inputs. Are other formats supported?

lucataco commented 10 months ago

Looks like it's just OGG files for now? See the inference example.
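For anyone hitting this with M4A or MP3 clips, one possible workaround is to convert the speaker audio to OGG locally before uploading it. The following is only a minimal sketch, not something from this repo: it assumes `ffmpeg` is installed and built with the `libvorbis` encoder, and the helper names (`is_ogg`, `ffmpeg_to_ogg_cmd`, `convert_to_ogg`) are hypothetical.

```python
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def is_ogg(path_or_url: str) -> bool:
    """Return True if a local path or URL ends in .ogg (case-insensitive)."""
    # urlparse drops any query string, so "clip.ogg?x=1" still counts as OGG.
    return Path(urlparse(path_or_url).path).suffix.lower() == ".ogg"


def ffmpeg_to_ogg_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that re-encodes src as Ogg Vorbis at dst."""
    # -y overwrites dst if it exists; -c:a libvorbis selects the Vorbis encoder
    # (assumption: the local ffmpeg build includes libvorbis).
    return ["ffmpeg", "-y", "-i", src, "-c:a", "libvorbis", dst]


def convert_to_ogg(src: str) -> str:
    """Convert a local audio file to OGG and return the output path.

    Files that already end in .ogg are returned unchanged.
    """
    if is_ogg(src):
        return src
    dst = str(Path(src).with_suffix(".ogg"))
    subprocess.run(ffmpeg_to_ogg_cmd(src, dst), check=True)
    return dst
```

Calling `convert_to_ogg("code-snippets.m4a")` on a local copy would write `code-snippets.ogg` next to it; that file can then be hosted and its URL passed as the speaker input.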

zeke commented 10 months ago

Opened a PR updating the input description to give people a heads-up: https://github.com/lucataco/cog-whisperspeech/pull/2