BGM Separation not working with YouTube.

dng-nguyn commented 2 weeks ago

Which OS are you using? Arch Linux

When the BGM Separation filter is enabled with any model, the temporary YouTube audio file is not recognized and the transcription fails.

Error transcribing file: Error opening '/root/Whisper-WebUI/modules/yt_tmp.wav': Format not recognised.
Traceback (most recent call last):
  File "/root/Whisper-WebUI/venv/lib/python3.12/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/Whisper-WebUI/venv/lib/python3.12/site-packages/gradio/route_utils.py", line 321, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/Whisper-WebUI/venv/lib/python3.12/site-packages/gradio/blocks.py", line 1945, in process_api
    data = await self.postprocess_data(block_fn, result["prediction"], state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/Whisper-WebUI/venv/lib/python3.12/site-packages/gradio/blocks.py", line 1717, in postprocess_data
    self.validate_outputs(block_fn, predictions)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/Whisper-WebUI/venv/lib/python3.12/site-packages/gradio/blocks.py", line 1691, in validate_outputs
    raise ValueError(
ValueError: An event handler (transcribe_youtube) didn't receive enough output values (needed: 2, received: 1).
Wanted outputs:
    [<gradio.components.textbox.Textbox object at 0x766bd457d4f0>, <gradio.templates.Files object at 0x766bd457d3d0>]
Received outputs:
    [None]

jhj0517 commented 2 weeks ago

Hi, thanks for bringing this up.

Somehow the downloaded YouTube audio file is corrupted, possibly because of a recent change to the YouTube API. I fixed it in #305.

If you are still experiencing the same error, please let me know.

dng-nguyn commented 2 weeks ago

Hello! After taking a closer look, it seems the issue stemmed from the file being saved as a .wav file while containing AAC audio. The issue only occurred when BGM Separation was used, not during normal transcription, and this may have confused the model, since a .wav file holding AAC audio is non-standard.
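One way to confirm this kind of mismatch is to check the file's first bytes instead of trusting its extension. The sketch below assumes the audio-only download is an MP4/M4A container, which is what `get_audio_only()` selects by default; the helper names `sniff_audio_container` and `extension_matches_content` are hypothetical and not part of Whisper-WebUI. A genuine WAV file starts with a RIFF header whose form type is `WAVE`, while MP4 files normally carry an `ftyp` box at byte offset 4.

```python
from pathlib import Path


def sniff_audio_container(path: str) -> str:
    """Guess the real container of a file from its first bytes, ignoring the extension."""
    with open(path, "rb") as f:
        header = f.read(12)
    # A genuine WAV file is a RIFF chunk whose form type is "WAVE".
    if header[0:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # MP4/M4A (ISO base media) files normally start with an "ftyp" box at offset 4.
    if header[4:8] == b"ftyp":
        return "mp4"
    return "unknown"


def extension_matches_content(path: str) -> bool:
    """Return True if the file extension agrees with the sniffed container."""
    ext = Path(path).suffix.lower().lstrip(".")
    kind = sniff_audio_container(path)
    # Treat .m4a as the audio-only flavor of MP4.
    return kind == ext or (kind == "mp4" and ext == "m4a")
```

Run against a `yt_tmp.wav` produced by the original download code, `sniff_audio_container` would be expected to report `"mp4"` rather than `"wav"`, assuming the download really is an MP4 stream.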

The fix worked because ffmpeg converted the AAC audio into pcm_s16le, which is ffmpeg's default codec when muxing into .wav files and is standard. However, this adds extra overhead that isn't needed.
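For reference, the conversion described above roughly corresponds to an ffmpeg call like the one below. This is a hypothetical sketch, not the code from #305: `build_wav_conversion_cmd` and `convert_to_wav` are illustrative names, and it assumes the `ffmpeg` binary is on `PATH`. The `-c:a pcm_s16le` flag only spells out what ffmpeg would pick by default for a .wav output.

```python
import shutil
import subprocess


def build_wav_conversion_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that decodes src and writes 16-bit PCM WAV to dst."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input, e.g. the AAC audio downloaded from YouTube
        "-vn",                # drop any video stream
        "-c:a", "pcm_s16le",  # ffmpeg's default codec for .wav output, stated explicitly
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    subprocess.run(build_wav_conversion_cmd(src, dst), check=True)
```

The decode step is where the extra work comes from: every AAC frame has to be decoded to raw samples before the WAV file is written.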

A simpler fix, which eliminates the conversion and its overhead, would be to save the file as .mp4:

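```python
import os
from pytubefix import YouTube  # assumption: the YouTube class the project already imports
def get_ytaudio(ytdata: YouTube):
    return ytdata.streams.get_audio_only().download(filename=os.path.join("modules", "yt_tmp.mp4"))
```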
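A variation on the same idea, sketched under the assumption that `ytdata` is a pytube/pytubefix `YouTube` object, derives the extension from the stream's `subtype` attribute so the filename always matches the container that was requested. The helper name `get_ytaudio_named_by_subtype` and its `directory` and `subtype` parameters are hypothetical.

```python
import os


def get_ytaudio_named_by_subtype(ytdata, directory: str = "modules", subtype: str = "mp4") -> str:
    """Download the audio-only stream and name the file after its container.

    Hypothetical helper: `ytdata` is assumed to be a pytube/pytubefix YouTube object.
    """
    stream = ytdata.streams.get_audio_only(subtype=subtype)
    if stream is None:
        raise ValueError(f"no audio-only stream with subtype {subtype!r}")
    # Use the container reported by the stream itself (e.g. "mp4") as the extension,
    # so the file is never written as .wav while holding AAC audio.
    filename = f"yt_tmp.{stream.subtype}"
    return stream.download(filename=os.path.join(directory, filename))
```

Tying the extension to `stream.subtype` means that if the requested subtype ever changes, the filename changes with it instead of silently going out of sync.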

dng-nguyn commented 2 weeks ago

Never mind, it seems that UVR converts the input into raw .wav audio anyway, so the initial fix doesn't add any extra overhead.

Closing as resolved. Thanks!

jhj0517 / Whisper-WebUI

BGM Separation not working with YouTube. #304