rsxdalv / tts-generation-webui

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)
https://rsxdalv.github.io/tts-generation-webui/
MIT License
1.46k stars, 160 forks

Any plan to add the new musicgen-stereo models? #248

Closed by MrMondragon 5 months ago

MrMondragon commented 5 months ago

Any plan to add the new musicgen-stereo models?

rsxdalv commented 5 months ago

Thanks for bringing it up! It seems promising, but I am a little worried about PyTorch. They switched to 2.1, but most projects here (Bark, Tortoise, etc.) still use 2.0. I'm scared it might break things, but I will give it a shot.

rsxdalv commented 5 months ago

OK, I think it can be done, though it's not plug-and-play:

  File "C:\Users\admin\Desktop\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\gradio\routes.py", line 437, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\admin\Desktop\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\gradio\blocks.py", line 1352, in process_api
    result = await self.call_function(
  File "C:\Users\admin\Desktop\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\gradio\blocks.py", line 1077, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\admin\Desktop\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\admin\Desktop\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\Users\admin\Desktop\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\Users\admin\Desktop\one-click-installers-tts-6.0\tts-generation-webui\src\musicgen\musicgen_tab.py", line 218, in generate
    filename, plot, _metadata = save_generation(
  File "C:\Users\admin\Desktop\one-click-installers-tts-6.0\tts-generation-webui\src\musicgen\musicgen_tab.py", line 91, in save_generation
    write_wav(filename, SAMPLE_RATE, audio_array)
  File "C:\Users\admin\Desktop\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\scipy\io\wavfile.py", line 796, in write
    fmt_chunk_data = struct.pack('<HHIIHH', format_tag, channels, fs,
struct.error: ushort format requires 0 <= number <= 0xffff
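
A likely reading of the error above, offered as a sketch rather than the fix that was actually shipped: the WAV header stores the channel count as an unsigned 16-bit integer, and `scipy.io.wavfile.write` takes the channel count from the second axis of a 2-D array. If the stereo model returns audio shaped `(channels, samples)`, which is an assumption the thread does not confirm, the sample count ends up in the channel field and overflows it. The helper name `to_samples_by_channels` and the `max_channels` threshold below are hypothetical.

```python
import struct

import numpy as np


def to_samples_by_channels(audio, max_channels=8):
    """Return audio shaped (samples,) or (samples, channels).

    Heuristic (an assumption, not part of any library): if the first axis is
    small and the second is long, the array is treated as (channels, samples)
    and transposed.
    """
    audio = np.asarray(audio)
    if audio.ndim == 1:
        return audio  # mono, nothing to do
    if audio.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got {audio.ndim}-D")
    if audio.shape[0] <= max_channels < audio.shape[1]:
        # Copy into C order so the samples are interleaved in memory.
        return np.ascontiguousarray(audio.T)
    return audio


# Hypothetical stereo output: 2 channels, 10 seconds at 32 kHz.
stereo = np.zeros((2, 320_000), dtype=np.float32)

# Passed as-is, the second axis (320000) would be read as the channel count,
# which does not fit in the header's unsigned 16-bit field.
try:
    struct.pack("<H", stereo.shape[1])
    overflowed = False
except struct.error:
    overflowed = True

fixed = to_samples_by_channels(stereo)
print(overflowed, fixed.shape)
```

The reshaped array can then be passed to `scipy.io.wavfile.write(filename, rate, data)`, which expects `(samples, channels)` for multi-channel audio.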

rsxdalv commented 5 months ago

Added in the newest update. Let me know if there are any issues or needs.