rsxdalv / tts-generation-webui

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)
https://rsxdalv.github.io/tts-generation-webui/
MIT License

Not able to generate - tried Bark, Tortoise, and MusicGen #268

Closed. The1Bill closed this issue 7 months ago.

The1Bill commented 8 months ago

I tried the one-click installer (log attached) and it completes. I can open the Gradio window, but when I try to generate anything, I get this error:

gradio api handler musicgen {"text":"Deep house, 180BPM","model":"Small","duration":28,"topk":250,"topp":0,"temperature":1,"cfg_coef":3,"seed":-1,"use_multi_band_diffusion":false,"melody":null}
TypeError: Cannot read properties of undefined (reading 'types')
    at Object.predict (/home/gptj6b/TTS/tts-generation-webui/react-ui/.next/server/chunks/327.js:388:24)
    at Object.musicgen (/home/gptj6b/TTS/tts-generation-webui/react-ui/.next/server/pages/api/gradio/[name].js:187:30)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async handler (/home/gptj6b/TTS/tts-generation-webui/react-ui/.next/server/pages/api/gradio/[name].js:148:20)

I think this has something to do with the following group of errors in the install log:

Blocksparse is not available: the current GPU does not expose Tensor cores
Failed to load voice clone demo
module 'torch' has no attribute 'compiler'

I'm running 2x Quadro P6000 GPUs, so I don't have Tensor cores to expose. I'm running CUDA 12.1, and torch.cuda.is_available() returns True.

Any other information that I should include?

Thanks!

Install Log.txt

rsxdalv commented 8 months ago

Hi, thank you for the report and the log. I'd like to ask - is there a reason you are running it inside the ttsgen environment? To add to that, I remember that some users saw issues when launching the start script from inside another environment, since it creates its own nested conda environment, but that's not guaranteed to be the cause.

As for the other errors - could you please test whether the Gradio UI works at all? The error suggests it's malfunctioning, but it might work and that would help debugging. Next, the torch compile problem is something new (I'm afraid somebody somewhere "updated and improved" some project).

rsxdalv commented 8 months ago

Ah, and the problem is simple: the React UI tries to connect to port 7860, which doesn't have the endpoints it should be finding on 7861. Please test 7860 first, and then we can make the second Gradio interface work together with it.

The1Bill commented 8 months ago

I try to keep my dependencies contained in virtual environments as I run several applications in this VM with their own dependencies. I get the same result if I run the one-click installer out of the virtual environment, though.

Is there any way to pick a port other than 7860? TextGen WebUI and Automatic1111 are already jockeying for ports 7860 and 7861. In the meantime I'll try shutting down TextGen WebUI and Automatic1111 to see if I can get TTS to run.

Thanks for the replies - I'll let you know what the results are.

rsxdalv commented 8 months ago

Yes. Part one: edit the settings in the Gradio UI or in config.json, then restart. Part two: change the React UI endpoint. It should be possible by setting the environment variable before launching the UI, but I'm not 100% sure the variable will be passed through: GRADIO_BACKEND=http://127.0.0.1:4200/
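Roughly, both steps could look like this; the config.json key names below are my guess based on the options the server prints at startup (server_port and friends), so double check your file:

import json
import os
import subprocess

# Part one: change the Gradio port in config.json (key names are assumptions).
with open("config.json") as f:
    cfg = json.load(f)
cfg.setdefault("gradio_interface_options", {})["server_port"] = 7862
with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)

# Part two: point the React UI at the new port. I'm not certain the variable
# is forwarded to the Node.js subprocess, so treat this as an experiment.
env = dict(os.environ, GRADIO_BACKEND="http://127.0.0.1:7862/")
subprocess.run(["./start_linux.sh"], env=env, check=True)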

rsxdalv commented 8 months ago

And if it works in a nested conda environment, that's good news.

The1Bill commented 8 months ago

I'm running the start_linux.sh file outside of a Conda environment; I didn't know that the one-click installer made its own Conda environment when I set it up initially.

I'm up and running; I've been able to pick a different port for Gradio, and there's no conflict on 3000.

One thing that I've noticed is that the MusicGen models don't seem to be unloaded, like, ever. I just tried loading MusicGen-Medium, then MusicGen-Small, and then MusicGen-Medium again, and my VRAM usage kept ramping up (screenshot attached showing the wonky MusicGen VRAM usage).

Lastly - how can I shut this down gracefully? Whenever I ctrl+c in terminal or use the "Apply settings and shutdown UI (Manual Restart Required)" button in the settings tab of the UI, it seems to not release port 3000, so when I try to restart I get the below error. The only fix I've found is to restart the whole VM.

(base) gptj6b@huggingface:~/tts$ ./start_linux.sh
Loading extensions:
Loaded extension: callback_save_generation_musicgen_ffmpeg
Loaded extension: empty_extension
Loaded extension: callback_save_generation_ffmpeg
Loaded 2 callback_save_generation extensions.
Loaded 1 callback_save_generation_musicgen extensions.
Blocksparse is not available: the current GPU does not expose Tensor cores
Failed to load voice clone demo
module 'torch' has no attribute 'compiler'
/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
Starting Gradio server...
Gradio interface options:
  inline: False
  inbrowser: True
  share: False
  debug: False
  max_threads: 40
  auth: None
  auth_message: None
  prevent_thread_lock: False
  show_error: False
  server_name: 0.0.0.0
  server_port: None
  show_tips: False
  height: 500
  width: 100%
  favicon_path: None
  ssl_keyfile: None
  ssl_certfile: None
  ssl_keyfile_password: None
  ssl_verify: True
  quiet: True
  show_api: True
  file_directories: None
  _frontend: True
Running on local URL: http://0.0.0.0:7860

> tts-generation-webui-react@0.1.0 start
> next start

The1Bill commented 8 months ago

Actually, I spoke too soon. I'm now getting the below error whenever I try to run MusicGen. I rebooted the VM and am still getting this error.

Loading model facebook/musicgen-large
Traceback (most recent call last):
  File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/gradio/queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
  File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/gradio/blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "/home/gptj6b/tts/tts-generation-webui/src/musicgen/musicgen_tab.py", line 148, in generate
    MODEL = load_model(model)
  File "/home/gptj6b/tts/tts-generation-webui/src/musicgen/musicgen_tab.py", line 127, in load_model
    return MusicGen.get_pretrained(version)
  File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/audiocraft/models/musicgen.py", line 91, in get_pretrained
    return MusicGen(name, compression_model, lm)
  File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/audiocraft/models/musicgen.py", line 52, in __init__
    super().__init__(name, compression_model, lm, max_duration)
  File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/audiocraft/models/genmodel.py", line 55, in __init__
    self.compression_model = get_wrapped_compression_model(self.compression_model, self.cfg)
  File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/audiocraft/models/builders.py", line 254, in get_wrapped_compression_model
    if cfg.interleave_stereo_codebooks.use:
AttributeError: 'NoneType' object has no attribute 'use'

rsxdalv commented 8 months ago

It seems like it doesn't clear the memory correctly for big GPUs; for small VRAM it does. Once I add the garbage collection code, I'd like you to test again.
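For reference, the cleanup I have in mind is roughly this (a sketch, not the final code), assuming the tab keeps the loaded model in the module-level MODEL variable that shows up in your traceback:

import gc
import torch

def unload_musicgen():
    # Drop the cached model reference so Python can actually free it.
    global MODEL
    MODEL = None
    # Collect lingering reference cycles, then hand cached blocks back to the GPU.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()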

Also, for me it's enough to close it with multiple Ctrl+C interrupts; Node.js is spawned as a subprocess. I could add a shutdown button, although the button you found is already meant to be a full shutdown.

MusicGen seems to have changed that, so I'll try to fix it, but without my workstation I'm not sure I'll be able to test properly.

The1Bill commented 8 months ago

I ran an update and was able to use MusicGen again after I restarted the WebUI. I'm noticing something else that's a bit hinky - there's an increase in VRAM usage as the inference stops.

I'm going to try running inference with this model with a barebones Python script. I've spent most of my time with LLMs, so I don't know a lot about how MusicGen works under the hood. I'm curious if this isn't just the way that this model behaves.
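Roughly what I have in mind, adapted from the audiocraft README, so treat it as a sketch rather than verified code:

import torch
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Same settings I used in the web UI.
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=28, top_k=250, top_p=0.0, temperature=1.0, cfg_coef=3.0)

wav = model.generate(["Deep house, 180 BPM"])  # tensor of shape [batch, channels, samples]

for i, one_wav in enumerate(wav):
    audio_write(f"musicgen_out_{i}", one_wav.cpu(), model.sample_rate, strategy="loudness")

# Peak VRAM for this run, to compare against what the web UI shows.
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")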

I'm happy to help with bugfinding/bugfixing/testing/whatever else I can do. My hardware may be vintage, but at least it was at the top of its game when it was new. ;)

rsxdalv commented 8 months ago

No no, it's not a hardware age issue. I'm thinking it's more of the "RAM" principle, where you give Windows 2 GB or 16 GB and it's full either way; with GPUs this same behavior is a lot more problematic. Either that, or Audiocraft/MusicGen made an update that broke memory clearing.

Good to hear that it worked. Now as for the memory spike - these models "always" do more. Audio models often generate a "compressed" version before decompressing. Things like EnCodec have a small memory footprint, but MultiBandDiffusion is a heavy-duty "decompressor". (Some models, like Bark, also generate a semantic version first, where they convert the input text into "meaning" for the model and then use that to generate audio.)


The1Bill commented 8 months ago

Is there any way to get it to use both GPUs? I took a quick look at the FB Research Audiocraft repo and didn't find anybody who had been able to do anything with multi-GPU setups (aside from somebody saying that it might work with SLURM, but that wouldn't really help me), but I was curious if you had found anything to the contrary.

rsxdalv commented 8 months ago

What about running two instances of the Webui? As you mentioned, GPU parallelism isn't a very popular topic.
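Running two instances would just mean pinning each one to a GPU via CUDA_VISIBLE_DEVICES before launch. If you end up scripting it directly instead, I believe audiocraft lets you choose the device when loading, something like this (untested on my side, so please verify against your audiocraft version):

from audiocraft.models import MusicGen

# One model per card; each can then serve its own generation requests.
model_gpu0 = MusicGen.get_pretrained("facebook/musicgen-medium", device="cuda:0")
model_gpu1 = MusicGen.get_pretrained("facebook/musicgen-medium", device="cuda:1")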


The1Bill commented 8 months ago

It's more that I'm trying to find a way to make longer clips without going OOM; I'm currently experimenting with running the model directly so I can understand its VRAM usage a bit better (things like how VRAM usage scales with clip length) so I have less of a skill gap.
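For what it's worth, this is roughly how I'm measuring it (a quick sketch; the model and durations are just what I happen to be testing with):

import torch
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-medium")

for duration in (10, 20, 30, 60):
    torch.cuda.reset_peak_memory_stats()
    model.set_generation_params(duration=duration)
    model.generate(["Deep house, 180 BPM"])
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    print(f"{duration:>3}s clip -> peak VRAM {peak_gib:.2f} GiB")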

rsxdalv commented 8 months ago

Ah, VRAM. Well if there's something I can update for integration, please let me know.


The1Bill commented 8 months ago

Running the models "raw", I find that the VRAM usage is about the same. I didn't realise how catastrophic an impact MultiBand Diffusion would have on VRAM usage.

Models definitely aren't being unloaded from VRAM after inference (or when beginning inference with another model), though I do see the benefit of leaving a model in memory so another prompt can be run straight away without reloading it every time.

Thanks for the explanation of why these models have a deceptively large footprint - I'm just going to abandon MBD for the time being, as it doesn't play nicely with even the medium-sized models.

As I suspected, the issue was on my end; I didn't have a lot of knowledge of how these audio models work. Now that I know, I think the only thing I'd change would be adding unload buttons for all of the models (basically what enhancement request 162 covers).

rsxdalv commented 8 months ago

Thanks for the deep feedback. Yes, haha, MBD is like a golden HDMI cable: I'm sure it does something, but why would you do that. Actually, if you want to ask for something, I think MBD can be run after the generation. So (assuming development time didn't exist) you could generate with the regular method, then choose to convert that generation with MBD. Also, if this were an app, you could offload the MBD to another GPU (again, assuming development time didn't exist).
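Roughly, the two-stage flow would look like this (a sketch based on the audiocraft MBD docs, assuming a recent audiocraft that supports return_tokens; I haven't wired it into the UI):

from audiocraft.models import MusicGen, MultiBandDiffusion

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=10)

# Normal generation first: EnCodec-decoded audio plus the compressed tokens.
wav_encodec, tokens = model.generate(["Deep house, 180 BPM"], return_tokens=True)

# Optional heavy step, run only on request (and it could live on another GPU).
mbd = MultiBandDiffusion.get_mbd_musicgen()
wav_mbd = mbd.tokens_to_wav(tokens)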

Let's keep this issue open a little longer; I'd like to convert this conversation into something for the UI, since, as you've experienced, this isn't what you'd expect. Even adding a "(MBD is heavy)" label could be a big improvement.

rsxdalv commented 7 months ago

Ok, I changed it to just: "Use Multi-Band Diffusion (High VRAM Usage)"