serp-ai / bark-with-voice-clone

🔊 Text-prompted Generative Audio Model - With the ability to clone voices
https://serp.ai/tools/bark-text-to-speech-ai-voice-clone-app

How much VRAM is needed to generate audio? #5

Open · sashasubbbb opened 1 year ago

sashasubbbb commented 1 year ago

How much VRAM is needed to generate audio? I get a CUDA OOM error on the final step of generation. Is there any way to run this on 8 GB of VRAM, like here? https://github.com/suno-ai/bark/issues/29

limapedro commented 1 year ago

I have the same problem. There's an example of how to use the CPU to generate audio.
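For reference, CPU-only generation looks roughly like this. This is a minimal sketch assuming the preload_models flags this repo inherited from suno-ai/bark (text_use_gpu and friends); check your copy of the API if the names differ:

```python
from bark import SAMPLE_RATE, generate_audio, preload_models

# Keep every sub-model on the CPU instead of CUDA.
# Much slower than GPU inference, but it avoids CUDA OOM entirely.
preload_models(
    text_use_gpu=False,
    coarse_use_gpu=False,
    fine_use_gpu=False,
    codec_use_gpu=False,
)

# Returns a numpy array sampled at SAMPLE_RATE (24 kHz).
audio_array = generate_audio("Hello, this is a CPU-only test.")
```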

froger-me commented 1 year ago

I run it on 8 GB of VRAM with changes to load_model in generation.py (the idea is sketched below).

I'm sure there are better ways to optimise, but that was good enough to run the notebook without running into torch.cuda.OutOfMemoryError.
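The gist of the change, as described further down the thread, is to add import gc and rework load_model so that previously loaded sub-models are evicted before the next one is placed on the GPU; the text, coarse, and fine models then take turns in VRAM instead of coexisting. A rough, hypothetical sketch of that technique (not the exact patch; clean_cached_models and the models dict here stand in for generation.py's real internals):

```python
import gc

import torch

# Stand-in for the module-level cache generation.py uses to hold the
# text/coarse/fine models once they have been loaded.
models = {}

def clean_cached_models(keep_key=None):
    """Drop every cached model except keep_key and reclaim GPU memory."""
    for key in list(models.keys()):
        if key != keep_key:
            del models[key]
    gc.collect()  # release the Python-side references first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()   # hand cached blocks back to the driver
        torch.cuda.synchronize()
```

Calling something like clean_cached_models(model_key) at the top of load_model is what lets the full pipeline fit in 8 GB: each sub-model gets the GPU to itself while it runs.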

Crimsonfart commented 1 year ago

Just great! This is exactly what people with 8 GB VRAM GPUs need. THANK YOU, it works great!

troed commented 1 year ago

For information: with 12 GB of VRAM, it seems no changes need to be made.

94awuna commented 1 year ago

@froger-me could you maybe upload or send me a modified version of generation.py? I am new to this and having a hard time understanding where to insert the code you provided. Thanks for making this solution public!

froger-me commented 1 year ago

@94awuna things are moving fast, and I believe it has already been patched in the main repo. I also saw merges from the original repo into this one made today, so it might already include changes to that effect. The code above is by no means anything more than an attempt - I provided the entire bit to replace in the file as a proof of concept only. I don't know which version you currently use, or whether my crude changes still work after the latest updates.

If you still want to go with it: open generation.py, add import gc at the top, copy the block of code I included above, find def load_model(ckpt_path=None, use_gpu=True, force_reload=False, model_type="text"): in the file, select down to return models[model_key], and paste the block to replace your selection. These are the only changes I made; the shape of the edit is sketched below.
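In other words, assuming a generation.py laid out like upstream Bark's, the edited file would have this shape (a structural sketch only; the ... marks where the patched block goes):

```python
# generation.py -- shape of the edit (structural sketch, not the full file)

import gc  # new import, added at the very top

# ... all other imports, constants, and the module-level "models"
# cache stay exactly as they are ...

def load_model(ckpt_path=None, use_gpu=True, force_reload=False, model_type="text"):
    # Select from this "def" line down to (and including) the line
    # "return models[model_key]", then paste the patched block over
    # the selection. Nothing else in the file changes.
    ...
```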

ricardojuerge735 commented 1 year ago

@froger-me I've used your code to run on my 8 GB VRAM GPU, and it appears to have worked nicely. Thank you! Does the code have any impact on the final output quality? I apologize, but I don't fully understand what's being done here; I'm just testing the technology.

froger-me commented 1 year ago

@ricardojuerge735 it won't impact the resulting audio (though don't hold your breath if you want accurate voice cloning - my attempts, and many other people's, gave mixed-to-bad results, 8 GB hack or not). It's just that the memory management could be done better than what I did; this is the first time I've edited Python code, so I'm not very familiar with how to do it better.