wsippel / bark_tts

Oobabooga extension for Bark TTS
GNU Affero General Public License v3.0

VRAM usage is too high, use preload_models #1

Closed · CarlKenner closed this issue 1 year ago

CarlKenner commented 1 year ago

Currently, the extension is hardcoded to load the large version of every model and keep them all in GPU VRAM. Please either detect sensible settings automatically based on the available VRAM, or provide GUI options to choose small/large and CPU/GPU for each of the four Bark modules.

CarlKenner commented 1 year ago

It turns out the values passed to preload_models aren't respected unless they match the sizes generate_audio defaults to when it's called. If you want the small, low-VRAM versions of the models, you need to set the environment variable SUNO_USE_SMALL_MODELS=True before importing bark. You can then also tell preload_models to load some of the models on the CPU. I got it running entirely on the CPU, and I'm going to do more testing tomorrow with some parts loaded onto the GPU.
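
For reference, a minimal sketch of that setup, assuming the upstream suno-ai/bark API from around that time (the preload_models keyword names below come from that library and may have changed since; the GPU/CPU split shown is just one possible arrangement):

```python
import os

# Must be set before bark is imported: bark reads this at import time
# to decide whether to use the small model variants.
os.environ["SUNO_USE_SMALL_MODELS"] = "True"

from bark import generate_audio, preload_models

# Keep the text and coarse models on the CPU, and the fine model plus the
# Encodec codec on the GPU. These keyword names match the suno-ai/bark
# preload_models() signature at the time; adjust if the API has changed.
preload_models(
    text_use_gpu=False,
    coarse_use_gpu=False,
    fine_use_gpu=True,
    codec_use_gpu=True,
)

audio_array = generate_audio("Hello from Bark running on the small models.")
```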

But even though I was using the small models, I was literally shocked by how good the quality was. I didn't even realise it was my AI assistant talking to me, and I wondered where the voice was coming from. It didn't sound like TTS.

wsippel commented 1 year ago

I'm currently working on saving the current settings to a config file; I'll just throw those options in there for now. It's probably not something users are gonna mess with all the time anyway. If I come up with a solution that doesn't look too messy, I'll add it to the UI.
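
A rough sketch of what persisting those options could look like, purely illustrative: the file path, key names, and defaults below are assumptions, not the extension's actual config format.

```python
import json
import os

# Hypothetical defaults; the real extension's option names may differ.
DEFAULT_PARAMS = {
    "use_small_models": True,   # maps to SUNO_USE_SMALL_MODELS
    "text_use_gpu": False,
    "coarse_use_gpu": False,
    "fine_use_gpu": True,
    "codec_use_gpu": True,
}

CONFIG_PATH = "extensions/bark_tts/config.json"  # assumed location


def load_params():
    """Merge saved settings over the defaults, if a config file exists."""
    params = dict(DEFAULT_PARAMS)
    if os.path.exists(CONFIG_PATH):
        with open(CONFIG_PATH) as f:
            params.update(json.load(f))
    return params


def save_params(params):
    """Write the current settings back out so they survive a restart."""
    with open(CONFIG_PATH, "w") as f:
        json.dump(params, f, indent=2)
```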

wsippel commented 1 year ago

Config file support was added ages ago; I totally forgot to close this issue.