suno-ai / bark

🔊 Text-Prompted Generative Audio Model
MIT License

How can we specify a smaller batch size for a GPU with 8GB of memory or less? #29

Closed mtx2d closed 1 year ago

mtx2d commented 1 year ago

Hi team, thanks for the great software. Is it possible to have the batch size as a parameter?

I am trying to run the example on an NVIDIA GeForce GTX 1080. It is a rather old GPU, so it is not very powerful. When running the example code, it always fails with the following error:

---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
Cell In[8], line 8
      2 from IPython.display import Audio
      4 text_prompt = """
      5      Hello, my name is Suno. And, uh — and I like pizza. [laughs] 
      6      But I also have other interests such as playing tic tac toe.
      7 """
----> 8 audio_array = generate_audio(text_prompt)
      9 Audio(audio_array, rate=SAMPLE_RATE)

File ~\workspace\bark\bark\api.py:77, in generate_audio(text, history_prompt, text_temp, waveform_temp)
     60 def generate_audio(
     61     text: str,
     62     history_prompt: Optional[str] = None,
     63     text_temp: float = 0.7,
     64     waveform_temp: float = 0.7,
     65 ):
     66     """Generate audio array from input text.
     67 
     68     Args:
   (...)
     75         numpy audio array at sample frequency 24khz
     76     """
---> 77     x_semantic = text_to_semantic(text, history_prompt=history_prompt, temp=text_temp)
     78     audio_arr = semantic_to_waveform(x_semantic, history_prompt=history_prompt, temp=waveform_temp)
     79     return audio_arr

File ~\workspace\bark\bark\api.py:23, in text_to_semantic(text, history_prompt, temp)
      8 def text_to_semantic(
      9     text: str,
     10     history_prompt: Optional[str] = None,
     11     temp: float = 0.7,
     12 ):
     13     """Generate semantic array from text.
     14 
     15     Args:
   (...)
     21         numpy semantic array to be fed into `semantic_to_waveform`
     22     """
---> 23     x_semantic = generate_text_semantic(
     24         text,
     25         history_prompt=history_prompt,
     26         temp=temp,
     27     )
     28     return x_semantic

File ~\workspace\bark\bark\generation.py:404, in generate_text_semantic(text, history_prompt, temp, top_k, top_p, use_gpu, silent, min_eos_p, max_gen_duration_s, allow_early_stop, model)
    402 tot_generated_duration_s = 0
    403 for n in range(n_tot_steps):
--> 404     logits = model(x, merge_context=True)
    405     relevant_logits = logits[0, 0, :SEMANTIC_VOCAB_SIZE]
    406     if allow_early_stop:

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\workspace\bark\bark\model.py:168, in GPT.forward(self, idx, merge_context)
    166 x = self.transformer.drop(tok_emb + pos_emb)
    167 for block in self.transformer.h:
--> 168     x = block(x)
    169 x = self.transformer.ln_f(x)
    171 # inference-time mini-optimization: only forward the lm_head on the very last position

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\workspace\bark\bark\model.py:100, in Block.forward(self, x)
     98 def forward(self, x):
     99     x = x + self.attn(self.ln_1(x))
--> 100     x = x + self.mlp(self.ln_2(x))
    101     return x

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\workspace\bark\bark\model.py:82, in MLP.forward(self, x)
     81 def forward(self, x):
---> 82     x = self.c_fc(x)
     83     x = self.gelu(x)
     84     x = self.c_proj(x)

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 7.33 GiB already allocated; 0 bytes free; 7.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
JonathanFly commented 1 year ago

Try https://github.com/JonathanFly/bark with --use_smaller_models; it should fit even in 6GB.

EwoutH commented 1 year ago

What are the memory / VRAM requirements? And is quantization possible?

It would be great if a table of memory requirements could be added to the README and/or docs.

gkucsko commented 1 year ago

Added another simple option: using the env var SUNO_USE_SMALL_MODELS=True gets you smaller models that will probably fit on an 8GB card. We haven't implemented quantization yet. As for requirements, I'd love it if people who have the relevant cards could confirm (since it also depends on e.g. bf16 support), but I believe the small models work on an 8GB card and the large models work on a 12GB card.
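For example, a minimal sketch (the variable just needs to be set before bark is imported, so that it is seen when the models are loaded):

import os
os.environ["SUNO_USE_SMALL_MODELS"] = "True"  # must come before the bark import

from bark import generate_audio, preload_models

preload_models()  # should now fetch the small model variants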

Crimsonfart commented 1 year ago

Added another simple option: using the env var SUNO_USE_SMALL_MODELS=True gets you smaller models that will probably fit on an 8GB card. We haven't implemented quantization yet. As for requirements, I'd love it if people who have the relevant cards could confirm (since it also depends on e.g. bf16 support), but I believe the small models work on an 8GB card and the large models work on a 12GB card.

Where and how do I add SUNO_USE_SMALL_MODELS=True?

mtx2d commented 1 year ago

Added another simple option: using the env var SUNO_USE_SMALL_MODELS=True gets you smaller models that will probably fit on an 8GB card. We haven't implemented quantization yet. As for requirements, I'd love it if people who have the relevant cards could confirm (since it also depends on e.g. bf16 support), but I believe the small models work on an 8GB card and the large models work on a 12GB card.

Thanks, setting this environment variable worked for me!

Steps I took on Windows:

set SUNO_USE_SMALL_MODELS=True
jupyter lab
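On Linux or macOS the equivalent should be:

export SUNO_USE_SMALL_MODELS=True
jupyter lab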
SeanDohertyPhotos commented 1 year ago

Still getting the error:

from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio
import os
preload_models(use_gpu=False)
os.environ['SUNO_USE_SMALL_MODELS'] = 'True'

text_prompt = """
    Hello.
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array, rate=SAMPLE_RATE)
CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 4.00 GiB total capacity; 3.46 GiB already allocated; 0 bytes free; 3.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  File "C:\Users\smast\OneDrive\Desktop\Code Projects\Johnny Five\audio test.py", line 12, in <module>
    audio_array = generate_audio(text_prompt)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 4.00 GiB total capacity; 3.46 GiB already allocated; 0 bytes free; 3.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
gkucsko commented 1 year ago

You have to set the environment variable before the models are loaded. But also, you can now more easily specify the model size in the preload function; see also here: https://github.com/suno-ai/bark/issues/51
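For reference, a corrected version of the snippet above would look roughly like this (the key change is setting the environment variable before bark is imported; I've also dropped the use_gpu argument, since the preload options vary between versions, see the issue linked above):

import os

# Set the flag before importing bark so it is seen when the models are loaded.
os.environ['SUNO_USE_SMALL_MODELS'] = 'True'

from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio

preload_models()  # the env var should make this load the small variants
# (newer versions also accept per-model options for small models / CPU;
#  see https://github.com/suno-ai/bark/issues/51 for the exact parameters)

text_prompt = """
    Hello.
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array, rate=SAMPLE_RATE)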

CarlKenner commented 1 year ago

but also you can now more easily specify the model size in the preload function, see also here: #51

No, you can't. It's bugged. The model size you specify in the preload function isn't respected: generate_audio will reload the large models when you call it. I couldn't figure out why I was getting CUDA out-of-memory errors when I had specified small models and CPU for everything, and CUDA usage should have been zero. lol.

gkucsko commented 1 year ago

oh yikes sorry, lemme check. feel free to also PR if you find the bug

gkucsko commented 1 year ago

works fine for me on a quick test, can anyone else confirm it's borked?

CarlKenner commented 1 year ago

works fine for me on a quick test, can anyone else confirm it's borked?

The bug was in this line:

model_key = str(device) + f"__{model_type}"

It has since been fixed.
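For context, a simplified sketch of how I understand the failure mode (hypothetical code, not bark's actual implementation): because the cache key encoded only the device and the model type, models preloaded as small/CPU were not found when generation later asked for its defaults, and the full-size models were loaded onto the GPU instead.

# Hypothetical illustration of the reported caching issue.
models = {}

def load_model(model_type, device, use_small):
    key = str(device) + f"__{model_type}"   # note: model size is not part of the key
    if key not in models:
        print(f"loading the {'small' if use_small else 'large'} {model_type} model on {device}")
        models[key] = (model_type, device, use_small)
    return models[key]

load_model("text", device="cpu", use_small=True)     # what a small/CPU preload caches
load_model("text", device="cuda", use_small=False)   # what generation requests by default:
                                                      # cache miss, so the large model lands on the GPU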

gkucsko commented 1 year ago

Ah ok great, ya, just made some fixes there.