neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0

RuntimeError: CUDA out of memory. #43

Closed · bmc84 closed this 2 years ago

bmc84 commented 2 years ago

Hi,

I'm raising an issue for "RuntimeError: CUDA out of memory" because, since upgrading to the latest version, this is what happens when running the exact same commands that executed successfully in the previous version 😥

This error happens with any combination of settings, for example:

- preset = standard / candidates = 3
- preset = fast / candidates = 1

I've ensured I'm running this under identical conditions to when it did work (i.e. nothing else open, nothing hogging GPU RAM). I just can't get it to work any more since the upgrade; it used to work on my RTX 3070 with the 'standard' preset on Windows 10.

Any assistance would be very much appreciated. The full error is below (this was with fast/1 candidate).

D:\tortoise\tortoise-tts\tortoise\utils\audio.py:14: WavFileWarning: Chunk (non-data) not understood, skipping it.
  sampling_rate, data = read(full_path)
Generating autoregressive samples..
100%|████████████| 6/6 [00:23<00:00, 3.88s/it]
Computing best candidates using CLVP and CVVP
  0%|            | 0/6 [00:00<?, ?it/s]
d:\anaconda\envs\tort2\lib\site-packages\torch\utils\checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
  0%|            | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\tortoise\tortoise-tts\tortoise\do_tts.py", line 30, in <module>
    gen = tts.tts_with_preset(args.text, k=args.candidates, voice_samples=voice_samples, conditioning_latents=conditioning_latents,
  File "D:\tortoise\tortoise-tts\tortoise\api.py", line 289, in tts_with_preset
    return self.tts(text, **kwargs)
  File "D:\tortoise\tortoise-tts\tortoise\api.py", line 393, in tts
    clvp = self.clvp(text_tokens.repeat(batch.shape[0], 1), batch, return_loss=False)
  File "d:\anaconda\envs\tort2\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\tortoise\tortoise-tts\tortoise\models\clvp.py", line 121, in forward
    enc_speech = self.speech_transformer(speech_emb, mask=voice_mask)
  File "d:\anaconda\envs\tort2\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\tortoise\tortoise-tts\tortoise\models\arch_util.py", line 364, in forward
    h = self.transformer(x, **kwargs)
  File "d:\anaconda\envs\tort2\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\tortoise\tortoise-tts\tortoise\models\xtransformers.py", line 1237, in forward
    x, intermediates = self.attn_layers(x, mask=mask, mems=mems, return_hiddens=True, **kwargs)
  File "d:\anaconda\envs\tort2\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\tortoise\tortoise-tts\tortoise\models\xtransformers.py", line 972, in forward
    out, inter, k, v = checkpoint(block, x, None, mask, None, attn_mask, self.pia_pos_emb, rotary_pos_emb,
  File "d:\anaconda\envs\tort2\lib\site-packages\torch\utils\checkpoint.py", line 235, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "d:\anaconda\envs\tort2\lib\site-packages\torch\utils\checkpoint.py", line 96, in forward
    outputs = run_function(*args)
  File "d:\anaconda\envs\tort2\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\tortoise\tortoise-tts\tortoise\models\arch_util.py", line 341, in forward
    return torch.utils.checkpoint.checkpoint(partial, x, *args)
  File "d:\anaconda\envs\tort2\lib\site-packages\torch\utils\checkpoint.py", line 235, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "d:\anaconda\envs\tort2\lib\site-packages\torch\utils\checkpoint.py", line 96, in forward
    outputs = run_function(*args)
  File "d:\anaconda\envs\tort2\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\tortoise\tortoise-tts\tortoise\models\xtransformers.py", line 709, in forward
    post_softmax_attn = attn.clone()
RuntimeError: CUDA out of memory. Tried to allocate 184.00 MiB (GPU 0; 8.00 GiB total capacity; 5.32 GiB already allocated; 0 bytes free; 5.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
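The error message itself points at one possible mitigation: capping the allocator's split size via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of setting it from Python; the 128 MiB value is an arbitrary example, not a recommendation from this thread, and it must be in place before torch initializes CUDA:

```python
import os

# Must be set before CUDA is first initialized; 128 MiB is an arbitrary example.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so the allocator sees it
```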

neonbjb commented 2 years ago

Hey, the correct fix is to reduce the batch size. I've gotten enough questions about this that I've implemented an automatic batch-size adjustment mechanism based on your available GPU memory. Please pull the latest version from the "main" branch and give it a try.
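For reference, a minimal sketch of how such an adjustment could work; this is an illustration only, not the repo's actual implementation, and the function name and memory thresholds are invented:

```python
import torch

def pick_batch_size_for_gpu() -> int:
    """Hypothetical: choose a batch size from total GPU memory.

    The thresholds are illustrative guesses, not tortoise-tts's real values.
    """
    if not torch.cuda.is_available():
        return 1
    total_gib = torch.cuda.get_device_properties(0).total_memory / (1024 ** 3)
    if total_gib >= 16:
        return 16
    if total_gib >= 10:
        return 8
    if total_gib >= 7:
        return 4  # e.g. an 8 GiB card like the RTX 3070 in this thread
    return 1
```

A smaller batch trades throughput for lower peak memory, which is consistent with the slowdown discussed below.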

bmc84 commented 2 years ago

Thanks for the quick response & fix to get it working again.

It's working now, but there's a very large performance hit. What's the cause of this? Previously a sentence would generate in <3 minutes using "standard" (and whatever the default batch size was; I never changed it). Now the exact same sentence + voice, using standard & 1 candidate, takes around 10 minutes. Is this slowdown expected with whatever has changed?

neonbjb commented 2 years ago

I'd expect a small performance hit, like 10%, but nothing like that. It doesn't line up with what I'm seeing, but it might be caused by your computer hitting the garbage collector too often and thrashing.
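One way to check the thrashing theory is to watch PyTorch's allocator statistics while a render runs. A hedged sketch; these calls report GPU 0 only:

```python
import torch

# Numbers close to the card's total capacity during generation would
# support the memory-pressure/thrashing explanation.
allocated = torch.cuda.memory_allocated(0) / (1024 ** 3)
reserved = torch.cuda.memory_reserved(0) / (1024 ** 3)
print(f"allocated: {allocated:.2f} GiB, reserved: {reserved:.2f} GiB")
print(torch.cuda.memory_summary(0, abbreviated=True))
```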

Rendering time is highly dependent on the length of text you provide. Are you just feeding in a longer prompt? That might also explain why you saw the crash suddenly.

If you have some time and want to help me with this, can you pull the v2.2 release and clock the exact rendering time for the same phrase with both releases and post it here?
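A minimal way to clock such a run, assuming the v2-era Python API visible in the traceback above; the voice name and sentence are placeholders:

```python
import time

import torch
from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voice

tts = TextToSpeech()
voice_samples, conditioning_latents = load_voice("pat")  # placeholder voice

start = time.time()
gen = tts.tts_with_preset(
    "The same test sentence, rendered on both releases.",  # placeholder text
    preset="standard",
    k=1,
    voice_samples=voice_samples,
    conditioning_latents=conditioning_latents,
)
torch.cuda.synchronize()  # let queued GPU work finish before stopping the clock
print(f"rendered in {time.time() - start:.1f}s")
```

Running this once per release with an identical phrase and voice would isolate the regression.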