shansongliu / MU-LLaMA

MU-LLaMA: Music Understanding Large Language Model
GNU General Public License v3.0

cannot inference on a google colab A100 40GB GPU #13

Closed feiyuehchen closed 11 months ago

feiyuehchen commented 11 months ago

Hi, I read #12, where the 32GB V100 could only run inference on audio clips shorter than 1 minute, so I tried it with a 37-second mp3 file (link).

But it didn't work and produced an OOM error at the model.eval() step. Are there any solutions? Also, is there any way to run inference across multiple 24GB GPUs?

Here's the colab link

Here's the error msg:

Traceback (most recent call last):
  File "/content/drive/MyDrive/MU-LLaMA/MU-LLaMA/inference.py", line 41, in <module>
    model.eval()
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2307, in eval
    return self.train(False)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2288, in train
    module.train(mode)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2288, in train
    module.train(mode)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2288, in train
    module.train(mode)
  [Previous line repeated 1 more time]
  File "/content/drive/MyDrive/MU-LLaMA/MU-LLaMA/llama/llama.py", line 154, in train
    ).cuda()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 39.56 GiB total capacity; 36.15 GiB already allocated; 2.36 GiB free; 36.17 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Thanks!
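For context, the allocator setting mentioned at the end of that traceback is applied through an environment variable before PyTorch initializes CUDA. A minimal sketch (the 128 MiB value is an arbitrary example, not something taken from this repo):

```python
# Minimal sketch: apply the allocator hint from the end of the OOM message.
# PYTORCH_CUDA_ALLOC_CONF must be set before the CUDA context is created,
# i.e. before any tensor touches the GPU. The 128 MiB value is only an
# illustrative choice, not a recommendation from the MU-LLaMA maintainers.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB total")
```

Note that this only reduces fragmentation; it cannot help when the requested block itself no longer fits in free memory, which is why the batch-size change discussed below is the actual remedy here.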

shansongliu commented 11 months ago

Thanks for pointing out this problem. We will take a look and get back soon.

shansongliu commented 11 months ago

The new commit 68101f1 may solve your issue by changing max_batch_size=32 to max_batch_size=1. You can give it a try. If the issue persists, feel free to reply to this issue. We would appreciate it if you starred our repo 😊.
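For context, a rough sketch of why max_batch_size dominates that allocation: LLaMA-style models pre-allocate per-layer key/value caches whose first dimension is the batch size, so shrinking max_batch_size from 32 to 1 cuts the reserved memory by roughly that factor. The shapes and 7B hyper-parameters below are assumptions for illustration, not values read from llama/llama.py:

```python
# Rough estimate of the KV-cache footprint pre-allocated when the model is
# moved to the GPU. Shape (batch, seq_len, n_heads, head_dim) and the 7B-style
# hyper-parameters are illustrative assumptions, not taken from this repo.
def kv_cache_gib(max_batch_size, max_seq_len=2048, n_layers=32,
                 n_heads=32, head_dim=128, bytes_per_elem=2):  # fp16
    per_layer = max_batch_size * max_seq_len * n_heads * head_dim * bytes_per_elem
    return 2 * n_layers * per_layer / 1024**3  # x2 for keys and values

print(kv_cache_gib(32))  # ~32 GiB with the old default batch size
print(kv_cache_gib(1))   # ~1 GiB with max_batch_size=1
```

Under these assumed numbers, the cache alone would consume most of a 40GB A100 at max_batch_size=32, which is consistent with the OOM hitting during the .cuda() call in the traceback above.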