shansongliu / MU-LLaMA

MU-LLaMA: Music Understanding Large Language Model
GNU General Public License v3.0

cannot inference on a google colab A100 40GB GPU #13

Closed feiyuehchen closed 11 months ago

feiyuehchen commented 11 months ago

Hi, I read #12, where the 32GB V100 could only run inference on audio clips shorter than 1 minute, so I tried it with a 37-second mp3 file (link).

But it didn't work and produced an OOM error at the model.eval() step. Are there any solutions? Also, is there any way to run inference across multiple 24GB GPUs?

Here's the colab link

Here's the error msg:

Traceback (most recent call last):
  File "/content/drive/MyDrive/MU-LLaMA/MU-LLaMA/inference.py", line 41, in <module>
    model.eval()
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2307, in eval
    return self.train(False)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2288, in train
    module.train(mode)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2288, in train
    module.train(mode)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2288, in train
    module.train(mode)
  [Previous line repeated 1 more time]
  File "/content/drive/MyDrive/MU-LLaMA/MU-LLaMA/llama/llama.py", line 154, in train
    ).cuda()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 39.56 GiB total capacity; 36.15 GiB already allocated; 2.36 GiB free; 36.17 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Thanks!
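For context, the allocator setting mentioned at the end of that traceback is applied through an environment variable before PyTorch initializes CUDA. A minimal sketch (the 128 MiB value is an arbitrary example, not something taken from this repo):

```python
# Minimal sketch: apply the allocator hint from the end of the OOM message.
# PYTORCH_CUDA_ALLOC_CONF must be set before the CUDA context is created,
# i.e. before any tensor touches the GPU. The 128 MiB value is only an
# illustrative choice, not a recommendation from the MU-LLaMA maintainers.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB total")
```

Note that this only reduces fragmentation; it cannot help when the requested block itself no longer fits in free memory, which is why the batch-size change discussed below is the actual remedy here.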

shansongliu commented 11 months ago

Thanks for pointing out this problem. We will take a look and get back soon.

shansongliu commented 11 months ago

The new commit 68101f1 may solve your issue by changing max_batch_size=32 to max_batch_size=1. You can give it a try. If the issue persists, feel free to reply to this issue. We would appreciate it if you starred our repo 😊.
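For context, a rough sketch of why max_batch_size dominates that allocation: LLaMA-style models pre-allocate per-layer key/value caches whose first dimension is the batch size, so shrinking max_batch_size from 32 to 1 cuts the reserved memory by roughly that factor. The shapes and 7B hyper-parameters below are assumptions for illustration, not values read from llama/llama.py:

```python
# Rough estimate of the KV-cache footprint pre-allocated when the model is
# moved to the GPU. Shape (batch, seq_len, n_heads, head_dim) and the 7B-style
# hyper-parameters are illustrative assumptions, not taken from this repo.
def kv_cache_gib(max_batch_size, max_seq_len=2048, n_layers=32,
                 n_heads=32, head_dim=128, bytes_per_elem=2):  # fp16
    per_layer = max_batch_size * max_seq_len * n_heads * head_dim * bytes_per_elem
    return 2 * n_layers * per_layer / 1024**3  # x2 for keys and values

print(kv_cache_gib(32))  # ~32 GiB with the old default batch size
print(kv_cache_gib(1))   # ~1 GiB with max_batch_size=1
```

Under these assumed numbers, the cache alone would consume most of a 40GB A100 at max_batch_size=32, which is consistent with the OOM hitting during the .cuda() call in the traceback above.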