shansongliu / MU-LLaMA

MU-LLaMA: Music Understanding Large Language Model
GNU General Public License v3.0

Cannot run inference on an A40 GPU due to CUDA OOM error #12

Closed dlion168 closed 11 months ago

dlion168 commented 11 months ago

Many thanks for your great work! I tried to run inference with the MU-LLaMA model on an A40 GPU (48 GB VRAM) using the provided Python script, but it fails with the following error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.27 GiB (GPU 0; 44.35 GiB total capacity; 30.87 GiB already allocated; 13.06 GiB free; 30.94 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried to (1) set PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:128" and (2) decrease the LLaMA parameters in llama_adaptor.py from max_batch_size=32 to 1 and max_seq_len from 8192 to 256, but the error still occurs.
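For reference, a minimal sketch of how those two workarounds could be applied in one place, assuming the allocator setting is exported before PyTorch makes its first CUDA allocation; the model-loading step is only a placeholder for the repository's actual inference script:

```python
# Minimal sketch of the attempted workarounds (illustrative only).
# The allocator config must be set before the first CUDA allocation,
# so it is placed before "import torch".
import os

# (1) Cap allocator block splits to reduce fragmentation-related OOMs.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

def report_gpu_memory(tag: str) -> None:
    """Print free/total GPU memory to see how close inference is to the limit."""
    free_b, total_b = torch.cuda.mem_get_info()
    print(f"[{tag}] free: {free_b / 2**30:.2f} GiB / total: {total_b / 2**30:.2f} GiB")

report_gpu_memory("before model load")
# (2) Load MU-LLaMA here with reduced max_batch_size / max_seq_len
#     (placeholder for the repository's own loading code).
report_gpu_memory("after model load")
```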

crypto-code commented 11 months ago

I think the issue is that the music file you are using is too large for inference to run on the GPU. On a 32 GB GPU, the limit I have tested is about 1 minute of audio.

If you want, I can add another Gradio demo that handles larger music files by using a sliding window for inference.
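For context, a sliding-window approach splits the audio into fixed-length chunks and runs inference on each chunk separately, so each forward pass stays within GPU memory. A rough sketch, assuming a hypothetical run_inference(waveform, sample_rate, prompt) helper that wraps the existing single-clip pipeline (window and hop sizes are illustrative, not the repository's API):

```python
# Sliding-window inference over a long audio file (illustrative sketch).
import torchaudio

def sliding_window_inference(audio_path, prompt, run_inference,
                             window_sec=60.0, hop_sec=30.0):
    waveform, sr = torchaudio.load(audio_path)  # (channels, samples)
    window = int(window_sec * sr)
    hop = int(hop_sec * sr)

    answers = []
    for start in range(0, waveform.shape[1], hop):
        chunk = waveform[:, start:start + window]
        if chunk.shape[1] == 0:
            break
        # Each chunk fits in GPU memory on its own, avoiding the OOM on long files.
        answers.append(run_inference(chunk, sr, prompt))
        if start + window >= waveform.shape[1]:
            break
    return answers
```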