dlion168 closed this issue 11 months ago
I think the issue here is that the music file you are using is too long for inference to fit on the GPU. In my tests, the limit on a 32GB GPU is about 1 minute of audio.
If you want, I can add another Gradio demo that handles longer music files by running inference with a sliding window.
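The sliding-window idea above could be sketched as follows: split the waveform into fixed-length overlapping chunks and run inference on each chunk separately so that no single forward pass exceeds GPU memory. This is only an illustration, not the repository's implementation; the function name, the 60-second window, and the 50% overlap are assumptions you would tune to your GPU.

```python
import numpy as np

def sliding_window_chunks(audio, sr, window_s=60.0, hop_s=30.0):
    """Yield overlapping chunks of a mono waveform.

    audio: 1-D numpy array of samples; sr: sample rate in Hz.
    window_s / hop_s are hypothetical defaults (60 s windows,
    50% overlap) -- tune them to your GPU's memory limit.
    """
    win = int(window_s * sr)
    hop = int(hop_s * sr)
    if len(audio) <= win:
        # short clip: a single chunk is enough
        yield audio
        return
    for start in range(0, len(audio) - win + 1, hop):
        yield audio[start:start + win]
    # emit one final chunk covering the tail the hops missed
    if (len(audio) - win) % hop != 0:
        yield audio[-win:]
```

Each chunk would then be fed to the model in turn, and the per-chunk answers aggregated (e.g. concatenated or summarized), at the cost of losing some global context across chunk boundaries.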
Many thanks for your great work! I tried to run inference with the MU-LLaMA model on an A40 48GB GPU using the provided Python script, but it fails with the following error.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.27 GiB (GPU 0; 44.35 GiB total capacity; 30.87 GiB already allocated; 13.06 GiB free; 30.94 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I tried (1) setting PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:128" and (2) reducing the LLaMA parameters in llama_adaptor.py, lowering max_batch_size from 32 to 1 and max_seq_len from 8192 to 256, but the error still occurs.
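Two details worth checking with attempt (1): PYTORCH_CUDA_ALLOC_CONF is read when PyTorch first initializes the CUDA allocator, so it must be set before `import torch` (or exported before launching the script), and max_split_size_mb only mitigates fragmentation — it cannot help when the allocation genuinely exceeds free memory. Given the ~1-minute limit reported above, truncating the waveform before inference is a more direct workaround. A minimal sketch; the helper name and the 60-second default are assumptions, not part of the repository:

```python
import os
import numpy as np

# Must happen before `import torch`: the CUDA caching allocator
# reads this variable once, on first initialization.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

def trim_to_max_seconds(audio, sr, max_s=60.0):
    """Keep only the first max_s seconds of a mono waveform.

    Hypothetical helper: the 60 s default follows the report that
    ~1 minute of audio is the practical limit on a 32 GB GPU.
    """
    return audio[: int(max_s * sr)]
```

With the input capped this way, the per-inference activation memory is bounded regardless of the original file length, which addresses the OOM more directly than allocator tuning.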