wonjune-kang / llm-speech-summarization

Prompting Large Language Models with Audio for General-Purpose Speech Summarization
https://arxiv.org/abs/2406.05968
MIT License

GPU memory problem #2

Closed: fclearner closed this issue 2 months ago

fclearner commented 2 months ago

Hello Wonjune Kang,

I have read your article and am currently working on the code for the SpeechLLM model. It's truly an impressive piece of work!

I have a question regarding the hardware requirements: Is a V100 GPU with 16GB memory sufficient for training the model? I would greatly appreciate any suggestions you can provide.

Thank you for your help!

wonjune-kang commented 2 months ago

Thanks for your interest in the work! Unfortunately, I don't think a single 16GB V100 will be enough to run training with the code as-is; I believe training took around 22-23 GB of memory once gradients and optimizer states are included. You might be able to make things more efficient by quantizing the audio encoder and/or quantizing the LLM beyond the float16 precision it currently uses, but I'm not sure how that would affect training stability or performance.
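For illustration, loading the LLM in 4-bit with bitsandbytes via Hugging Face Transformers might look like the minimal sketch below. The checkpoint name is a placeholder rather than the model used in this repo, and the sketch assumes the quantized LLM stays frozen, since quantized weights cannot receive gradient updates directly (if the LLM itself needs fine-tuning, QLoRA-style adapters via the peft library are the usual route).

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder checkpoint name; substitute the LLM actually used by this repo.
LLM_NAME = "meta-llama/Llama-2-7b-hf"

# 4-bit NF4 quantization with float16 compute; roughly quarters LLM weight
# memory compared to loading the weights in float16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

llm = AutoModelForCausalLM.from_pretrained(
    LLM_NAME,
    quantization_config=bnb_config,
    device_map="auto",
)

# Keep the quantized LLM frozen; only the audio encoder (and any adapters)
# would receive gradients during training.
for param in llm.parameters():
    param.requires_grad = False
```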

fclearner commented 2 months ago


Thank you for your response. I will try the suggested quantization strategy. Additionally, I believe using DeepSpeed might also help manage the memory requirements.
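As a rough illustration of that idea, a DeepSpeed ZeRO stage 2 setup with optimizer-state offload to CPU might look like the following sketch. The `build_model()` function is a placeholder for however the SpeechLLM model is constructed in this repo, and the batch size, learning rate, and offload settings are illustrative. Note that fp16 (rather than bf16) is the relevant mixed-precision mode here, since V100 GPUs do not support bfloat16.

```python
import deepspeed

def build_model():
    # Placeholder: construct the audio encoder + LLM model from this repo here.
    raise NotImplementedError

# Illustrative ZeRO stage 2 config: partitions optimizer states and gradients
# across GPUs and offloads optimizer states to CPU to cut per-GPU memory.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},  # V100 supports fp16 but not bf16
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

model = build_model()

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config=ds_config,
)

# A training step then becomes:
#   loss = model_engine(batch)     # forward pass
#   model_engine.backward(loss)    # DeepSpeed-managed backward
#   model_engine.step()            # optimizer step + gradient zeroing
```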