Closed by fclearner 2 months ago
Thanks for your interest in the work! Unfortunately, I don't think a single 16GB V100 will be enough to run training with the code as-is: training took around 22-23 GB of memory once gradients and optimizer states are included. You might be able to reduce the footprint by quantizing the audio encoder and/or quantizing the LLM beyond what is done currently (float16), but I'm not sure how that would affect training stability or performance.
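To see why quantization helps, here is a rough back-of-envelope memory estimate. The 1.3B parameter count below is a placeholder, not the actual model size, and the accounting is simplified (it ignores activations and temporary buffers, which is why real training usage lands well above the weight/gradient/optimizer total):

```python
# Back-of-envelope GPU memory estimate: fp16 training vs. int8 weights.
# n_params = 1.3B is a hypothetical placeholder; substitute your model's size.

def training_memory_gb(n_params, bytes_per_param=2):
    """Approximate memory for fp16 training with Adam.

    Counts fp16 weights, fp16 gradients, and two fp32 Adam moment
    buffers. Activations are NOT included, so real usage is higher.
    """
    weights = n_params * bytes_per_param
    grads = n_params * bytes_per_param
    optimizer = n_params * 4 * 2  # fp32 first and second moments
    return (weights + grads + optimizer) / 1024**3

def int8_weight_memory_gb(n_params):
    """Memory for int8-quantized weights alone (1 byte per parameter)."""
    return n_params / 1024**3

n = 1_300_000_000
print(f"fp16 training (weights + grads + Adam): {training_memory_gb(n):.1f} GB")
print(f"int8 weights only:                      {int8_weight_memory_gb(n):.1f} GB")
```

The gap between the two numbers is why quantizing frozen components (like the audio encoder) frees so much room: a frozen, quantized module needs only its weight memory, with no gradients or optimizer states.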
Thank you for your response. I will try the quantization strategy you suggested. I also think DeepSpeed might help manage the memory requirements, for example by partitioning optimizer states across GPUs or offloading them to CPU.
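For reference, a minimal DeepSpeed ZeRO stage 2 config with CPU optimizer offload might look like the sketch below. This is a starting point, not a tested setup for this repo; whether it integrates cleanly depends on how the training loop is written:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```

Stage 2 partitions gradients and optimizer states across data-parallel ranks, and the CPU offload moves the optimizer states (the largest single contributor in the estimate above) off the GPU, at the cost of slower optimizer steps.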
Hello Wonjune Kang,
I have read your article and am currently working on the code for the SpeechLLM model. It's truly an impressive piece of work!
I have a question regarding the hardware requirements: is a V100 GPU with 16GB of memory sufficient for training the model? I would greatly appreciate any suggestions you can provide.
Thank you for your help!