**Devloper-RG** opened this issue 2 weeks ago
Hey @Devloper-RG, thanks for raising this issue and testing the lib in a multi-GPU setup 🙏 I'd be glad to help on that, can you provide a reproducer?
I guess the issue here is that we are pushing to `cuda` as a device.
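For context (my understanding of the symptom, not a verified reading of the repo's code): `torch.device("cuda")` carries no device index, so anything moved with `.to("cuda")` lands on the *current* device, which defaults to GPU 0. That means the entire model piles onto one card while the others sit idle:

```python
import torch

# "cuda" with no index is resolved against the current device at use time;
# unless torch.cuda.set_device() has been called, that is cuda:0.
assert torch.device("cuda").index is None
assert torch.device("cuda:1").index == 1

# Consequence: model.to("cuda") places *every* parameter on GPU 0,
# no matter how many GPUs the machine has, so GPU 0 OOMs first.
```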
Hey @eustlb , thanks for getting back to me!
I made some modifications to use the meta-llama/Meta-Llama-3.1-8B-Instruct model: I updated the arguments_classes/language_model_arguments.py script and adjusted LLM/language_model.py so the model can be loaded from the Hugging Face Hub.

I then ran the server on a Google Cloud Platform (GCP) VM with 2 NVIDIA T4 GPUs. During testing, I noticed that one of the GPUs consistently overloads, leading to a `torch.OutOfMemoryError`.
I tried the `DataParallel` method, but it didn't resolve the issue. I also tried running the model in lower precision, which worked on a single GPU, but I'd like to use higher-precision weights and fully leverage multiple GPUs for better performance.
Any help with getting multi-GPU support working would be greatly appreciated!
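One note that may help here: `DataParallel` replicates the full model on every GPU, so it does nothing for per-GPU memory; what you want is to *shard* the weights across cards. A minimal sketch of how that might look with `device_map="auto"` (this assumes `transformers` and `accelerate` are installed; the `make_max_memory` helper and the headroom numbers are hypothetical and should be tuned for your cards):

```python
def make_max_memory(num_gpus, per_gpu_gib=13, gpu0_headroom_gib=4):
    """Hypothetical helper: cap each GPU's share of the weights, leaving
    extra headroom on GPU 0, which also holds activations, the KV cache,
    and the CUDA context."""
    mem = {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}
    mem[0] = f"{per_gpu_gib - gpu0_headroom_gib}GiB"
    return mem

def load_sharded(model_id, num_gpus):
    # Imports kept inside the function so the sketch can be read (and the
    # helper tested) without transformers installed. device_map="auto" lets
    # accelerate split the layers across the visible GPUs instead of
    # pushing everything to cuda:0.
    import torch
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        max_memory=make_max_memory(num_gpus),
        torch_dtype=torch.float16,
    )
```

With two T4s, caps like these would place roughly half of the fp16 weights on each card; `DataParallel`, by contrast, tries to fit all of them on each.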
While using other models such as meta-llama/Meta-Llama-3.1-8B-Instruct, I'm encountering a `torch.OutOfMemoryError` when trying to load the model across multiple GPUs. I have 4 GPUs with 14.57 GiB of memory each, but the model fails to allocate memory on GPU 0, even though the other GPUs should share the load.
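A quick back-of-the-envelope check (assuming roughly 8e9 parameters for the 8B model) shows why GPU 0 fails whenever the weights are not actually sharded:

```python
PARAMS = 8e9          # approximate; Meta-Llama-3.1-8B has ~8.03e9 parameters
BYTES_FP16 = 2
GIB = 2**30
GPU_MEM_GIB = 14.57   # per-GPU memory reported in the error message

weights_gib = PARAMS * BYTES_FP16 / GIB   # ~14.9 GiB of fp16 weights
# Even in fp16, the weights alone exceed a single GPU -- so if everything
# lands on GPU 0, it OOMs before activations or the KV cache are counted.
assert weights_gib > GPU_MEM_GIB

# Sharded over 4 GPUs, each card holds only ~3.7 GiB of weights, which
# fits comfortably with room left over for activations.
assert weights_gib / 4 < GPU_MEM_GIB
```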