Closed · innat closed this 3 days ago
What does this mean? (src)
> Multiple GPUs, or “model parallelism”, can be utilized but only one GPU will be active at any given moment. This forces the GPU to wait for the previous GPU to send it the output. You should launch your script normally with Python instead of other tools like torchrun and accelerate launch.
>
> You may also be interested in pipeline parallelism, which utilizes all available GPUs at once instead of only having one GPU active at a time. This approach is less flexible though. For more details, refer to the Memory-efficient pipeline parallelism guide.
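To illustrate the launch advice in the quote above (the script name `train.py` is a placeholder, not from the issue):

```shell
# device_map="auto" (naive model parallelism): one process drives all GPUs,
# so the script is launched with plain Python
python train.py

# Distributed launchers start one process per GPU, which is NOT what this
# mode expects:
# torchrun --nproc_per_node=3 train.py
# accelerate launch train.py
```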
System Info

Information

Tasks

An officially supported task in the `examples` folder

Reproduction
I am trying to fine-tune Llama on multiple GPUs using the `trl` library. While training, I noticed that `gpu:0` is actively computing while the other GPUs sit idle, even though their VRAM is consumed. This seems unexpected to me; I assumed all GPUs would be busy during training. I searched for this issue but found nothing that fits my case. Here is the relevant code.
Checking that torch detects all the devices, and it does.
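A minimal check of this kind (a sketch, not the author's exact code) might look like:

```python
import torch

# Count and name the CUDA devices PyTorch can see (0 on a CPU-only machine).
n = torch.cuda.device_count()
print(f"visible CUDA devices: {n}")
for i in range(n):
    print(i, torch.cuda.get_device_name(i))
```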
Checking with `accelerate` (suspicious).
Defining the model and setting `device_map="auto"`
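Such a definition might look like the following sketch (the model id is a placeholder, not taken from the issue):

```python
from transformers import AutoModelForCausalLM

def load_model(model_id: str):
    # device_map="auto" shards the layers across all visible GPUs; compute
    # then flows through one GPU at a time (naive model parallelism).
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        torch_dtype="auto",
    )

# model = load_model("meta-llama/Llama-2-7b-hf")  # placeholder model id
```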
.

Defining training arguments. Setting `gradient_checkpointing_kwargs={"use_reentrant": False}`, as I read in another issue that it is required.

Checking the model's device map. It looks OK; layers are placed on multiple GPUs (0, 1, 2).
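For illustration, transformers exposes this placement as `model.hf_device_map`. Using a made-up map (not the author's actual layout), it can be summarized like this:

```python
from collections import Counter

# Hypothetical example of what model.hf_device_map might contain after
# loading with device_map="auto"; the real map comes from the loaded model.
hf_device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "model.layers.2": 2,
    "lm_head": 2,
}
per_device = Counter(hf_device_map.values())
print(per_device)  # modules per GPU, e.g. Counter({0: 2, 2: 2, 1: 1})
```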
Start training.
GPU usage
Expected behavior
At this point, I'm not sure if all GPUs are working as expected or not.