Open SnapixAI opened 2 months ago
Accelerate already handles part of this: inside the accelerator.accumulate context manager, it synchronizes gradients across processes, and the accelerator.sync_gradients flag tells you when that synchronization happens.
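For reference, this is the pattern I mean — a minimal sketch of the usual Accelerate accumulation loop, with a placeholder model and data, not the actual sd-scripts code:

```python
import torch
from accelerate import Accelerator

# Placeholder model/optimizer/data just to make the sketch runnable.
accelerator = Accelerator(gradient_accumulation_steps=4)
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 8), torch.randn(64, 1)),
    batch_size=8,
)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(x), y)  # per-process loss
        accelerator.backward(loss)  # gradients are all-reduced only on sync steps
        if accelerator.sync_gradients:  # True on the accumulation boundary
            accelerator.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        optimizer.zero_grad()
```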
sd-scripts uses Accelerate from Hugging Face, which is very helpful for high-level distributed training.
I've been looking into the sd3 train branch and am trying to understand how the loss is gathered for multi-GPU training; I'd love to understand the logic behind it. I'm used to working with accelerator.gather/reduce for loss/tensor updates, but I don't see either of those used in the sd3 training script, which made me curious: how are the losses gathered across all processes?
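For comparison, this is the kind of thing I usually do to average the loss over all processes before logging it (continuing the sketch above; this is my own pattern, not code taken from sd-scripts):

```python
# Inside the training loop above: average the per-process loss before logging it.
avg_loss = accelerator.reduce(loss.detach(), reduction="mean")
# Alternative: gather each process's loss and average it yourself.
# all_losses = accelerator.gather(loss.detach())
# avg_loss = all_losses.mean()
if accelerator.is_main_process:
    print(f"step loss: {avg_loss.item():.4f}")
```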