Open SnapixAI opened 2 months ago
Accelerate already handles part of this: inside the accelerator.accumulate context manager, it synchronizes gradients across processes, and the accelerator.sync_gradients flag tells you when that synchronization happens.
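For reference, this is the pattern I mean — a minimal sketch of the usual Accelerate accumulation loop, with a placeholder model and data, not the actual sd-scripts code:

```python
import torch
from accelerate import Accelerator

# Placeholder model/optimizer/data just to make the sketch runnable.
accelerator = Accelerator(gradient_accumulation_steps=4)
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 8), torch.randn(64, 1)),
    batch_size=8,
)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(x), y)  # per-process loss
        accelerator.backward(loss)  # gradients are all-reduced only on sync steps
        if accelerator.sync_gradients:  # True on the accumulation boundary
            accelerator.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        optimizer.zero_grad()
```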
sd-scripts uses Accelerate from Hugging Face, which is very helpful for high-level distributed training.
I've been looking into the sd3 train branch and am trying to understand how the loss is gathered for multi-GPU training; I'd love to understand the logic behind it. I'm used to working with accelerator.gather/reduce for loss/tensor updates, but I don't see either of those used in the sd3 training script, which made me curious: how are the losses gathered across all processes?
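For comparison, this is the kind of thing I usually do to average the loss over all processes before logging it (continuing the sketch above; this is my own pattern, not code taken from sd-scripts):

```python
# Inside the training loop above: average the per-process loss before logging it.
avg_loss = accelerator.reduce(loss.detach(), reduction="mean")
# Alternative: gather each process's loss and average it yourself.
# all_losses = accelerator.gather(loss.detach())
# avg_loss = all_losses.mean()
if accelerator.is_main_process:
    print(f"step loss: {avg_loss.item():.4f}")
```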