lucidrains / gigagan-pytorch

Implementation of GigaGAN, the new SOTA GAN out of Adobe; the culmination of nearly a decade of research into GANs.
MIT License

Multi GPU with gradient accumulation #37

Open dprze opened 1 year ago

dprze commented 1 year ago

Hi! When training on multiple GPUs with gradient accumulation steps > 1, there is no substantial speedup relative to a single GPU (there is a speedup when the value is 1). I found the following threads on Hugging Face, here and here, that seem to provide a solution. I even ran a quick test by passing the appropriate argument to Accelerator, and training was indeed much faster (in your class I set the gradient accumulation steps to 1 but set it to 8 for the Accelerator; I didn't make any other changes to account for this modification, so the results weren't particularly useful 😉). If you have time to check whether this is of interest to you, I'd be grateful.
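For reference, this is roughly the pattern those threads suggest: let the `Accelerator` manage accumulation itself via its `gradient_accumulation_steps` argument and the `accumulate` context manager. A minimal sketch only; the model, optimizer, and dataloader below are placeholders, not this repo's actual trainer objects.

```python
import torch
import torch.nn.functional as F
from accelerate import Accelerator

# Let Accelerate manage gradient accumulation instead of looping manually.
accelerator = Accelerator(gradient_accumulation_steps=8)

# Placeholder objects, standing in for the repo's actual trainer state.
model = torch.nn.Linear(128, 1)
optimizer = torch.optim.Adam(model.parameters())
dataset = torch.utils.data.TensorDataset(torch.randn(64, 128), torch.randn(64, 1))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    # Under `accumulate`, Accelerate wraps the non-final micro-batches in
    # DDP's no_sync, deferring the inter-GPU gradient all-reduce until the
    # last micro-batch; that skipped communication is where the multi-GPU
    # speedup comes from. The prepared optimizer also skips step()/zero_grad()
    # on non-sync iterations, so they can be called every iteration.
    with accelerator.accumulate(model):
        loss = F.mse_loss(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```

With manual accumulation (backward on every micro-batch without `no_sync`), DDP all-reduces the gradients on every micro-batch, which is consistent with the lack of speedup observed here.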

julien-blanchon commented 2 months ago

I'm also experiencing this behaviour with HF Accelerate in my custom code.