hj13-mtlab opened 1 day ago
Pinging @sayakpaul about the training scripts. I don't think `args.lr_warmup_steps * args.gradient_accumulation_steps` is correct: with gradient accumulation you already perform fewer gradient updates, so stretching the time it takes to reach the true/peak LR does not make sense. I think `lr_warmup_steps * num_processes` is correct, so that each rank gets a roughly equal number of learning steps going from low to true/peak LR.
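To make the accumulation argument concrete, here is a minimal sketch (not the diffusers code; the numbers and the linear-warmup helper are illustrative assumptions). It shows that multiplying warmup by the accumulation factor can stretch warmup past the number of optimizer updates that actually happen:

```python
# Illustration (not the diffusers code): the LR scheduler is stepped once per
# optimizer update, and gradient accumulation reduces the number of updates.

def warmup_lr(step, warmup_steps, peak_lr=1e-4):
    """Linear warmup: LR ramps from 0 to peak_lr over warmup_steps updates."""
    return peak_lr * min(step / warmup_steps, 1.0)

total_batches = 1000                          # micro-batches seen per process
grad_accum = 4                                # gradient accumulation steps
optimizer_steps = total_batches // grad_accum # only 250 real updates

warmup = 100                                  # --lr_warmup_steps

# Scaling warmup by grad_accum (100 * 4 = 400) exceeds the 250 real updates,
# so this run never reaches peak LR:
print(warmup_lr(optimizer_steps, warmup * grad_accum))  # 6.25e-05, below peak
# Without the scaling, warmup finishes well inside the run:
print(warmup_lr(optimizer_steps, warmup))               # 0.0001, at peak
```

Whether 400 scheduler steps is "wrong" depends on how often `scheduler.step()` is actually called per update in the training loop, which is exactly what the question below is about.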
In the example code from `train_text_to_image_sdxl.py`, the warmup is scaled by `args.gradient_accumulation_steps`, but in `train_text_to_image.py` it is scaled by `num_processes`. Why is there such a difference between these two cases?
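One plausible rationale for the `num_processes` scaling (an assumption on my part, not verified against accelerate internals) is that when every process calls `scheduler.step()` on each update, the scheduler advances `num_processes` ticks per global optimizer step, so the warmup count must be inflated to compensate. A toy model of that behavior:

```python
# Assumption being illustrated: the scheduler receives `ticks_per_update`
# .step() calls for every real optimizer update (one per process).

def lr_after_updates(updates, warmup_steps, ticks_per_update, peak_lr=1e-4):
    """Linear warmup as seen after `updates` real optimizer updates."""
    ticks = updates * ticks_per_update
    return peak_lr * min(ticks / warmup_steps, 1.0)

num_processes = 4
warmup = 100  # intended warmup length, in real optimizer updates

# Unscaled: the 100 scheduler ticks are consumed after only 25 real updates,
# so warmup ends 4x too early.
print(lr_after_updates(25, warmup, num_processes))                   # 0.0001
# Scaled by num_processes: peak LR is reached after 100 real updates,
# which is the intended behavior.
print(lr_after_updates(50, warmup * num_processes, num_processes))   # 5e-05
print(lr_after_updates(100, warmup * num_processes, num_processes))  # 0.0001
```

If that reading is right, the two multipliers are compensating for different things (extra scheduler ticks from multiple processes vs. fewer updates from accumulation), and only one of them can be appropriate for a given training loop.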