huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Learning Rate Scheduler Stepping too fast on MultiGPU #2926

Closed · priyammaz closed this 1 week ago

priyammaz commented 1 month ago

System Info

- `Accelerate` version: 0.22.0
- Platform: Linux-4.18.0-477.55.1.el8_8.x86_64-x86_64-with-glibc2.28
- Python version: 3.11.4
- Numpy version: 1.24.4
- PyTorch version (GPU?): 2.0.1 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 251.62 GB
- GPU type: NVIDIA A40
- `Accelerate` default config:
    - compute_environment: LOCAL_MACHINE
    - distributed_type: MULTI_GPU
    - mixed_precision: no
    - use_cpu: False
    - debug: False
    - num_processes: 4
    - machine_rank: 0
    - num_machines: 1
    - gpu_ids: 0,1,2,3
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []

Reproduction

I am experimenting with different schedulers and noticed a small problem. Here is the skeleton of the training script, nothing fancy:


model = ...
optimizer = ...

main_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
warmup_scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, total_iters=5)
scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers=[warmup_scheduler, main_scheduler], milestones=[5])

model, optimizer, trainloader, testloader, scheduler = accelerator.prepare(model, optimizer, trainloader, testloader, scheduler)

for epoch in range(EPOCHS):

    ### Train Loop ###
    for images, labels in trainloader:
        out = model(images)
        loss = loss_fn(out, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    ### Validation Loop ###
    for images, labels in testloader:
        with torch.no_grad():
            out = model(images)

    ### Iterate Scheduler ###
    scheduler.step()

Expected behavior

What I want is basically this: over the 100 epochs I will train the model, the first 4 epochs should be a warmup, and then every 20 epochs after that the learning rate should be reduced by a factor of 0.1. This works totally fine on a single GPU, but when using two GPUs it goes through the schedule twice as fast, as if scheduler.step() is being called twice per epoch. Should I wrap scheduler.step() so it only occurs on the main GPU using if accelerator.is_local_main_process:, or multiply everything by the number of GPUs, or is there a better way to do this that I am missing?
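
One workaround, sketched below under the assumption that the scheduler is meant to advance once per epoch rather than once per batch: leave the scheduler out of accelerator.prepare(...) entirely. A prepared scheduler is wrapped for per-batch stepping and compensates for data parallelism, so an unprepared scheduler advances exactly once per scheduler.step() call on every process, keeping the schedule in sync across GPUs. This is a sketch of the idea, not necessarily the officially recommended pattern:

main_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
warmup_scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, total_iters=5)
scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers=[warmup_scheduler, main_scheduler], milestones=[5])

# Note: `scheduler` is deliberately not passed to `prepare`, so Accelerate
# never wraps it and it advances exactly one epoch of schedule per call.
model, optimizer, trainloader, testloader = accelerator.prepare(model, optimizer, trainloader, testloader)

for epoch in range(EPOCHS):

    for images, labels in trainloader:
        out = model(images)
        loss = loss_fn(out, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Every process calls this once per epoch, so the schedule stays aligned
    # across GPUs without any `is_main_process` guard.
    scheduler.step()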

priyammaz commented 1 month ago

[Screenshot: wandb learning-rate plot, 2024-07-09]

I have been logging the learning rate on wandb and it looks like this (training for 90 epochs and multiplying the learning rate by 0.1 every 30 epochs). But as you can see, I was training this model on 2 GPUs, so the scheduler is multiplying the learning rate by 0.1 every 15 epochs instead (going twice as fast).
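
This is consistent with how a prepared scheduler behaves: accelerator.prepare wraps it in an AcceleratedScheduler, which (when batches are not split) steps the underlying scheduler num_processes times per call, on the assumption that it is stepped once per batch and the effective batch size grew with the number of processes. With 2 GPUs, each per-epoch scheduler.step() therefore advances the schedule by 2 epochs, which is why the 30-epoch decay fires at epoch 15 in the plot above. A minimal sketch of an alternative fix that keeps the scheduler inside prepare, assuming per-epoch stepping is the goal:

from accelerate import Accelerator

# `step_scheduler_with_optimizer=False` tells Accelerate the scheduler is not
# stepped together with the optimizer (e.g. it is stepped at the end of each
# epoch), so the wrapped scheduler advances exactly once per
# `scheduler.step()` call instead of `num_processes` times.
accelerator = Accelerator(step_scheduler_with_optimizer=False)

model, optimizer, trainloader, testloader, scheduler = accelerator.prepare(model, optimizer, trainloader, testloader, scheduler)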

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.