huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Can accelerator.prepare only be run once? #2882

Open · DavideHe opened this issue 1 week ago

DavideHe commented 1 week ago

System Info

- `Accelerate` version: 0.28.0
- Platform: Linux-5.4.250-2-velinux1u1-amd64-x86_64-with-glibc2.29
- Python version: 3.8.10
- Numpy version: 1.21.0
- PyTorch version (GPU?): 2.2.0+cu118 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 2015.16 GB
- GPU type: NVIDIA A100-SXM4-80GB
- `Accelerate` default config:
    Not found

Reproduction

I wrote code that calls accelerator.prepare more than once:

model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)
lr_scheduler = accelerator.prepare(lr_scheduler)

lr_scheduler.step() behaves differently depending on whether prepare is called once or twice. With a single prepare call, inside with accelerator.accumulate(model): the lr_scheduler.step() runs num_processes times on every step (see the code). With two prepare calls, where the lr_scheduler is prepared afterwards, lr_scheduler.step() only runs once per step inside with accelerator.accumulate(model):.

When the lr_scheduler comes from the second prepare call, is there some difference in how it interacts with accelerator.accumulate(model)?
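
For clarity, a minimal sketch of what I mean by preparing once, with the scheduler passed in the same call (the objects are the same as above; the rest of the training setup is assumed):

model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
)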

Expected behavior

An explanation of why the two ways of writing the prepare calls behave differently.

SunMarc commented 1 week ago

Hi @DavideHe, thanks for raising the issue. Could you share a minimal reproducer? The lr_scheduler should behave the same even if it is passed in the 2nd prepare call. However, we expect the user to call prepare only once. What behavior were you expecting? With accelerator.accumulate(model), the lr_scheduler should be updated after every gradient_accumulation_steps iterations. See the related issue https://github.com/huggingface/accelerate/issues/963
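
For reference, a rough sketch of the single-prepare pattern we expect (assuming gradient_accumulation_steps is set on the Accelerator and that the model, optimizer, dataloader, and scheduler are defined elsewhere; compute_loss is a placeholder for your own forward pass and loss computation):

from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)  # example value

model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, lr_scheduler
)

for batch in train_dataloader:
    with accelerator.accumulate(model):
        loss = compute_loss(model, batch)  # placeholder for your loss computation
        accelerator.backward(loss)
        optimizer.step()
        lr_scheduler.step()  # the prepared scheduler only advances when the optimizer actually steps
        optimizer.zero_grad()

The prepared scheduler is meant to skip its step whenever the optimizer step was skipped for accumulation, which is why it should only advance every gradient_accumulation_steps iterations.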

DavideHe commented 5 days ago

prepare twice

model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)
lr_scheduler = accelerator.prepare(lr_scheduler)

for data in train_dataloader:
    with accelerator.accumulate(model):
        lr_scheduler.step()
        print(lr_scheduler.get_last_lr()[-1])

With the code above, the lr updates on every step when gradient_accumulation_steps > 1. But when prepare is called once, the lr only updates every gradient_accumulation_steps steps.
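
For comparison, the prepare-once variant of the same loop (a sketch using the same objects as above):

model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
)

for data in train_dataloader:
    with accelerator.accumulate(model):
        lr_scheduler.step()
        print(lr_scheduler.get_last_lr()[-1])  # here the lr only changes every gradient_accumulation_steps steps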