huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

optimizer.step_was_skipped not correct in accelerator.accumulate #2828

Open Fadelis98 opened 1 month ago

Fadelis98 commented 1 month ago

System Info

- `Accelerate` version: 0.30.1
- Platform: Linux-5.4.0-144-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /root/miniconda3/envs/qec/bin/accelerate
- Python version: 3.11.9
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.3.0 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- System RAM: 1007.53 GB
- GPU type: NVIDIA GeForce RTX 4090
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: MULTI_GPU
        - mixed_precision: no
        - use_cpu: False
        - debug: True
        - num_processes: 8
        - machine_rank: 0
        - num_machines: 1
        - gpu_ids: all
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - enable_cpu_affinity: True
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []

Reproduction

The value of optimizer.step_was_skipped should, as the name suggests, be True whenever optimizer.step() is called but the step is not actually applied to the parameters. The logic is implemented here, inside the if self.gradient_state.sync_gradients branch. The standard implementation of gradient accumulation uses that same condition to decide whether to actually step the optimizer, so optimizer.step_was_skipped is always False.
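
A minimal sketch of the behaviour described above (the gradient_accumulation_steps value and the toy Linear model are illustrative assumptions, not taken from the report):

```python
import torch
from accelerate import Accelerator

# Assumed setup: accumulate over 4 micro-batches on a single process.
accelerator = Accelerator(gradient_accumulation_steps=4)
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

for step in range(8):
    with accelerator.accumulate(model):
        x = torch.randn(16, 8, device=accelerator.device)
        loss = model(x).mean()
        accelerator.backward(loss)
        optimizer.step()       # parameters only actually update every 4th call
        optimizer.zero_grad()
        # Reported behaviour: step_was_skipped stays False even on the
        # accumulation-only calls where no update is applied.
        print(step, optimizer.step_was_skipped)
```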

Expected behavior

If this is the expected behaviour, rename optimizer.step_was_skipped or note this behaviour in the docstring. Otherwise, fix its logic so that it returns True while the gradient is being accumulated.

muellerzr commented 1 month ago

step_was_skipped is only set if we hit overflow, not if we are accumulating gradients. We could potentially rename it to overflow_hit, but that is its true intention; it does not indicate that we are not yet syncing gradients.
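
For context, a hedged sketch of the overflow case described here, where step_was_skipped would be expected to become True (assumes fp16 mixed precision on a CUDA device; forcing a non-finite loss is just one way to trigger the GradScaler skip and is not from this thread):

```python
import torch
from accelerate import Accelerator

# Assumes a CUDA device; fp16 mixed precision uses a GradScaler under the hood.
accelerator = Accelerator(mixed_precision="fp16")
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

x = torch.randn(16, 8, device=accelerator.device)
loss = model(x).mean() * float("inf")  # force non-finite gradients
accelerator.backward(loss)
optimizer.step()    # the scaler detects inf/nan grads and skips the update
optimizer.zero_grad()
print(optimizer.step_was_skipped)  # expected: True, since overflow was hit
```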

github-actions[bot] commented 1 hour ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.