huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Accelerate logging in_order=True does not work properly #2827

Open zhcm opened 1 month ago

zhcm commented 1 month ago

System Info

- `Accelerate` version: 0.30.1
- Platform: Linux-5.4.0-132-generic-x86_64-with-glibc2.17
- `accelerate` bash location: /miniconda3/envs/test/bin/accelerate
- Python version: 3.8.18
- Numpy version: 1.24.3
- PyTorch version (GPU?): 1.13.1 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- System RAM: 330.26 GB
- GPU type: NVIDIA GeForce RTX 3090
- `Accelerate` default config:
    Not found

Reproduction

from accelerate.logging import get_logger
from accelerate import Accelerator

logger = get_logger(__name__)

accelerator = Accelerator()
logger.info("My log", main_process_only=False)  # emitted on every process
logger.debug("My log", main_process_only=True)  # emitted on the main process only

logger = get_logger(__name__, log_level="DEBUG")
logger.info("My log")
logger.debug("My second log")

array = ["a", "b", "c", "d"]
letter_at_rank = array[accelerator.process_index]
logger.info(letter_at_rank, in_order=True)  # should print the letters in rank order
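
For the ordering problem to be visible, the snippet has to be launched with more than one process; assuming it is saved as repro.py (the file name here is illustrative), for example:

$ accelerate launch --num_processes 4 repro.py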

Expected behavior

All of the `My log` messages should be printed, and the per-rank letters should appear in rank order.

BenjaminBossan commented 1 month ago

The issue is that there is no handler configured for the logger. After adding it, the outputs are shown:

import logging
from accelerate.logging import get_logger
from accelerate import Accelerator

accelerator = Accelerator()
logger = get_logger(__name__, log_level="DEBUG")
handler = logging.StreamHandler()
logger.logger.addHandler(handler)
logger.info("My log")
logger.debug("My second log")

array = ["a", "b", "c", "d"]
letter_at_rank = array[accelerator.process_index]
logger.info(letter_at_rank, in_order=True)

The reason why we don't need to add a handler when using logger.warning or higher is that Python has a handler of last resort (logging.lastResort) for those log levels.
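
As an alternative to attaching a handler by hand, configuring the root logger once also makes the lower levels visible, since records propagate up to it. A minimal sketch using only the standard library:

import logging
from accelerate.logging import get_logger

# basicConfig attaches a StreamHandler to the root logger, so DEBUG/INFO
# records become visible instead of relying on logging.lastResort, which
# only handles WARNING and above.
logging.basicConfig(level=logging.DEBUG)

logger = get_logger(__name__, log_level="DEBUG")
logger.info("My log")          # now printed
logger.debug("My second log")  # now printed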

zhcm commented 4 weeks ago

> The issue is that there is no handler configured for the logger. After adding it, the outputs are shown: [...]

Thank you, I got the outputs, but they come out of order even though I set in_order=True.

BenjaminBossan commented 4 weeks ago

What exactly did you run to get this error? What is your setup?

zhcm commented 4 weeks ago

> What exactly did you run to get this error? What is your setup?

I used the example from https://huggingface.co/docs/accelerate/v0.30.1/en/package_reference/logging and added a StreamHandler:

    import logging
    from accelerate import Accelerator
    from accelerate.logging import get_logger

    accelerator = Accelerator()

    logger = get_logger(__name__, log_level="DEBUG")
    handler = logging.StreamHandler()
    logger.logger.addHandler(handler)
    logger.info("My log")
    logger.debug("My second log")

    accelerator.wait_for_everyone()

    array = ["a", "b", "c", "d"]
    letter_at_rank = array[accelerator.process_index]
    logger.info(letter_at_rank, main_process_only=False, in_order=True)

The output is:

My log
My second log
a
d
c
b
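
For anyone hitting this in the meantime, one workaround that guarantees rank order is to gather the per-rank values and print once from the main process, sidestepping logging entirely. A sketch using accelerate.utils.gather_object and the variables from the snippet above:

from accelerate.utils import gather_object

# Every rank participates in the gather; only rank 0 prints, so the output
# order is the gather order (rank 0, 1, 2, ...) rather than a console race.
letters = gather_object([letter_at_rank])
if accelerator.is_main_process:
    for rank, letter in enumerate(letters):
        print(f"rank {rank}: {letter}")
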
BenjaminBossan commented 4 weeks ago

Indeed, I think in_order=True does not do what it is supposed to do or I misunderstand its meaning:

$ accelerate launch accelerate-2827.py 
in order True
hello from rank 0
hello from rank 1
in order False
hello again from rank 0
hello again from rank 1
$ accelerate launch accelerate-2827.py 
hello from rank 1
in order True
hello from rank 0
in order False
hello again from rank 0
hello again from rank 1
$ accelerate launch accelerate-2827.py 
in order True
hello from rank 0
hello from rank 1
in order False
hello again from rank 1
hello again from rank 0
$ accelerate launch accelerate-2827.py 
in order True
hello from rank 0
hello from rank 1
in order False
hello again from rank 0
hello again from rank 1
$ accelerate launch accelerate-2827.py 
hello from rank 1
in order True
hello from rank 0
in order False
hello again from rank 1
hello again from rank 0
$ accelerate launch accelerate-2827.py 
hello from rank 1
in order True
hello from rank 0
in order False
hello again from rank 1
hello again from rank 0

using

import logging
from accelerate.logging import get_logger
from accelerate import Accelerator

accelerator = Accelerator()
logger = get_logger(__name__, log_level="DEBUG")
handler = logging.StreamHandler()
logger.logger.addHandler(handler)
accelerator.print("in order True")
logger.info(f"hello from rank {accelerator.process_index}", in_order=True, main_process_only=False)
accelerator.wait_for_everyone()
accelerator.print("in order False")
logger.info(f"hello again from rank {accelerator.process_index}", in_order=False, main_process_only=False)
muellerzr commented 4 weeks ago

Yes, it is not working in this case; it should go strictly from rank 0 to rank n. We do that in a for loop, so it's interesting that this fails.
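
For context, the in_order path amounts to a rank-by-rank loop with a barrier between turns; a paraphrased sketch of the idea, not the exact Accelerate source:

from accelerate import PartialState

def log_in_order(logger, level, msg):
    # Each rank takes one turn to emit its record, with a barrier
    # between turns so the calls happen strictly in rank order.
    state = PartialState()
    for rank in range(state.num_processes):
        if state.process_index == rank:
            logger.log(level, msg)
        state.wait_for_everyone()

Even with the calls serialized like this, each rank writes to its own stream and the launcher merges those streams into one terminal, so without a flush per turn the lines can still arrive interleaved, which would be consistent with the runs above.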

zhcm commented 3 weeks ago

> What exactly did you run to get this error? What is your setup?

Any update?