xrsrke / pipegoose

Large-scale 4D parallelism pre-training for 🤗 transformers with Mixture of Experts *(still a work in progress)*

Distributed Logger #33

Open · xrsrke opened this issue 9 months ago

xrsrke commented 9 months ago

Print log messages neatly to the terminal, filtered by a specific rank or `ParallelMode`, and save them to a local file. Let the user configure the file path and file name; by default, name the log file after the logger name the user passed in.

APIs

```python
from pipegoose.distributed import ParallelMode
from pipegoose.distributed.logger import DistributedLogger

logger = DistributedLogger("latency_logger", parallel_context)

# Log at each severity level, scoped to a parallel mode
logger.info("hello", parallel_mode=ParallelMode.GLOBAL)
logger.warning("hello", parallel_mode=ParallelMode.GLOBAL)
logger.debug("hello", parallel_mode=ParallelMode.GLOBAL)
logger.error("hello", parallel_mode=ParallelMode.GLOBAL)

# Other arguments: restrict logging to a single rank within a parallel mode
logger.info("hello", rank=0, parallel_mode=ParallelMode.GLOBAL)
logger.info("hello", rank=0, parallel_mode=ParallelMode.TENSOR)
```

TODO

KevorkSulahian commented 9 months ago

Opened PR #35 for this @xrsrke