pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Simplify logging inside tasks #151

Open windiana42 opened 5 months ago

windiana42 commented 5 months ago

In general, it is possible to simply use the logging or structlog libraries to emit log messages inside tasks. It would be nice, though, if it were easy to get a logger inside a task that already has a good name, making it easy to spot messages emitted by a particular task. One idea would be to offer a get_task_logger() method which simply calls structlog.getLogger() but uses the open context classes to figure out the task name and stage. Ideally, it is tested that this also works with the DaskEngine and in multi-node execution mode; for these, some log capturing must be implemented. Even more ideally, if we execute a task twice (i.e. version=AUTO_VERSION), we can separate the two runs and display only one of them in the default configuration.
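To make the get_task_logger() idea concrete, here is a minimal sketch. It does not use pipedag's actual context classes: the `_current_task` context variable and the `task_context()` helper are hypothetical stand-ins for them, and the standard library's `logging.getLogger()` is used in place of `structlog.getLogger()` so the example has no third-party dependencies. The naming scheme `<stage>.<task>` is likewise just an assumption.

```python
import contextvars
import logging
from contextlib import contextmanager

# Hypothetical stand-in for pipedag's open context classes: holds the
# (stage name, task name) pair of the task that is currently executing.
_current_task = contextvars.ContextVar("current_task", default=None)


@contextmanager
def task_context(stage, task):
    """Mark a task as running for the duration of the with-block (hypothetical helper)."""
    token = _current_task.set((stage, task))
    try:
        yield
    finally:
        # Restore whatever context was active before this task started.
        _current_task.reset(token)


def get_task_logger():
    """Return a logger named after the currently running stage and task.

    Falls back to a generic "task" logger when called outside any task context.
    """
    ctx = _current_task.get()
    if ctx is None:
        return logging.getLogger("task")
    stage, task = ctx
    return logging.getLogger(f"{stage}.{task}")
```

Inside a task this could be used as follows (the stage and task names are made up):

```python
logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")

with task_context("stage_1", "load_data"):
    get_task_logger().info("loading input tables")
```

This sketch only covers naming within a single process. Capturing logs from DaskEngine workers or other nodes, and separating the two runs of an AUTO_VERSION task, would need additional machinery that is not shown here.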