rabernat opened this issue 3 years ago
In general, libraries like pangeo-forge should only provide the logger
objects and the logging statements. It's typically up to applications using the library to configure logging for their needs.
So I imagine that anyone executing the recipes would have something like:

```python
import logging
import sys

logger = logging.getLogger("pangeo_forge.recipe")  # or whatever logger to target; maybe just "pangeo_forge"

# configure the handler, formatter
handler = logging.StreamHandler(sys.stdout)  # print to stdout (StreamHandler defaults to stderr)
handler.setLevel(logging.INFO)
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```
That configures things to print messages at INFO level or higher (so not DEBUG) to stdout. I think pangeo-forge might provide a standard handler and formatter that executors could use.
Side note: libraries should typically use `logger.debug("thing %s", ds)` rather than `logger.debug(f"thing {ds}")`, in case calling `str(ds)` is expensive. With the `%s` style, `str(ds)` isn't called unless there's a handler actually listening at the `DEBUG` level.
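To make that concrete, here is a minimal, self-contained illustration (the `Expensive` class is hypothetical, just to make the `str()` call visible):

```python
import logging

logger = logging.getLogger("pangeo_forge.recipe")  # no DEBUG handler attached

class Expensive:
    """Stand-in for an object (like an xarray Dataset) with a costly repr."""
    def __str__(self):
        print("str() was called!")
        return "expensive string"

ds = Expensive()

logger.debug(f"thing {ds}")    # f-string: str(ds) runs immediately, logged or not
logger.debug("thing %s", ds)   # %s style: str(ds) is deferred and never runs here
```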
Configuring the Dask workers is a bit more involved. I'll have to look up some examples. But it follows a similar pattern of adding a handler. The extra complication is that the handlers are on a different machine. We can pull the logs with `client.get_worker_logs` (I think) or have them pushed somewhere central with a more complicated handler.
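For reference, a minimal sketch of the pull approach, assuming a connected `distributed` client (`get_worker_logs` returns a dict keyed by worker address; the exact record shape is worth double-checking against the distributed docs):

```python
from dask.distributed import Client

client = Client()  # assumes a local cluster, just for illustration

# ... execute the recipe on the cluster ...

# pull recent log records back from every worker
for address, records in client.get_worker_logs().items():
    print(address)
    for level, message in records:
        print(f"  {level}: {message}")
```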
Thanks Tom! I found I also needed to follow this advice to get the logs to show in jupyterlab.
I have no idea!
If we execute using prefect, it should be straightforward to get the logs: https://docs.prefect.io/core/concepts/logging.html#extra-loggers
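For the record, per that page, pointing Prefect Core at extra loggers is a config/environment setting, something like:

```
export PREFECT__LOGGING__EXTRA_LOGGERS="['pangeo_forge']"
```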
I am really stuck on something and I would love some help from a Dask guru (@martindurant or @TomAugspurger).
I am debugging a problem with distributed locking (see #86) and I really need to see the pangeo-forge logging from dask workers during pytest execution. I can turn on logging from pytest by running e.g.
```
py.test --log-cli-level=DEBUG -s tests/test_recipes.py::test_chunks[manual-netCDFtoZarr_sequential_recipe-target_chunks0-True-chunk_expectation0-2-D]
```
However, when I am executing with dask, e.g.
```
py.test --log-cli-level=DEBUG -s tests/test_recipes.py::test_chunks[dask-netCDFtoZarr_sequential_recipe-target_chunks0-True-chunk_expectation0-2-D]
```
there is no output because the logging happens on the workers. I have tried all sorts of hacks to try to get the output to show, but I am stuck.
I CAN see STDOUT from the workers.
Any hints would be appreciated.
Can you add a handler to redirect dask worker logs to stdout? With something like the following somewhere (at the top of the `test_recipes.py` file, maybe):

```python
import logging

logger = logging.getLogger("distributed.worker")
handler = logging.StreamHandler()
handler.setLevel(logging.DEBUG)
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
```
To be clear, I am not looking for the `distributed.worker` logs. Those already print. I am looking for the `pangeo_forge.recipe` logs.

That was one of the things I tried (with `"pangeo_forge"` instead of `"distributed.worker"`), and it did not work.
However, it gave me an idea. I added this to the fixture that creates the dask cluster:

```python
def redirect_logs():
    import logging

    logger = logging.getLogger("pangeo_forge")
    handler = logging.StreamHandler()
    handler.setLevel(logging.DEBUG)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)

client.run(redirect_logs)
```

and it worked! 🎉
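One caveat with this approach (an aside, not from the thread): `client.run` only reaches workers that are alive when it is called. If workers can join later, e.g. on an adaptive cluster, `client.register_worker_callbacks` should apply the same setup to current and future workers:

```python
# run redirect_logs on all current workers and on any worker that joins later
client.register_worker_callbacks(redirect_logs)
```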
Gotcha, sorry I missed that point.
I think that the bakeries will want some kind of logging config file and then we'd set the Dask config to load it (so that it's picked up everywhere). It's a bit finicky to get right, but hopefully we just have to do it once (per bakery).
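To sketch what that could look like: my understanding is that distributed reads a `logging` section from the Dask config at startup, so a bakery could drop a YAML file into its Dask config directory (e.g. `~/.config/dask/logging.yaml`). The exact keys below are an assumption to verify against the distributed docs:

```yaml
logging:
  distributed: info
  pangeo_forge: debug
```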
I put a lot of logging statements into the recipe class to help with debugging. But it is never easy to turn them on. How can we make pangeo-forge work like prefect, where logs are automatically on by default? In particular, I would really like to see the logs on the dask workers.
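One possible answer is for pangeo-forge to ship a one-call convenience helper; the `setup_logging` function below is hypothetical (name, signature, and format string are made up to sketch the idea):

```python
import logging
import sys

def setup_logging(level: str = "INFO") -> None:
    """Attach a stdout handler to the "pangeo_forge" logger.

    Hypothetical helper: executors could call this locally, or run it
    on every dask worker via client.run(setup_logging).
    """
    logger = logging.getLogger("pangeo_forge")
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(getattr(logging, level))
```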