Use logging module instead of print statements

mpvginde commented 1 month ago

Hi Everyone,

What is your opinion on using the logging module instead of print statements for communicating with the user? This might become important if the code were to be used in an operational setting down the line.

I guess for the training, some of the logging is done by wandb, but it could be useful for logging error messages when loading datasets, etc.

Kind regards, Michiel

leifdenby commented 1 month ago

Great question @mpvginde! I don't think we have any guidelines/principles on that yet. I personally like using loguru because it has an easy setup, is an extension of Python logging and makes it very easy to log to different outputs. So if we do start using logging I would vote for that, and then we could generally log progress of the application with INFO (like read file from ..., wrote output to ...) and debugging information used during development with the DEBUG loglevel. In general I would say that if you find logging useful then I would welcome a PR from you adding it. As long as we use loglevels, people can easily opt out by setting the loglevel appropriately.
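As a rough illustration of the INFO/DEBUG split described above, here is a minimal sketch. It uses the standard-library `logging` module rather than loguru so that it runs without extra dependencies. The logger name `neural_lam`, the helpers `setup_logging` and `load_dataset`, and the file path are hypothetical and not taken from the codebase.

```python
import logging

# Hypothetical logger name; the project may choose a different one.
logger = logging.getLogger("neural_lam")


def setup_logging(level=logging.INFO):
    """Attach one stream handler to the package logger and set its level."""
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)


def load_dataset(path):
    """Hypothetical loader that reports progress and failures via logging."""
    logger.debug("Attempting to open %s", path)
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        # logger.exception logs at ERROR level and appends the traceback
        logger.exception("Could not load dataset from %s", path)
        raise
    logger.info("Read %d bytes from %s", len(data), path)
    return data


if __name__ == "__main__":
    setup_logging(logging.INFO)       # progress messages only
    # setup_logging(logging.DEBUG)    # also show development details
    # setup_logging(logging.WARNING)  # opt out of progress messages
    try:
        load_dataset("does/not/exist.npy")
    except OSError:
        pass
```

With this layout, opting out is a single call: raising the level to WARNING hides both the INFO progress messages and the DEBUG details, while errors are still reported.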

joeloskarsson commented 4 weeks ago

I think more control over logging could be useful. For my use cases the current setup with wandb is quite sufficient, but it does not hurt to have more control and loglevels. A few thoughts from my side:

- As you mention most logging is done to wandb (in the future maybe extended to also other experiment-tracking platforms). I think that should really be the main way to log, rather than printing to stdout. However, some things are hard to log to wandb, e.g. the graph overviews that print when models are created.
- Most of the printing right now is done by Lightning. So however you decide to control logging, the most important thing is that it plays nicely with Lightning. What we would want to avoid is a situation where most logging is handled robustly using e.g. loguru, but then Lightning just prints things to stdout without any control anyway. I think this should be manageable, but it's an important aspect when implementing this.
- We should make sure that no logging causes performance hits, e.g. by forcing GPU syncs. But I guess most interesting logging would be during initialization and not in the training/validation loops.
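The Lightning and performance points above could be handled roughly as follows. This is a sketch using the standard-library `logging` module, not a tested integration. It assumes Lightning emits its messages through the `lightning.pytorch` logger (`pytorch_lightning` in older releases). The logger name `neural_lam` and the helpers `set_lightning_level`, `expensive_summary` and `log_debug_summary` are hypothetical; `expensive_summary` stands in for anything costly, such as reading a value that would force a GPU sync.

```python
import logging

logger = logging.getLogger("neural_lam")  # hypothetical package logger name

# Assumed Lightning logger names; which one applies depends on the installed version.
LIGHTNING_LOGGERS = ("lightning.pytorch", "pytorch_lightning")


def set_lightning_level(level):
    """Set the level on Lightning's loggers so they follow the same policy."""
    for name in LIGHTNING_LOGGERS:
        logging.getLogger(name).setLevel(level)


# Counter used only to show how often the costly function actually runs.
calls = {"count": 0}


def expensive_summary():
    """Stand-in for a costly computation, e.g. one that forces a GPU sync."""
    calls["count"] += 1
    return "summary"


def log_debug_summary():
    # Only compute the summary if a DEBUG message would actually be emitted.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("Model summary: %s", expensive_summary())


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    set_lightning_level(logging.WARNING)
    logger.setLevel(logging.INFO)
    log_debug_summary()  # skipped: DEBUG is disabled, nothing is computed
    logger.setLevel(logging.DEBUG)
    log_debug_summary()  # emitted: the summary is computed once
```

The `isEnabledFor` guard keeps the cost of a disabled message close to zero, which matters if such a call ever ends up inside a training or validation loop. Anything written directly to stdout or stderr, rather than through `logging`, would not be affected by these level settings.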

mllam / neural-lam

Use logging module instead of print statements #33