pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/
Other
8.62k stars 1.99k forks source link

BUG: pymc adds a logging handler on import - this makes it difficult for applications to control the logging configuration #7123

Open BenWhetton opened 7 months ago

BenWhetton commented 7 months ago

Describe the issue:

As stated in the python documentation logging cookbook "patterns to avoid" section. Libraries should not add log handlers. It makes it difficult for users to control the library's logging.

Example 1: an application My logger initialisation code comes after pymc is imported (it's standard practice to put imports at the top of the file), so pymc sees that there are no handler and adds it's own. I cannot control this. The only recourse I have is to remove the handler that pymc has added after the fact.

Example 2: pytest. When running pytest, no logging handlers should be registered, because pytest registers its own logging handlers to capture logs, do live logging, etc. I have not found a way to force it to do this before pymc is imported, so pymc will always add it's logger first which causes logging spam during tests. It also overrides the log level for the pymc module! I configure the log level in pytests conftest.py file to prevent it from being spammy, pymc will subsequently override it.

Proposed solution: Remove the code which adds a log handler in pymc.__init__.py. Allow scripts and applications to configure loggers. This may be a convenience feature for users who don't know python well and don't know that the logger needs to be initialised, but it makes life harder for all other users.


Note: pymc also appears to display progress bars even if all logging is disabled, and it isn't obvious how to disable it (this is a more minor issue because I've only encountered it during training).

Reproduceable code example:

# This example corresponds to the "application" example above.
# It's based on example code from the pymc website.

# As shown by the fact that the only log handler registered is a NullLogger, 
# I don't want any logging, but a handler is added by pymc outside of my control 
# so log messages are emmitted.

import logging
import os

import numpy as np

import pymc as pm

log = logging.getLogger()
log.addHandler(logging.NullHandler())

basic_model = pm.Model()

# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]

# Size of dataset
size = 100

# Predictor variable
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2

# Simulate outcome variable
Y = alpha + beta[0] * X1 + beta[1] * X2 + np.random.randn(size) * sigma

with basic_model:

    # Priors for unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta = pm.Normal("beta", mu=0, sigma=10, shape=2)
    sigma = pm.HalfNormal("sigma", sigma=1)

    # Expected value of outcome
    mu = alpha + beta[0] * X1 + beta[1] * X2

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=Y)

map_estimate = pm.find_MAP(model=basic_model)

with basic_model:
    # draw 500 posterior samples
    trace = pm.sample(500, return_inferencedata=False)

Error message:

There is no error, however, having only assigned a NullLogger, I would expect there to be no logging, but there is.

PyMC version information:

Ubuntu 20.04. Pymc version 5.4.1. Installed from the pypi repository (using poetry).

The code which causes this issue is still present on master.

Context for the issue:

Every user who wants to configure the logging of their application/script has to work out why pymc logs are duplicated and bypassing their configured logging handlers. For example, if I don't manually remove the handler that pymc added, my app logs this:

Sampling: [likelihood]
2024-01-30T15:14:20.760+00:00 (+0.03s) () INFO     Sampling: [likelihood]

The first message is outside my control, emitted by the pymc-added log handler, and the second is emitted by my app's logging handler.

To work around this, every app which uses pymc must contain the following code:

pymc_logger = logging.getLogger("pymc")
pymc_logger.handlers.clear()

Imagine how much boilerplate this would cause if every library did this.

welcome[bot] commented 7 months ago

Welcome Banner] :tada: Welcome to PyMC! :tada: We're really excited to have your input into the project! :sparkling_heart:
If you haven't done so already, please make sure you check out our Contributing Guidelines and Code of Conduct.

twiecki commented 7 months ago

Thanks for raising this @BenWhetton! I have run into the same problem where I noticed I couldn't control logging, good to know why.

I don't follow completely the solution, what in code would we need to change?

michaelosthege commented 7 months ago

Thanks for pointing this out!

We should change all logging initialization in PyMC to just

log = logging.getLogger(__name__)
ricardoV94 commented 7 months ago

Regarding progress bars. For those sampling methods that have it, there's a progressbar kwarg that defaults to True. Setting to False should do the trick

jessegrabowski commented 7 months ago

all the progressbar kwargs could default to a global variable that could be changed after importing if desired?

bwengals commented 7 months ago

Have also run into this issue, would be good to get logging right

all the progressbar kwargs could default to a global variable that could be changed after importing if desired?

I think that's a great idea for these sort of situations. You don't want to have to change model code just because you're running things in a different environment. Maybe some sort of config?