nhairs / python-json-logger

JSON Formatter for Python Logging
https://nhairs.github.io/python-json-logger/
BSD 2-Clause "Simplified" License

How to lazy-log extras / lazy-evaluate interpolations #21

Closed black-snow closed 2 months ago

black-snow commented 2 months ago

With classic logging you'd:

logger.debug("some var: %s", the_var)

so that a potentially costly conversion or interpolation isn't paid for if the log level isn't debug or trace. How would we do the same with structured logging?

logger.debug("some var: %s", the_var, extras={})

could work for the message but what if I need it inside another field?

logger.debug({"key1": "%s", "key2": "%s"}, var1, var2)

This looks nasty.

Or is there already something in place I've missed?

nhairs commented 2 months ago

Hi @black-snow, I suspect there's a mix here: something you've missed, some missing features, but also a few options for lazy conversion.

Lazy message interpolation

As you've correctly identified (and as the docs and various linters recommend), you should use lazy interpolation when creating log messages.

This works by storing the message and the variables to be interpolated within the LogRecord instance. When a Formatter needs the real log message it calls record.getMessage which does the string interpolation.

The reason this is lazy is that the Formatter is only called by a Handler if it is enabled for the given log level and logger name.

This deferred interpolation only applies to the message and does not affect any other data that is generated and attached to the LogRecord as this needs to be done before the record is passed to the various loggers or handlers.
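
For illustration, a minimal sketch (Probe is a made-up helper) of what is and is not deferred:

import logging

logging.basicConfig(level=logging.INFO)

class Probe:
    # Made-up helper: announces when its string conversion actually runs
    def __str__(self) -> str:
        print("str() called")
        return "probe-value"

# DEBUG is disabled: the record is never formatted, so str() never runs...
logging.debug("value: %s", Probe())
# ...although Probe() itself was still constructed before debug() was called.

# INFO is enabled: str() runs later, inside the formatter, via record.getMessage()
logging.info("value: %s", Probe())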

Generation of JSON

In short, all generation of JSON is lazy, as it happens in the same place as log record messages: in a Formatter (which is only called by a Handler if it is enabled for the given logger and log level).

For str messages, the same lazy interpolation works as normal, as the JSON formatters use record.getMessage to generate the message field.
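
For example (a sketch; the exact output keys depend on the configured format string):

import logging
from pythonjsonlogger.json import JsonFormatter

logger = logging.getLogger("demo")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter("%(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The %s interpolation happens inside the formatter via record.getMessage()
logger.info("some var: %s", 42)
# -> {"levelname": "INFO", "message": "some var: 42"}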

dict messages

Things change when using a dict for a message. In this case generation of the JSON will still be lazy based on the log level, but the dictionary itself will be created before the actual log call.

This isn't particularly different from the lazy interpolation of the message: all values to be interpolated must be known before making the logging call.

Consider:

logger.debug("some message %s", expensive_to_compute())

This also applies to the extra={...} argument which is stored on the log record at time of creation.

In these situations your best bet is to guard the call with logger.isEnabledFor:

import logging

logger = logging.getLogger()  # root logger, since the examples use the root logger; works on specific loggers too

if logger.isEnabledFor(logging.DEBUG):
    logger.debug("some message %s", expensive_to_compute())
    logger.debug({"some_field": also_expensive()})
    logger.debug("another message", extra={"another_field": somehow_also_expensive()})

dict messages and string interpolation

If your specific use case is something like:

logging.debug({
    "a_field": "this is a message %s" % expensive_to_compute(),
    "b_field": "this is another message %s" % also_expensive()
})

Then there currently is no way to do lazy interpolation. It could be implemented but would require parsing the values of fields in the message dict (or extra argument), which I think is a pretty rare case.

In these situations I would recommend using isEnabledFor to protect known expensive-to-compute log messages.

black-snow commented 2 months ago

Thanks for the quick & extensive answer. We are a bit off track, I fear.

logger.debug("whatevs %s" % whatevs) ain't lazy. Python will assemble the string and pass it to debug. logger.debug("whatevs %s", whatevs) is.

logger.debug("whatevs %s", expensive_to_compute()) ain't lazy. logger.debug("whatevs %s", expensive_to_compute) would be if debug called callables (haven't checked - this is how you do it w/ decent libs in Java, for example).

To make this work for values in the dict we could accept str | tuple[str | callable] where the first item is the message / template.

logger.debug({"key1": ("string with a %s", var), "key2": ("%s %s %s", perhaps, many, vars)})

Which gets funny with deeper nesting, but it could, in theory, be applied recursively. If you ever need it.

/edit: or rather accept Any, since we'll want to str() whatever we're handed that's not a callable. You cannot really hint "the first element should be str and the 1+n following ones should be callable -> Any" if I'm not mistaken.
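
A rough sketch of the recursive evaluation I have in mind (illustrative only; render is a made-up helper):

from typing import Any

def render(value: Any) -> Any:
    # Evaluate ("template", *args) tuples, calling any callable args
    if isinstance(value, tuple) and value and isinstance(value[0], str):
        template, *args = value
        return template % tuple(arg() if callable(arg) else render(arg) for arg in args)
    if isinstance(value, dict):
        return {key: render(val) for key, val in value.items()}
    return value

print(render({"key1": ("string with a %s", lambda: "var")}))
# -> {'key1': 'string with a var'}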

nhairs commented 2 months ago

logger.debug("whatevs %s", expensive_to_compute()) ain't lazy. logger.debug("whatevs %s", expensive_to_compute) would be if debug called callables (haven't checked - this is how you do it w/ decent libs in Java, for example).

Right, it sounds like what you're describing as lazy here is not the same as what the python logging documentation refers to when discussing lazy / deferring the interpolation.

It's not wrong, that's just not what I understood from your original question.

So if I understand correctly you want to defer string interpolation and the generation of values used in that interpolation.

If that's the case I am loath to add anything that makes implicit assumptions about the data being passed. For example, how do I know you want lazy interpolation and aren't just passing a tuple[str, ...]? See also: policy on breaking changes.

That said, I think there might be a solution: creating a class that we can then add to the various JSON encoder default methods. Like using a dictionary for the message, this would only work with our JSON formatters.

from typing import Any

class LazyStr:
    def __init__(self, template, *args) -> None:
        self.template = template
        self.args = args

    def __str__(self) -> str:
        values = [item() if callable(item) else item for item in self.args]
        return self.template % tuple(values)

    def __call__(self) -> str:
        # Could be useful so that a LazyStr can contain other LazyStrs in its args
        return self.__str__()

def json_default(o: Any) -> Any:
    # Do we even need this if we're making use of __str__? The json_default functions will end up calling str(o) anyway...
    # https://github.com/nhairs/python-json-logger/blob/main/src/pythonjsonlogger/defaults.py#L37
    if isinstance(o, LazyStr):
        return str(o)
    # ...

logger.debug({
    "key1": LazyStr("string with a %s", var),
    "key2": LazyStr("%s %s %s", perhaps, many, vars),
})
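
Note that a value passed directly, like var above, is still evaluated eagerly; to defer the expensive work itself you would pass a callable, e.g. LazyStr("string with a %s", lambda: expensive_to_compute()).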

Per my embedded comments: given that the JSON default functions will end up calling str(lazy_str_obj), one could argue that this could/should be implemented as an external package, as it is not specific to JSON logging and others may benefit from it. I'll note that there are some /similar/ packages already (lazy-str, stringlike) but these do not support templates directly.

^ I am open to other opinions on this

For reference, the LazyStr scaffold above would also work with the existing LogRecord objects (and thus stdlib formatters etc), as getMessage also calls str():

# https://github.com/python/cpython/blob/main/Lib/logging/__init__.py#L391
def getMessage(self):
    """
    Return the message for this LogRecord.

    Return the message for this LogRecord after merging any user-supplied
    arguments with the message.
    """
    msg = str(self.msg)
    if self.args:
        msg = msg % self.args
    return msg

black-snow commented 2 months ago

Thanks again for your time and effort.

What would you expect to happen if I passed tuple[str, ...] as a value? There's no JSON representation for tuples, right? (Well, arguably lists - if one argues that you cannot "change" JSON anyway.)
I don't know what I'd expect to happen here: logger.debug({"key": ("a", "b", "c")}) ... a list and a warning? A list if I configured it to turn tuples into lists and an error otherwise? ...
But then again, lists are rather supposed to be homogeneous as opposed to tuples - so perhaps ["a", "b", "c"] is the obvious answer.

LazyStr doesn't look too bad, though. It's seven characters extra, but you won't have it everywhere; and if you'd have had to know about a special tuple evaluation anyway, having to know about LazyStr is no different.
The more I read it the more I like it.

nhairs commented 2 months ago

There's no JSON representation for tuples, right?

All Python JSON libraries I've used will produce a list by default. I don't think there is a reason for them to produce anything different.

I don't know what I'd expect to happen here: logger.debug({"key": ("a", "b", "c")}) ... a list and a warning? A list if I configured it to turn tuples into lists and an error otherwise?

You should expect {"key": ["a", "b", "c"]}.
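
For instance, with the standard library:

import json

print(json.dumps({"key": ("a", "b", "c")}))
# -> {"key": ["a", "b", "c"]}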

It's seven characters extra but you will not have it everywhere plus if you knew about the special tuple evaluation knowing LazyStr was no different.

It's either that or use if logger.isEnabledFor(...):


With other packages and existing solutions available, it's starting to feel a lot like something to put into the documentation rather than add to the library.

black-snow commented 2 months ago

With an extra if statement in "my" code it feels like a detail of the logging is polluting it.

I could also just wrap the logger or patch something onto it.

What would you put into the docs then?

nhairs commented 2 months ago

With an extra if statement in "my" code it feels like a detail of the logging is polluting it.

:shrug: I don't know what to tell you: using an if statement or using lazy strings are both going to pollute your code when avoiding expensive log statements.

Given there are straightforward workarounds, I don't intend to add support for implicitly doing lazy interpolation.

I could also just wrap the logger or patch something onto it.

If you do want to add your own magic then you should look into modifying the logged data.
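
For example, a minimal sketch of such a wrapper (a hypothetical helper, not part of this library) that defers callables until the level check passes:

import logging

def expensive_to_compute():
    return "world"

class LazyLogger:
    # Hypothetical wrapper: resolves callable field values only when the level is enabled
    def __init__(self, logger: logging.Logger) -> None:
        self._logger = logger

    def debug(self, msg, **fields) -> None:
        if self._logger.isEnabledFor(logging.DEBUG):
            resolved = {key: value() if callable(value) else value for key, value in fields.items()}
            self._logger.debug(msg, extra=resolved)

log = LazyLogger(logging.getLogger(__name__))
log.debug("heavy stuff", result=lambda: expensive_to_compute())  # nothing expensive runs unless DEBUG is enabled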

What would you put into the docs then?

I'll probably add something to the cookbook like so:

import logging
import time

from pythonjsonlogger.json import JsonFormatter

def expensive_to_compute():
    time.sleep(5)
    return "world"

## Setup
## -------------------------------------
logger = logging.getLogger()
handler = logging.StreamHandler()
formatter = JsonFormatter()
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

## Log Using isEnabledFor
## -------------------------------------
start = time.time()
if logger.isEnabledFor(logging.INFO):
    logger.info(
        {
            "data": "hello {}".format(expensive_to_compute())
        }
    )
print(f"Logging INFO using isEnabledFor took: {int(time.time() - start)}s")

start = time.time()
if logger.isEnabledFor(logging.DEBUG):
    logger.debug(
        {
            "data": "hello {}".format(expensive_to_compute())
        }
    )
print(f"Logging DEBUG using isEnabledFor took: {int(time.time() - start)}s")

## Log Using LazyString
## -------------------------------------
# pip install lazy-string
from lazy_string import LazyString as L

start = time.time()
logger.info(
    {
        "data": L("hello {}".format, L(expensive_to_compute))
    }
)
print(f"Logging INFO using LazyString took: {int(time.time() - start)}s")

start = time.time()
logger.debug(
    {
        "data": L("hello {}".format, L(expensive_to_compute))
    }
)
print(f"Logging DEBUG using LazyString took: {int(time.time() - start)}s")

black-snow commented 2 months ago

Alright. Thank you very much @nhairs. I'll leave it open for the docs to land.