Hi @black-snow, I suspect there's a mix of things going on here: something you may have missed, some missing features, but also a few existing options for lazy conversion.
As you've correctly identified (and as recommended by the docs and various linters), it is recommended to use lazy interpolation when creating log messages. This works by storing the message and the variables to be interpolated within the `LogRecord` instance. When a `Formatter` needs the real log message it calls `record.getMessage()`, which does the string interpolation. The reason this is lazy is that the `Formatter` is only called by a `Handler` if it is enabled for the given log level and logger name.

This deferred interpolation only applies to the message and does not affect any other data that is generated and attached to the `LogRecord`, as this needs to be done before the record is passed to the various loggers or handlers.

In short, all generation of JSON is lazy, as it is generated in the same place as log record messages: in a `Formatter` (which is only called by the `Handler` if it is enabled for the logger and log level).
For `str` messages, the same lazy interpolation still works as normal, as the JSON formatters use `record.getMessage()` to generate the message field.
### `dict` messages

Things change when using a `dict` for a message. In this case generation of the JSON will still be lazy based on the log level, but the dictionary itself will be created before the actual log call. This isn't particularly different to the lazy interpolation of the message, in that all values to be interpolated must be known before making the logging call. Consider:

```python
logger.debug("some message %s", expensive_to_compute())
```
This also applies to the `extra={...}` argument, which is stored on the log record at time of creation.
In these situations your best bet is to use a logger + `isEnabledFor`:

```python
logger = logging.getLogger()  # get root logger since the examples use the root logger, but this can be done on specific loggers
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("some message %s", expensive_to_compute())
    logger.debug({"some_field": also_expensive()})
    logger.debug("another message", extra={"another_field": somehow_also_expensive()})
```
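The difference is easy to demonstrate with a call counter (names here are illustrative): the guarded call never evaluates its arguments when the level is disabled, while the unguarded call evaluates them eagerly even though the record is discarded.

```python
import logging

calls = {"n": 0}

def expensive_to_compute():
    calls["n"] += 1
    return "result"

logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)

# Guarded DEBUG call: isEnabledFor is False, so the expensive
# function is never invoked.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("some message %s", expensive_to_compute())
assert calls["n"] == 0

# Unguarded DEBUG call: the argument is evaluated before debug()
# even runs, although the record is ultimately dropped.
logger.debug("some message %s", expensive_to_compute())
assert calls["n"] == 1
```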
### `dict` messages and string interpolation

If your specific use case is something like:

```python
logging.debug({
    "a_field": "this is a message %s" % expensive_to_compute(),
    "b_field": "this is another message %s" % also_expensive(),
})
```

then there currently is no way to do lazy interpolation. It could be implemented, but it would require parsing the values of fields in the message dict (or `extra` argument), which I think is a pretty rare case.
In these situations I would recommend using `isEnabledFor` to protect known expensive-to-compute log messages.
Thanks for the quick & extensive answer. We are a bit off track, I fear.

`logger.debug("whatevs %s" % whatevs)` ain't lazy. Python will assemble the string and pass it to `debug`. `logger.debug("whatevs %s", whatevs)` is.

`logger.debug("whatevs %s", expensive_to_compute())` ain't lazy. `logger.debug("whatevs %s", expensive_to_compute)` would be if `debug` called callables (haven't checked - this is how you do it w/ decent libs in Java, for example).
To make this work for values in the dict we could accept `str | tuple[str | Callable]`, where the first item is the message / template.

```python
logger.debug({"key1": ("string with a %s", var), "key2": ("%s %s %s", perhaps, many, vars)})
```

Which gets funny with deeper nesting, but it could, in theory, be applied recursively. If you ever need it.

/edit: or rather accept `Any`, for we'll want to `str()` whatever we're handed that's not a callable. You cannot really hint "the first element should be `str` and the 1+n following ones should be `callable -> Any`" if I'm not mistaken.
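To make the proposal concrete, the tuple convention could be prototyped outside the library with a small preprocessing helper. Everything below (the `render` function and the tuple convention itself) is hypothetical and not part of python-json-logger:

```python
from typing import Any

def render(value: Any) -> Any:
    """Hypothetical helper: evaluate ("template", *parts) tuples.

    A tuple whose first element is a str is treated as a template plus
    arguments; callables are invoked lazily, and nested tuples are
    rendered recursively before interpolation.
    """
    if isinstance(value, tuple) and value and isinstance(value[0], str):
        template, *parts = value
        rendered = tuple(
            part() if callable(part) else render(part) for part in parts
        )
        return template % rendered
    return value

message = {
    "key1": ("string with a %s", lambda: "var"),
    "key2": ("%s %s", "nested", ("inner %s", lambda: 42)),
}
resolved = {key: render(value) for key, value in message.items()}
# resolved == {"key1": "string with a var", "key2": "nested inner 42"}
```

The obvious downside, as noted above, is that any plain `tuple[str, ...]` value would be silently misinterpreted as a template.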
> `logger.debug("whatevs %s", expensive_to_compute())` ain't lazy. `logger.debug("whatevs %s", expensive_to_compute)` would be if `debug` called callables (haven't checked - this is how you do it w/ decent libs in Java, for example).
Right, it sounds like what you're describing as lazy here is not the same as what the Python logging documentation refers to when discussing lazy / deferred interpolation. It's not wrong, that's just not what I understood from your original question.

So if I understand correctly, you want to defer string interpolation and the generation of values used in that interpolation. If that's the case, I am loath to add anything that makes implicit assumptions about the data being passed. For example, how do I know you want lazy interpolation and aren't just passing a `tuple[str, ...]`? See also: policy on breaking changes.
That said, I think there might be a solution: creating a class that we can then add to the various JSON encoder default methods. Like using a dictionary for the message, this would only work with our JSON formatters.
```python
class LazyStr:
    def __init__(self, template, *args) -> None:
        self.template = template
        self.args = args

    def __str__(self) -> str:
        values = [item() if callable(item) else item for item in self.args]
        return self.template % tuple(values)

    def __call__(self) -> str:
        # Could be useful so that a LazyStr can contain other LazyStr in its args
        return self.__str__()


def json_default(o: Any) -> Any:
    # Do we even need this if we're making use of __str__? The json_default
    # functions will end up calling str(o) anyway...
    # https://github.com/nhairs/python-json-logger/blob/main/src/pythonjsonlogger/defaults.py#L37
    if isinstance(o, LazyStr):
        return str(o)
    # ...


logger.debug({
    "key1": LazyStr("string with a %s", var),
    "key2": LazyStr("%s %s %s", perhaps, many, vars),
})
```
Per my embedded comments: given that the JSON default functions will end up calling `str(lazy_str_obj)`, one could argue that this could/should be implemented as an external package, as it's not specific to JSON logging and others may benefit from it. I'll note that there are some /similar/ packages already (`lazy-str`, `stringlike`), but these do not support templates directly.
For reference, the `LazyStr` scaffold above would work with the existing `LogRecord` objects (and thus formatters etc) as it also calls `str()`:

```python
# https://github.com/python/cpython/blob/main/Lib/logging/__init__.py#L391
def getMessage(self):
    """
    Return the message for this LogRecord.

    Return the message for this LogRecord after merging any user-supplied
    arguments with the message.
    """
    msg = str(self.msg)
    if self.args:
        msg = msg % self.args
    return msg
```
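That claim can be checked directly against a stdlib `LogRecord` (the `Lazy` class below is a self-contained stand-in for the `LazyStr` scaffold):

```python
import logging

class Lazy:
    """Stand-in for the LazyStr idea: defers work until str() is called."""
    def __init__(self, func):
        self.func = func

    def __str__(self):
        return str(self.func())

record = logging.LogRecord(
    name="demo", level=logging.INFO, pathname="demo.py", lineno=1,
    msg=Lazy(lambda: "hello %s"), args=("world",), exc_info=None,
)
# getMessage() calls str(self.msg) first, then does the %-interpolation,
# so a lazy message object slots straight in.
assert record.getMessage() == "hello world"
```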
Thanks again for your time and effort.

What would you expect to happen if I passed `tuple[str, ...]` as a value? There's no JSON representation for tuples, right? (Well, arguably lists - if one argues that you cannot "change" JSON anyway.) I don't know what I'd expect to happen here: `logger.debug({"key": ("a", "b", "c")})` ... a list and a warning? A list if I configured it to turn tuples into lists, and an error otherwise? But then again, lists are rather supposed to be homogeneous, as opposed to tuples - so perhaps `["a", "b", "c"]` is the obvious answer.
`LazyStr` doesn't look too bad, though. It's seven characters extra, but you won't have it everywhere; plus, if you had to know about a special tuple evaluation anyway, having to know about `LazyStr` is no different. The more I read it the more I like it.
> There's no JSON representation for tuples, right?

All Python JSON libraries I've used will produce a list by default. I don't think there is a reason for them to produce anything different.

> I don't know what I'd expect to happen here: `logger.debug({"key": ("a", "b", "c")})` ... a list and a warning? A list if I configured it to turn tuples into lists and an error otherwise?

You should expect `{"key": ["a", "b", "c"]}`.
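This matches the stdlib behaviour: `json.dumps` serialises tuples as JSON arrays, the same as lists.

```python
import json

# Tuples have no dedicated JSON type; json.dumps emits an array.
assert json.dumps({"key": ("a", "b", "c")}) == '{"key": ["a", "b", "c"]}'
```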
> It's seven characters extra but you will not have it everywhere plus if you knew about the special tuple evaluation knowing LazyStr was no different.

It's either that or use `if logger.isEnabledFor(...):`. With other packages and existing solutions, it's starting to feel a lot like something to put into the documentation rather than add to the library.
With an extra if statement in "my" code, it feels like a detail of the logging is polluting it. I could also just wrap the logger or patch something onto it. What would you put into the docs then?
> With an extra if statement in "my" code it feels like a detail of the logging is polluting it.

:shrug: I don't know what to tell you; using an if statement or using lazy strings are both going to pollute your code when avoiding expensive log statements. Given there are straightforward workarounds, I don't intend to add support for implicitly doing lazy interpolation.

> I could also just wrap the logger or patch something onto it.

If you do want to add your own magic then you should look into modifying the logged data.

> What would you put into the docs then?

I'll probably add something to the cookbook like so:
```python
import logging
import time

from pythonjsonlogger.json import JsonFormatter


def expensive_to_compute():
    time.sleep(5)
    return "world"


## Setup
## -------------------------------------
logger = logging.getLogger()
handler = logging.StreamHandler()
formatter = JsonFormatter()
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

## Log Using isEnabledFor
## -------------------------------------
start = time.time()
if logger.isEnabledFor(logging.INFO):
    logger.info({"data": "hello {}".format(expensive_to_compute())})
print(f"Logging INFO using isEnabledFor took: {int(time.time() - start)}s")

start = time.time()
if logger.isEnabledFor(logging.DEBUG):
    logger.debug({"data": "hello {}".format(expensive_to_compute())})
print(f"Logging DEBUG using isEnabledFor took: {int(time.time() - start)}s")

## Log Using LazyString
## -------------------------------------
# pip install lazy-string
from lazy_string import LazyString as L

start = time.time()
logger.info({"data": L("hello {}".format, L(expensive_to_compute))})
print(f"Logging INFO using LazyString took: {int(time.time() - start)}s")

start = time.time()
logger.debug({"data": L("hello {}".format, L(expensive_to_compute))})
print(f"Logging DEBUG using LazyString took: {int(time.time() - start)}s")
```
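As an aside, the "wrap the logger or patch something onto it" route can be sketched as a logger-level `logging.Filter` that resolves callables found in dict messages. This is purely illustrative and not a python-json-logger API; filter names and conventions here are made up:

```python
import logging

class EvaluateCallables(logging.Filter):
    """Illustrative 'magic' (not a python-json-logger API): resolve any
    callables found in a dict message. Logger-level filters only run for
    records that already passed the level check, so evaluation stays lazy."""
    def filter(self, record: logging.LogRecord) -> bool:
        if isinstance(record.msg, dict):
            record.msg = {
                key: value() if callable(value) else value
                for key, value in record.msg.items()
            }
        return True

logger = logging.getLogger("magic-demo")
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.INFO)
logger.addFilter(EvaluateCallables())

calls = {"n": 0}
def expensive():
    calls["n"] += 1
    return "result"

logger.debug({"field": expensive})  # below level: never evaluated
assert calls["n"] == 0
logger.info({"field": expensive})   # enabled: the filter evaluates it
assert calls["n"] == 1
```

Note that this mutates `record.msg` in place before any formatter sees it, which is fine for a private wrapper but exactly the kind of implicit behaviour the library avoids.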
Alright. Thank you very much @nhairs. I'll leave it open for the docs to land.
With classic logging you'd:

so that a perhaps costly conversion or interpolation isn't paid if the log level isn't debug or trace. How would we do the same with structured logging?

could work for the message, but what if I need it inside another field? This looks nasty. Or is there already something in place I've missed?
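For reference, the classic stdlib pattern being described can be sketched as follows (`value` is illustrative). Note that only the interpolation is deferred; the arguments themselves are still computed eagerly, which is what most of the discussion above turns on.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

value = [1, 2, 3]

# Lazy: template and argument are stored on the LogRecord; the
# %-interpolation only happens if a handler actually formats the
# record (never here, since DEBUG is below the configured level).
log.debug("converted value: %s", value)

# Eager: the full string is built before debug() is even called,
# paying the interpolation cost although the record is dropped.
log.debug("converted value: %s" % value)
```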