hynek / structlog

Simple, powerful, and fast logging for Python.
https://www.structlog.org/
Other
3.52k stars 221 forks source link

`ExceptionDictTransformer` incorrectly formatting strings #638

Open wochinge opened 3 months ago

wochinge commented 3 months ago

Version 24.4.0

Code to reproduce

def test_structlog():
    structlog.configure(processors=[ExceptionRenderer(ExceptionDictTransformer(show_locals=True)), JSONRenderer()])
    logger = structlog.get_logger(__name__)
    try:
        my_variable = "number"
        raise ValueError()
    except ValueError as e:
        logger.exception("An error occurred")

Problem The local string variable my_variable is escaped as "'number'" where it should be just `"number"

{"event": "An error occurred", "exception": [{"exc_type": "ValueError", "exc_value": "", "syntax_error": null, "is_cause": false, "frames": [{"filename": "...", "lineno": 11, "name": "test_structlog", "locals": {"logger": "<BoundLoggerLazyProxy(logger=None, wrapper_class=None, processors=None, context_class=None, initial_values={}, logger_factory_args=('test_tobi',))>", "my_variable": "'number'", "e": "ValueError()"}}]}]}

Root Cause

This is caused by this line. Instead of repr(object) it should be just object for strings

hynek commented 3 months ago

@sscherfke πŸ˜‡

sscherfke commented 3 months ago

This is an expected change. repr() is now used for every object, including strings. This is needed so that Pydantic Secret Types and Typed Settings SecretStr do not leak.

For example:

>>> import json
>>> from typed_settings.types import SecretStr
>>>
>>> json.dumps({"a": repr("spam"), "b": repr(SecretStr("eggs"))})
'{"a": "\'spam\'", "b": "\'*******\'"}'
>>>
>>> json.dumps({"a": "spam", "b": SecretStr("eggs")})
'{"a": "spam", "b": "eggs"}'

I’m sure we could find a way to get sth. like {"a": "spam", "b": "*******"} but this would unnecessarily(?) bloat the code.

sscherfke commented 3 months ago

Okay, Pydantic is not affected – their Secretstr does not inherit str. Either they changed behavior in v2 or I didn't remember it correctly.

sscherfke commented 3 months ago

Rich's "to_repr" implementation does the same thing and when printing "'spam'" it becomes just 'spam' which is the (new) expected result.

So imho, this issue should be closed as wontfix/works-as-expected/... :)

wochinge commented 3 months ago

Thanks for your swift responses πŸ™ŒπŸ»

So if I understand that correctly, you suggest this is not an issue because the Renderer is taking care of representing the data correctly?

In my above example code, the JSONRenderer doesn't do that which results in the extra single quotes "'number'".

Given your logic this means that structlog ties the correct logging of log data to the specific Renderer implementation. Doesn't this break the concept of having isolated components which you plug together?

sscherfke commented 3 months ago

The dict of locals which is part of the stacktrace is now a dict mapping names (as strings) to reprs (as strings).

In the older versions, there was an exceptions for strings, so the mapping was not a "mapping of names to reprs" but a "mapping of names to reprs but also maybe strings".

Now it works like this: The repr of a numer is, for example, 42, the repr of an arbitrary object is <object object at 0x1003c47e0> and the repr of a string is 'spam'. So a dict with these three locals values will be {"a": "42", "b": "<object object at 0x1003c47e0>", "c": "'spam'"}.

What the renderers do with this mapping is up to them but the json renderer happens to render it as shown in the example above.

hynek commented 3 months ago

Hmmm, that seems bad though? Firstly, it's a breaking change and secondly we'd have to modify all renderers and our users would have to modify all of their renderers?

sscherfke commented 3 months ago

The to_repr() function is only used by the dict tracebacks and that is only (usually) only used with a JSON renderer. And then only the looks of strings are changed in the "locals" dictionary:

>>> import structlog
>>> structlog.configure([structlog.processors.dict_tracebacks, structlog.processors.JSONRenderer(indent=2)])
>>> log = structlog.get_logger()
>>> a = 42
>>> b = object()
>>> c = "spam"
>>> try: 1 / 0
... except: log.exception("onoes")
... 
{
  "event": "onoes",
  "exception": [
    {
      "exc_type": "ZeroDivisionError",
      "exc_value": "division by zero",
      "syntax_error": null,
      "is_cause": false,
      "frames": [
        {
          "filename": "<stdin>",
          "lineno": 1,
          "name": "<module>",
          "locals": {
            "structlog": "<module 'structlog' from '/Users/stefan/Projects/structlog/src/structlog/__init__.py'>",
            "rich": "<module 'rich' from '/Users/stefan/Projects/structlog/.venv/lib/python3.12/site-packages/rich/__init__.py'>",
            "a": "42",
            "b": "<object object at 0x100d313d0>",
            "c": "'spam'",  # This is the change in question
            "log": "<BoundLoggerLazyProxy(logger=None, wrapper_class=None, processors=None, context_class=None, initial_values={}, logger_factory_args=())>"
          }
        }
      ]
    }
  ]
}

And again, the aim of this change was to safe people using simple secret classes with a def __repr__(self): return "'*******'" from accidentally leaking credentials and other secrets.

@wochinge Is this just a minor annoyance for you or does it really break some log processing in a way?