itamarst / eliot

Eliot: the logging system that tells you *why* it happened
https://eliot.readthedocs.io
Apache License 2.0
1.09k stars 65 forks source link

Allow passing options for orjson.dumps #502

Open MajorDallas opened 2 months ago

MajorDallas commented 2 months ago

My application makes use of SQLAlchemy and eliot. Today, I discovered that sqlalchemy's Table.name is a sqlalchemy.sql.elements.quoted_name, not a str, so I would find destination_failure objects with a TypeError: Dict key must be str error in the log tree when I tried something like this:

@log_call
def foo(orms):
    results = {}
    for orm in orms:
        results[orm.__table__.name] = do_something(orm)
    return results

Since the dictionary shown in the destination_failure appeared to have all str keys, it took some docs-reading and a thorough debugger session to figure out that the exception is from orjson being exceedingly strict about the keys' types by default. It does not rely on the __str__ protocol like most generic code would, but seems to check that the key is exactly str and not some subclass thereof.

A quick test serializing the same dict with orjson.dumps(results, option=orjson.OPT_NON_STR_KEYS) showed that the quoted_name type can be serialized without any further tweaking (assuming the stated performance penalty is acceptable), but there seems to be no way to set this option through eliot.

My proposal would be to add a json_option parameter to destination class initialization. I'm not familiar with Pyrsistent's PClass, so idk if this requires any translation, but eg. for FileDestination:

    def __new__(cls, file, encoder=None, json_default=json_default, json_option=None):  # new kwarg
        ...
        return PClass.__new__(
            cls,
            file=file,
            _dumps=_dumps,
            _linebreak=_linebreak,
            _json_default=json_default,
            _json_option=json_option,  # passing kwarg to parent constructor
        )

    def __call__(self, message):
        self.file.write(
            self._dumps(message, default=self._json_default, option=self.json_option) + self._linebreak
        )  # using the new attribute
        self.file.flush()

Of course the destination type could be subclassed, but, imo, that's quite a lot of work for one extra option that could (should, imo) be baked in. The other generalized workaround is to wrap the destination type and stringify all the keys before passing the message dict to FileDestination, and I will probably do just that in the meantime since I already have a wrapper for something else, but I can't imagine that native Python could do it faster than orjson could--even accounting for performance penalty of OPT_NON_STR_KEYS--and I understood the switch to orjson to be motivated by performance... The last workaround I can think of is to just never use log_call on something that returns a dict whose keys have even a slight chance of not being exactly str, but if that were to be the "official" solution I would ask that it at least be documented.