warner / foolscap

remote object-messaging for Python+Twisted
http://foolscap.lothar.com/trac
MIT License

saving log events between py2/py3 #68

Open warner opened 4 years ago

warner commented 4 years ago

As mentioned in https://github.com/warner/foolscap/issues/48#issuecomment-570953325, when a py2-based client emits a log event, the receiver (flogtool tail, log-gatherer, incident-gatherer) gets an event dictionary that uses bytes for both the keys and the values. If the receiver is running py3, json.dumps() will fail with a TypeError, as the py3 json module is pickier about key types than the py2 module was, and insists that string keys be text (str under py3).
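A minimal reproduction of the failure on py3, as a sketch. The event contents and the `try_dumps` helper are made up for illustration; real foolscap events carry different fields.

```python
import json

# A log event as a py3 receiver might see it when the emitter ran py2:
# bytes for both keys and values. Field names here are illustrative only.
event = {b"message": b"connection lost", b"level": 20}

def try_dumps(obj):
    """Return the JSON text, or the TypeError message if encoding fails."""
    try:
        return json.dumps(obj)
    except TypeError as e:
        return "TypeError: %s" % e

# Fails on py3: bytes keys (and bytes values) are rejected.
bytes_keys_result = try_dumps(event)

# The same event with text keys and values serializes fine.
text_keys_result = try_dumps({"message": "connection lost", "level": 20})
```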

The json module has an override hook (pass cls= with a JSONEncoder subclass that implements default()) for handling non-serializable objects, but this doesn't enable the serialization of bytestring keys. The hook is only called for values the encoder doesn't already know how to serialize, so it is never invoked for dictionaries (nor any other natively supported type), and dictionary keys are never passed to it.
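To illustrate the limitation, here is a small sketch of an encoder subclass that records every object passed to default(). The class name `BytesAwareEncoder` and the `default_calls` attribute are hypothetical, added only for this demonstration.

```python
import json

class BytesAwareEncoder(json.JSONEncoder):
    """Hypothetical encoder that decodes bytes *values* and records
    every object the json module hands to default()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_calls = []

    def default(self, o):
        self.default_calls.append(o)
        if isinstance(o, bytes):
            return o.decode("ascii")
        return super().default(o)

enc = BytesAwareEncoder()

# A bytes value reaches default() and gets converted.
value_result = enc.encode({"k": b"v"})

# A bytes key never reaches default(); the encoder raises instead.
try:
    enc.encode({b"k": "v"})
    key_error = None
except TypeError as e:
    key_error = e
```

After both calls, `enc.default_calls` holds only the bytes value from the first call, which suggests the hook cannot be used to repair keys.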

So to fix this, I think we'd need a recursive rewriter that takes the dictionary, walks through all collections inside it (dicts, but also lists), and returns a new dict with text keys.
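A rough sketch of such a rewriter might look like the following. The name `textify_keys` and the choice of UTF-8 are assumptions, not anything foolscap defines. It touches only keys, leaves values alone, and turns tuples into lists (which JSON would do anyway).

```python
def textify_keys(obj, encoding="utf-8"):
    """Return a copy of obj in which every bytes dict key is decoded to str.

    Walks dicts, lists, and tuples recursively. Non-container values,
    including bytes values, are returned unchanged. Raises
    UnicodeDecodeError if a key cannot be decoded.
    """
    if isinstance(obj, dict):
        return {
            (k.decode(encoding) if isinstance(k, bytes) else k):
                textify_keys(v, encoding)
            for k, v in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        return [textify_keys(item, encoding) for item in obj]
    return obj

# Illustrative event; the field names are made up.
event = {b"message": "hi", b"args": [{b"n": 1}, (b"x",)], "num": 7}
rewritten = textify_keys(event)
```

One caveat with this approach: if a dict contains both `b"a"` and `"a"` as keys, they collide after decoding and one entry silently wins.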

For the sake of rendering, it might also be nice to replace bytes values with text equivalents, as most of the values in log events are boring ASCII strings too.

The wrinkle is that application code can provide additional arguments (yay structured logging), and their values are not necessarily boring ASCII. They could contain nested dictionaries, with arbitrary keys. It's probably fair to insist that log events be serializable, even though part of the intended benefit of structured logging was to let the application author record whatever data would be useful in future debugging, without needing to think about how it should be rendered into text.

(I think we were originally using our own Banana serialization for log events, which was more flexible but more complicated, and managed to introduce dependencies: the log-viewer had to be able to import the log-emitter's classes, eww)

So if a log event arrives over the wire with bytes keys, we should be ok rewriting them to be strings, and just deny things like hash tables with binary keys. If the values are bytes too, we convert them too, perhaps lossily.
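One possible shape for that policy, as a sketch only: keys must decode cleanly or the event is rejected, while bytes values are converted with a lossy-but-visible error handler. The function name `sanitize_event`, the ASCII-only key rule, and the use of `backslashreplace` are all assumptions chosen for illustration.

```python
import json

def sanitize_event(obj):
    """Return a JSON-serializable copy of a received log event.

    - bytes dict keys must be ASCII; anything else raises ValueError
    - bytes values are decoded as ASCII, with undecodable bytes
      rendered as \\xNN escapes (lossy: the result is no longer bytes)
    - tuples become lists
    """
    if isinstance(obj, dict):
        out = {}
        for k, v in obj.items():
            if isinstance(k, bytes):
                try:
                    k = k.decode("ascii")
                except UnicodeDecodeError:
                    raise ValueError("binary dict key not allowed: %r" % (k,))
            out[k] = sanitize_event(v)
        return out
    if isinstance(obj, (list, tuple)):
        return [sanitize_event(item) for item in obj]
    if isinstance(obj, bytes):
        return obj.decode("ascii", errors="backslashreplace")
    return obj

# Illustrative event with a non-ASCII bytes value and nested args.
clean = sanitize_event({b"msg": b"caf\xc3\xa9", b"args": {b"ids": [b"a", 2]}})
as_json = json.dumps(clean)

# A dict with a binary key is denied outright.
try:
    sanitize_event({b"\xff\x00": 1})
    rejected = False
except ValueError:
    rejected = True
```

Rejecting bad keys while escaping bad values keeps the failure loud where the structure would be ambiguous, and keeps the event readable where only the rendering suffers.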