serilog / serilog

Simple .NET logging with fully-structured events
https://serilog.net
Apache License 2.0
7.26k stars 798 forks source link

Top-level byte array isn't destructured, but other byte sequences are #1954

Open cmeeren opened 1 year ago

cmeeren commented 1 year ago

Repro:

open System.Collections.Generic
open Serilog
open Serilog.Core

type TestDestructuringPolicy() =
    interface IDestructuringPolicy with
        member this.TryDestructure(_value, propertyValueFactory, result) =
            result <- propertyValueFactory.CreatePropertyValue("TEST")
            true

Log.Logger <-
    LoggerConfiguration()
        .Destructure.With(TestDestructuringPolicy())
        .WriteTo.Console()
        .CreateLogger()

Log.Information("Byte list: {@Data}", List([ 1uy ]))
Log.Information("Byte array: {@Data}", [| 1uy |])

Log.CloseAndFlush()

Expected output:

[10:49:50 INF] Byte list: TEST
[10:49:50 INF] Byte array: TEST

Actual output:

[10:49:50 INF] Byte list: TEST
[10:49:50 INF] Byte array: 01

Top-level byte arrays aren't passed to my destructuring policy. This seems inconsistent. Could this inconsistency be fixed, so that byte arrays are passed to the destructuring policy?

nblumhardt commented 1 year ago

I think your second statement:

Log.Information("Byte array: {@Data}", [| 1uy |])

isn't calling the overload of Information() that you're expecting - its first arg is a params array.

Let me know if this helps :-)

cmeeren commented 1 year ago

No, that's not it. Here is a modification to the repro that uses multiple values, thereby avoiding that ambiguity. The output is the same as the original example.

Log.Information("Byte {Kind}: {@Data}", "list", List([ 1uy ]))
Log.Information("Byte {Kind}: {@Data}", "array", [| 1uy |])
nblumhardt commented 1 year ago

🤔 hmmmmmm... I think then it will be that byte arrays are processed as scalars (converted to hex strings):

https://github.com/serilog/serilog/blob/dev/src/Serilog/Capturing/PropertyValueConverter.cs#L68

but other enumerable types (FSharp.Core.List?) don't get that special treatment and pass through:

https://github.com/serilog/serilog/blob/dev/src/Serilog/Capturing/PropertyValueConverter.cs#L174

I don't think it's likely to be a bug per se, just the outcome of different types being processed differently - not sure whether/where we'd make any changes to enable a different approach, but if you're able to share some more info about your scenario/preferences it'd be good data to have. Thanks!

cmeeren commented 1 year ago

FSharp.Core.List?

The repro uses System.Collections.Generic.List (typically aliased as ResizeArray in F#), though that's irrelevant. The only requirement is that it is not an array.

if you're able to share some more info about your scenario/preferences it'd be good data to have.

Certainly!

Destructuring per se doesn't help me, so I don't use that. In fact, it would actively hurt me, at least in the first case below.

In both of these cases, the only thing that matter to me is the log message. I therefore simply want the rendered log message to be as nicely formatted as possible.

To achieve this, I am serializing most values using custom settings, so they are easy to read. (I used JSON first, but recently switched to YAML because it's even easier to read.)

I use this not only for complex objects, but also single-case discriminated unions ("wrapped primitives"), which are unwrapped using my serialization settings. In general, no matter what I serialize using my "log JSON setting", the output is either better or the same, never worse.

Of course, relative to other performance costs of logging, serialization is a heavy hitter. I used to serialize synchronously and insert the string directly into the log message (Log.Information("Foo {Bar}", logSerialize myObject)), but this is wasteful if the log level is not enabled (and in many cases, I logged complex objects at a normally disabled log level). I found out that I could make use of a custom destructuring policy and instead use the destructure operator @, which meant that the objects would not be serialized if the log level was not enabled. Also, it's less verbose. So (Log.Information("Foo {@Bar}", myObject) would be equivalent to the previous statement, but only serialize if the log level was enabled.

I have not found any other way to accomplish 1) nice custom-formatted complex objects in log messages, and 2) not serializing for disabled log levels.

In terms of this issue, the point is that this destructuring-based approach works wonderfully for everything except byte arrays. Specifically, I'd like to represent them with spaces between the bytes, like FA 2E C9, because that makes it easier for me to know "what is byte 5", and easier to paste into other tools we use that accept this format. For byte lists and other byte sequences, it works fine, but the different handling of byte arrays means it didn't work there. This was surprising to me, because there is not just one accepted way to render byte arrays.

nblumhardt commented 1 year ago

Thanks for taking the time to write that! 👍

I've never been entirely comfortable with our opinionated byte array formatting - the current system came about though because treating byte arrays as lists of bytes was ...... problematic to say the least 😅 ... so something (useful) needed to be done.

Unfortunately, byte arrays are something of an anomaly; most/all "scalar" types Serilog cares about are structs/value objects, which have copy semantics and so can be passed as-is through to the asynchronous side of the logging pipeline, or in the case of string, are immutable. Because of this there isn't a great extension point for dealing with byte arrays.

In the past we've considered making IScalarConversionPolicy a public/pluggable extension point, along the same lines as IDestructuringPolicy, which would potentially sort out your scenario, but it's a nontrival bit of infrastructure to design and maintain without a broadly-relevant set of use cases. This could be something to explore more deeply but probably requires quite a lot of effort.

Depending on how much control you have on the application side, constructing a proxy type such as:

record struct ByteArrayProxy(byte[] Bytes);

Log.Information("Some {Data}", new ByteArrayProxy(myBytes));

and then targeting it with an IDestructuringPolicy would be a possible workaround. (Forgive the C# syntax :-))

Hope this helps.

cmeeren commented 1 year ago

Thanks for the clarification!

I am indeed wrapping the byte array as a workaround (in the Some case of F#'s option type, which using my serialization settings is unwrapped).

In any case, what I really want is to be able to control how Serilog formats values of arbitrary types. The fact that I'm using destructuring is just because that's the only way I found. If it would be possible to add a way to let users specify formatting of arbitrary types (e.g., pass obj and let the user decide how to transform/stringify it), then that would solve my problem in a less hacky way, would allow me to use destructuring for actual destructuring (if I ever need that), and would also mean I don't have to remember to use @ everywhere (easy to forget, leading to bad output, and I don't see that before I actually see an example of the relevant log message, which is in some ways the worst kind of error).

nblumhardt commented 1 year ago

The reason to use destructuring or enrichment is that Serilog does capturing on the foreground thread (at the point the event is logged), but formatting on a (potentially) background thread. By the time the data reaches the formatting stage it's already been snapshotted; this is how mutable things like arrays end up logged with the right contents (and race conditions/threading bugs are avoided). HTH!

cmeeren commented 1 year ago

Thanks, though I'm not sure what you intend for me to do with that information?

nblumhardt commented 1 year ago

Just filling in the picture, unfortunately it doesn't get us closer to a solution, just narrows down the possible places to investigate :-)