Provide Option to Truncate Attribute Values that Exceed NR Length Limit

chase-miller commented 3 years ago

Is your feature request related to a problem? Please describe.

New Relic's logging capability is unreliable from a user's perspective when messages aren't parsed (https://docs.newrelic.com/docs/logs/log-management/troubleshooting/log-message-truncated/). I would much rather have truncated, structured data than no structured data at all. Otherwise searches result in false negatives (results are erroneously omitted).

Feature Description

Provide an option when calling .Enrich.WithNewRelicLogsInContext() to truncate all attribute values that violate New Relic's field length limit of 4096 characters.

Describe Alternatives

I'm guessing that this would probably be easier if New Relic had a sink instead of an enricher, but hopefully not. Perhaps this would be better placed elsewhere in Serilog's configuration, I'd argue it's best placed in New Relic's Serilog integration due to it being a New Relic-specific issue (albeit also prevalent across other logging providers).

Priority

Please help us better understand this feature request by choosing a priority from the following options: [Really Want]

nr-ahemsath commented 3 years ago

Hi @chase-miller, thanks for bringing this to our attention. At first glance, this definitely seems like a solvable problem.

I am confused, however: your problem description (and feature description) imply that when you log a message in Serilog with the New Relic enricher and formatter, and then forward the resulting JSON data to New Relic using one of our supported forwarding mechanisms (e.g. fluentd), that any attributes greater than 4096 chars in length will not show up in the NR logs UI. However, the NR doc you linked (https://docs.newrelic.com/docs/logs/log-management/troubleshooting/log-message-truncated/) says that "Any data longer than [4096 chars] is truncated during ingestion." and implies that the partial data <=4096 chars in length will be shown with an ellipsis (...) at the end. So it seems like if our doc is correct, the backend already does what you're asking our library in this repo to do. If our doc is incorrect, we should address that as well as implement the feature you're asking for.

(Or maybe I'm not understanding the problem, in which case I apologize in advance!)

If we do need to modify the behavior of our library, I think we can most easily do it in our formatter class instead of the enricher: https://github.com/newrelic/newrelic-logenricher-dotnet/blob/a016be5088944a0c33a4cd3a47ebdfa9ee21d6a2/src/NewRelic.LogEnrichers.Serilog/NewRelicFormatter.cs#L186 And my first instinct is that truncating fields to the 4096 maximum length should be the default behavior rather than something required to be configured since the whole point of the New Relic formatter for Serilog is to produce log output in the JSON format needed by the NR backend.

chase-miller commented 3 years ago

@nr-ahemsath thanks for getting back so quickly!

I perused the code after opening this issue, and I agree that the formatter makes the most sense. I'm currently working on a wrapper around NewRelicFormatter to do this before it hits your code, but it would be way more convenient if it were already done ;-). Also I'm not sure if it's even possible; my fallback is to copy NewRelicForamtter.cs and modify it. Modifying the method you pointed out makes the most sense to me.

partial data <=4096 chars in length will be shown with an ellipsis (...) This is the behavior that's undesirable to me. I could be wrong, but it appears that if one property within a log message exceeds 4096 characters, then the entire log message doesn't get parsed. I'd much rather truncate the offending property (e.g. with ellipsis) and have the log message parsed.

Does that makes sense? I might not understand how this works by the way ;-)

chase-miller commented 3 years ago

Also, check out this convo and the code he links to for an example of how the seq sink is doing something similar.

nr-ahemsath commented 3 years ago

We should probably manually test the current behavior to see what it is before doing a fix (and we'd need to manually test after the fix anyway).

I'll move this issue into our backlog. Since we have a small team and we maintain multiple products, it might be a while before we can get this issue added to a milestone. If you end up modifying your own NewRelicFormatter and that works for you, we'd welcome a PR with your changes.

chase-miller commented 3 years ago

@nr-ahemsath sorry for the slow response. I was able to get this working and discovered some strange New Relic behavior along the way.

Unfortunately I'm not able to provide much specific code, but I'll provide some snippets and pseudocode. I suspect that this solution isn't as efficient as it could be if the work was done either upstream in Serilog's core or downstream in NewRelicFormatter.

Example usage:

var formatter = new NewRelicFormatter()
    // Traverses a given Serilog's LogEvent property tree and checks each ending (ScalarValue) node for violations.
    // If it finds any, it truncates them and prepends a message indicating that the property has been truncated.
    // Then it will call the wrapped ITextFormatter to actually format the message.
    .TruncatingPropertiesExceedingCharacterLengthLimit(4096)
    // New Relic doesn't log arrays as objects. Instead, it logs it as a string - https://docs.newrelic.com/docs/logs/new-relic-logs/troubleshooting/json-message-not-parsed/. Ensure the string doesn't exceed the length limit, and truncate if it does.
    .WithNewRelicSequenceValueHandler()
    // There seems to be a bug in New Relic where numbers with more than 16 digits have erroneous values added to the end. Numbers with more than 18 digits result in entire log messages not being parsed at all. Convert numbers that exceed this limit to a string, truncate if necessary, and prepend a warning.
    .WithNewRelicNumberProtectionHandler()
    // Hacks the output and replaces the `message` property that NewRelicFormatter adds.
    .WithNewRelicMessageReplacement()
    // If an exception is passed to Serilog's logger, its message length is checked and truncated if necessary (this happens before the NewRelicFormatter is hit).
    .AndAlsoTruncatingExceptionMessages();

The first extension method creates a new ITextFormatter instance that wraps another ITextFormatter (in this case, the NewRelicFormatter).

This is the meat of the formatter. Its returned object is then supplied in a call to the wrapped ITextFormatter's Format() method.

private LogEventProperty TruncateIfNecessary(LogEventProperty logEvent)
{
    if (TryHandleViaCustomHandler(out var customHandlerResult))
    {
        return customHandlerResult;
    }

    var safePropertyValue = logEvent.Value switch
    {
        StructureValue asStructureValue => HandleStructureValue(asStructureValue),
        DictionaryValue asDictionaryValue => HandleDictionaryValue(asDictionaryValue),
        SequenceValue asSequenceValue => HandleSequenceValue(asSequenceValue),
        ScalarValue asScalarValue => TruncateScalarValueIfNecessary(asScalarValue),
        _ => logEvent.Value
    };

    return new LogEventProperty(logEvent.Name, safePropertyValue);
}

TruncateIfNecessary is called recursively by the first three cases in the switch statement; the ScalarValue case is where the truncation happens if needed.

Recommended changes to this lib

The subsequent extension methods leverage hooks exposed in the ITextFormatter instance I mentioned. If nothing else, it would be great for this lib to be updated so that the last two extension methods aren't needed. Specifically:

Prevent the message property the NewRelicFormatter adds from exceeding length limitations imposed by New Relic.
Prevent the error.message property the NewRelicFormatter adds from exceeding length limitations imposed by New Relic.
Do the same with any properties NewRelicFormatter adds.

I'm not sure if simply updating the NewRelicFormatter.WriteFormattedJsonData methods like we discussed previously will be enough since they're only taking into account top-level properties. If length checking / truncation were only done on top-level properties, then the lib might detect a violation and truncate something that the NR service wouldn't find in violation. A good example is an object with two properties where one violates a limit but the other does not. My understanding is that the NR service treats each property independently, so only the latter one needs to be truncated (and not the entire object).

All that said, it would be great if the lib either baked in a solution or provided a recommendation in documentation to something like the one that I came up with. At the end of the day what matters is that log messages aren't omitted when searching on NR data.

There are three additional issues / observations I've discovered along the way:

I'm planning to open a bug via New Relic support (unless you can do this) regarding the number of issues I mentioned above. If that's fixed, then that extension method is no longer needed.

There's another bug I haven't quite nailed down yet, but it appears that if the total log message exceeds a certain size it won't be parsed by NR and will be split across multiple NR log messages. It's possible this is happening upstream from NR (i.e. in Fluent Bit), but I'm not sure.

I won't touch on the way NR handles arrays other than to say it feels...weird :-)

workato-integration[bot] commented 2 years ago

https://issues.newrelic.com/browse/NEWRELIC-3305

newrelic / newrelic-logenricher-dotnet