Log prefix prevents parsing of JSON

newrelic / aws-log-ingestion

AWS Serverless Application that sends log data from CloudWatch Logs to New Relic Infrastructure - Cloud Integrations.

Apache License 2.0


Log prefix prevents parsing of JSON #27

Open cuberoot opened 3 years ago

cuberoot commented 3 years ago

I have installed aws-log-ingestion and it is forwarding my logs from CloudWatch to New Relic. So, I have that part working. My issue is that, even when I log valid JSON, New Relic does not parse it because CloudWatch adds on a prefix to the logged string.

So, if I log a JSON string from my AWS Lambda, I get this in New Relic logs:

2020-11-03T02:52:20.873Z ca92741f-d3bd-595a-adda-2c051ba23d5a INFO {<JSON Content>}

and the JSON isn't parsed.

Is there a way to get around this?

Many thanks for this project!

kolanos commented 3 years ago

@cuberoot Thanks for the report.

First thing to note is the log line prefix you're referring to is coming from AWS and not New Relic. The AWS Lambda runtimes have a default log line format. We recommend that you stick with this format unless you have a good reason not to.

As for parsing the JSON: where are you expecting it to be parsed? In the aws-log-ingestion function? In New Relic Logs? Or somewhere else? I want to make sure I understand the expected behavior.

ShaunInIdaho commented 3 years ago

Like @cuberoot, I also struggle with this. I would expect to be able to create a custom parsing rule so that the JSON can be parsed at ingestion to New Relic. The challenge I've had is that it doesn't seem possible to parse the JSON using a parsing rule. I can grok up to the JSON, but then what? Is there a way to say, 'start parsing JSON here'?

cuberoot commented 3 years ago

I realise that the prefix is an AWS thing. I haven't dug into the details of how aws-log-ingestion works internally, but my naive suggestion would be that aws-log-ingestion might include an option to drop the AWS log prefix when the remainder is JSON. Or, optionally roll the AWS prefix information into the JSON string that is forwarded to New Relic.

The problem is that AWS CloudWatch always includes the prefix and NR doesn't support JSON parsing unless there is no prefix. So, unless NR changes their JSON support, this would have to be worked around at the aws-log-ingestion level.
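A minimal sketch of the second suggestion (rolling the prefix fields into the JSON), not code from aws-log-ingestion. The prefix shape is assumed from the example log line earlier in this thread, and `rollPrefixIntoJson` plus the field names `timestamp`, `requestId`, and `level` are hypothetical:

```javascript
// Assumed prefix shape, taken from the example log line in this thread:
//   <ISO timestamp> <request ID> <LEVEL> <message>
// \s+ accepts either spaces or tabs between the fields.
const LAMBDA_PREFIX = /^(\d{4}-\d{2}-\d{2}T\S+)\s+(\S+)\s+([A-Z]+)\s+([\s\S]*)$/;

// Hypothetical helper: returns a merged object when the remainder of the
// line is a JSON object, otherwise returns the original line unchanged.
function rollPrefixIntoJson(line) {
  const match = LAMBDA_PREFIX.exec(line);
  if (match === null) return line; // no recognizable prefix

  const [, timestamp, requestId, level, rest] = match;
  let payload;
  try {
    payload = JSON.parse(rest);
  } catch (err) {
    return line; // remainder is not JSON, so leave the line as it is
  }
  if (payload === null || typeof payload !== "object" || Array.isArray(payload)) {
    return line; // only JSON objects can absorb the prefix fields
  }
  // Fields already present in the payload win over the prefix fields.
  return { timestamp, requestId, level, ...payload };
}
```

Lines without the prefix, or whose remainder is not a JSON object, pass through untouched, so plain-text log lines would still be forwarded as they are today.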

jmm commented 3 years ago

I came here looking for information about whether there's a way to also populate Events via this integration (where an event would be a log entry with a JSON payload with an eventType property), but this issue is relevant to me as well.

To test populating events into another destination, I already wrote some custom code (that's about 15 lines) to strip the prefix and parse the JSON. This integration wouldn't be useful to me if it doesn't handle this issue, but since I also want the ability to populate events anyway I'm currently going to look at writing my own Lambda to do the integration (and send CloudWatch log subscriptions to it).
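This is not jmm's code, only a rough sketch of what such a custom Lambda might do. CloudWatch Logs delivers subscription data to Lambda as base64-encoded, gzipped JSON under `event.awslogs.data`, with each log line in `logEvents[].message`. The helper names `decodeSubscription`, `extractJson`, and `parseEvents` are hypothetical, and the "slice from the first `{`" approach assumes the prefix itself never contains a brace:

```javascript
const zlib = require("zlib");

// Decode the subscription payload: base64 -> gunzip -> JSON.
function decodeSubscription(event) {
  const compressed = Buffer.from(event.awslogs.data, "base64");
  return JSON.parse(zlib.gunzipSync(compressed).toString("utf8"));
}

// Hypothetical helper: drop everything before the first "{" and try to
// parse the rest. Returns null when the message carries no JSON object.
function extractJson(message) {
  const start = message.indexOf("{");
  if (start === -1) return null;
  try {
    return JSON.parse(message.slice(start));
  } catch (err) {
    return null;
  }
}

// Pair each raw log line with its parsed JSON payload (or null).
function parseEvents(event) {
  return decodeSubscription(event).logEvents.map((logEvent) => ({
    raw: logEvent.message,
    json: extractJson(logEvent.message),
  }));
}
```

Entries whose `json` has an `eventType` property could then be routed to an event destination, and everything else forwarded as ordinary log lines.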

On a related note, New Relic's MELT 101: An introduction to the four essential telemetry data types says:

Structured log data makes it easier and faster to search the data and derive events or metrics from the data.

If anyone knows what that means by "derive events", I'd love to know. Is that referring to something you can do within New Relic (convert a log to an event in other words)?

xtagon commented 3 years ago

I'm after the same functionality (getting structured logging from CloudWatch into New Relic and letting New Relic parse the JSON).

CloudWatch says it has some support for structured logging (irrespective of New Relic). I wonder if enabling this would help, or if the prefix would still be present?

https://aws.amazon.com/about-aws/whats-new/2015/01/20/amazon-cloudwatch-logs-json-log-format-support/

Ephesoft-Stitus commented 3 years ago

I was hoping that switching from the newrelic-log-ingestion Lambda to the new extension in the layer would eliminate the wrapping of our log messages, because the messages are sent directly from the Lambda. It turns out it doesn't. The wrapping must happen on the way out of Lambda, before the logs reach CloudWatch.

Ephesoft-Stitus commented 3 years ago

According to this AWS document: "The Node.js runtime adds a timestamp, request ID, and a log level to each entry logged by the function."

Note that this happens when console.log() is called. However, if you call the lower-level function process.stdout.write(), the wrapper is not added. This means that the logs are sent to New Relic as raw JSON (and stored in CloudWatch that way). This is exactly what I was looking for, because now New Relic parses all of the fields for me automatically.

Note that you'll need to include your own newline character at the end of the message or else the next message will be mashed into the current message. This worked well in my initial testing.

Hope this helps someone else out there.
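The workaround described above could look roughly like this. `logJson` is a hypothetical helper name, and the `out` parameter exists only so the function can be exercised without writing to the real stdout:

```javascript
// Hypothetical helper: writes one JSON object per line straight to stdout,
// bypassing console.log() and therefore the runtime's prefix.
function logJson(fields, out = process.stdout) {
  // The trailing "\n" keeps consecutive entries from running together.
  out.write(JSON.stringify(fields) + "\n");
}

// Example call from inside a handler (field names are illustrative only).
logJson({ level: "INFO", message: "order created", orderId: 123 });
```

Because the prefix is skipped, any timestamp, request ID, or level you still want has to be included in the object yourself.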

drexler commented 3 years ago

I've run into this as well. When I use the lower-level function process.stdout.write for log writing, I'm unable to ingest the generated CW Logs into New Relic. It seems the generated prefix is required. The closest I have gotten is using a CloudWatch filter pattern, [timestamp, requestId, logLevel, event], as a rudimentary parser. That only works if the event is not nested JSON.

abu450 commented 3 months ago

I have the same issue. Can someone please help?