theogravity / pino-datadog-transport

A pino v7+ transport for sending logs to Datadog
MIT License

Question: This package vs. logging to file and having Datadog agent tail the file #14

Open doronrk opened 1 year ago

doronrk commented 1 year ago

The Node.js Log Collection Documentation from Datadog says:

To send your logs to Datadog, log to a file and tail that file with your Datadog Agent

You can stream your logs from your application to Datadog without installing an Agent on your host. However, it is recommended that you use an Agent to forward your logs as it provides a native connection management.

I know that the documentation is written about Winston, but it seems like the guidance would still apply regardless of the logging library used.

What is the difference between using this transport vs. logging to a file and tailing that file with the Datadog Agent? Is this package strictly useful in an environment without the Datadog Agent, or are there benefits to this transport that make it worth using even when the Datadog Agent is available?

Thanks!

doronrk commented 1 year ago

Because we are on Heroku, a 3rd option seems to be available to us - using Heroku's logplex:

https://docs.datadoghq.com/logs/guide/collect-heroku-logs/

I remain curious about my original question though.

theogravity commented 1 year ago

Personally, I'd use the Datadog agent and write to a logfile, but that also means you have to install the agent and manage logfile rotation. The agent will probably do a better job than the plugin will, though it also requires more configuration.
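
The logfile side of that setup is just pointing pino at a file and letting the agent tail it. A minimal sketch (the path is a placeholder, and rotation is handled outside of pino, e.g. by logrotate):

```js
// Write logs to a file for the Datadog agent to tail.
// The path below is a placeholder; log rotation is managed outside of pino.
const pino = require('pino')

const logger = pino(
  pino.destination({
    dest: '/var/log/my-app/app.log', // file the agent is configured to tail
    sync: false,                     // async writes; call logger.flush() on shutdown
  })
)

logger.info({ requestId: 'abc123' }, 'handled request')
```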

I think the default behavior for a containerized agent is to read from stdout / stderr rather than a logfile, and writing to stdout / stderr can actually cause memory issues, since Node takes time to garbage-collect the log output at unpredictable intervals. In a heavy production environment that generates log entries quickly, you can find yourself with OOM errors.

https://javascript.plainenglish.io/can-console-log-cause-memory-leaks-how-to-make-a-browser-crash-with-console-log-b94e4d248ed8

The real difference is that you don't have to install the agent when using this plugin. It's mainly for situations where you can't install the agent and/or write to a logfile.
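
In that case the whole setup is a pino transport config and an API key. A minimal sketch, assuming the ddClientConf / apiKeyAuth option names from the README (check it for the exact shape):

```js
// Send logs straight to the Datadog intake API, no agent or logfile on the host.
// Option names (ddClientConf, authMethods, apiKeyAuth) are assumed from the README.
const pino = require('pino')

const transport = pino.transport({
  target: 'pino-datadog-transport',
  options: {
    ddClientConf: {
      authMethods: {
        apiKeyAuth: process.env.DD_API_KEY, // Datadog API key from the environment
      },
    },
  },
})

const logger = pino(transport)
logger.info('shipped directly to Datadog')
```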

ramonsnir commented 1 year ago

Another advantage of this library (which I only recently discovered): some log tailers have edge cases in parsing logs, before the log even hits the Datadog logs pipeline. In our case, fluentd broke up very long log lines (starting at around 20 kB, and we have log lines that are 1 MB+). After a long discussion with Datadog technical support, we agreed with them that this library's approach could be more consistent in collecting logs. They also assured us that the Datadog intake API has practically no rate limits or throughput limitations, so there's no risk there.

We've only been running with this for a short while. So far it hasn't had any downsides compared to process.stdout => firelens => fluentd, which is the "documented" approach on Datadog's website, and it works around every issue we had observed with that pipeline.

theogravity commented 1 year ago

Thanks for the feedback! When you say it "has no rate limits or throughput limitations", does that also cover payload size? I found the 1MB limit in the Datadog docs when I made the plugin. Since you have 1MB+ log lines, are you running into any issues sending logs of that size through it? The plugin doesn't hard-fail when a payload reaches 1MB+ (it still tries to send it out), but it does report an error through the error callback if you have one defined.

ramonsnir commented 1 year ago

What I originally meant, and what I asked them about, was that if you send a million requests per second (each one of them valid on its own), all of their contents will be ingested. So switching to this approach is scalable, as long as Node.js and this library are able to push the data out.

I have seen logs that are 1MB+ in Datadog's index, but to be fair that's the size after the whole Datadog pipeline runs. The raw data leaving Node.js might have been anywhere from 0.5MB to 1.5MB, I can't tell without looking more into it.
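
If we wanted to check, a rough way to measure the serialized size leaving Node would be something like this (illustrative only, with a made-up oversized entry):

```js
// Rough check of how large a serialized log entry is before it leaves Node.
// The entry below is made up; in practice you'd measure a real log object.
const entry = {
  level: 'info',
  service: 'my-app',
  message: 'x'.repeat(1_000_000), // ~1 MB of message data
}

const bytes = Buffer.byteLength(JSON.stringify(entry), 'utf8')
console.log(`serialized size: ${(bytes / (1024 * 1024)).toFixed(2)} MB`)
```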

theogravity commented 1 year ago

The limit comes from here; maybe the payload shrinks below it after compression?

https://docs.datadoghq.com/api/latest/logs/