trentm / node-bunyan

a simple and fast JSON logging module for node.js services

Filtering of duplicate log lines #443

Open · bripkens opened this issue 7 years ago

bripkens commented 7 years ago

Sometimes, especially when processing a large number of events or requests, applications go wild and start logging the same stuff over and over again. It is desirable to filter duplicate log messages to avoid filling up hard drives or exceeding log aggregation limits.

Other logging libraries, e.g. logback in the Java world, support this via filters. In logback specifically, the mechanism is called DuplicateMessageFilter.

From the documentation:

[…] This filter detects duplicate messages, and beyond a certain number of repetitions, drops repeated messages.

Does something like this already exist for bunyan? If not, would you be inclined to support it in core?

trentm commented 7 years ago

I don't know of something like this for bunyan yet. https://github.com/trentm/node-bunyan/issues/115 is a similar feature request.

You could somewhat do this with a 'raw' Bunyan stream (example: https://github.com/trentm/node-bunyan/blob/master/examples/raw-stream.js). But really what I'd like to do is have a 'filters' concept for Bunyan, independent of the output streams, so that one can more cleanly implement filtering of log records without mucking directly with the output stream details. Anyway, that's vapour now.
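
For illustration, a minimal sketch of that raw-stream approach might look like the following. The drop-consecutive-duplicates policy and the single-record memory are simplifying assumptions for brevity, not a recommended design:

const bunyan = require('bunyan');

// A 'raw' stream receives each record as an object, not a serialized string.
class DedupRawStream {
    constructor(out) {
        this.out = out;
        this.lastMsg = null;
    }
    write(rec) {
        if (rec.msg === this.lastMsg) {
            return;  // drop: same msg as the previous record
        }
        this.lastMsg = rec.msg;
        this.out.write(JSON.stringify(rec) + '\n');
    }
}

const log = bunyan.createLogger({
    name: 'dedup-example',
    streams: [{
        level: 'info',
        type: 'raw',  // pass record objects to write() instead of strings
        stream: new DedupRawStream(process.stdout)
    }]
});

log.info('hello');  // emitted
log.info('hello');  // dropped
log.info('world');  // emitted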

Either implementation would have to decide what makes a duplicate log record. Given that a Bunyan log record can contain many, possibly nested, objects, how would you implement this? Do you have some examples of duplicate records from your usage? I suppose the app could configure this to look at a specific subset of record fields.
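
As a rough sketch of that subset-of-fields idea, a key function like the hypothetical makeKeyFn below could define what counts as a duplicate while ignoring fields such as time, pid, and hostname:

// Serialize only the configured fields; records that differ elsewhere
// (time, pid, hostname, ...) map to the same key.
const makeKeyFn = (fields) => (rec) =>
    fields.map((f) => JSON.stringify(rec[f])).join('|');

const keyFn = makeKeyFn(['name', 'level', 'msg']);
keyFn({name: 'api', level: 30, msg: 'timeout', time: '2017-01-01T00:00:00Z'});
keyFn({name: 'api', level: 30, msg: 'timeout', time: '2017-01-02T00:00:00Z'});
// -> both return '"api"|30|"timeout"', so the second counts as a duplicate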

bripkens commented 7 years ago

Either implementation would have to decide what makes a duplicate log record. Given that a Bunyan log record can contain many, possibly nested, objects, how would you implement this?

There is a trade-off here between how accurately duplicate log messages are identified and how much that identification is allowed to cost. In the case of logback's DuplicateMessageFilter, filtering happens purely on the log message's format string, and up until now that has been sufficient for us. For instance, logback would consider the following log messages equal:

const message = 'Something went wrong: %s';
// Both calls share the same format string, so the filter treats them as
// duplicates even though the interpolated arguments differ.
logger.info(message, 'database access failed');
logger.warn(message, 'out of disk space');

As mentioned above, this has been sufficient for us and protects us from accidentally exceeding logging limits in a matter of minutes.
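
To make those semantics concrete, a counting filter in the spirit of logback's DuplicateMessageFilter might look roughly like this in JavaScript. allowedRepetitions and cacheSize mirror logback's option names; the whole sketch is an assumption rather than anything bunyan provides:

class DuplicateMessageFilter {
    constructor(allowedRepetitions, cacheSize = 100) {
        this.allowed = allowedRepetitions;
        this.cacheSize = cacheSize;
        this.counts = new Map();  // format string -> occurrences seen
    }
    accept(formatString) {
        if (!this.counts.has(formatString) && this.counts.size >= this.cacheSize) {
            this.counts.clear();  // crude eviction; logback uses an LRU cache
        }
        const n = (this.counts.get(formatString) || 0) + 1;
        this.counts.set(formatString, n);
        // The first occurrence plus `allowed` repetitions pass; the rest drop.
        return n <= this.allowed + 1;
    }
}

const filter = new DuplicateMessageFilter(1);
filter.accept('Something went wrong: %s');  // true  (first occurrence)
filter.accept('Something went wrong: %s');  // true  (repetition 1, still allowed)
filter.accept('Something went wrong: %s');  // false (dropped from here on)

Keying on the format string alone keeps the cache small, at the cost of occasionally conflating genuinely different messages.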

But really what I'd like to do is have a 'filters' concept for Bunyan, independent of the output streams, so that one can more cleanly implement filtering of log records without mucking directly with the output stream details.

[…] Given that a Bunyan log record can contain many, possibly nested, objects, how would you implement this? […]

I think you nailed it already. Custom filters, i.e. a filter function that is invoked before messages are formatted and passed downstream, could be as simple or as complex as necessary. Implementing a filter similar to logback's DuplicateMessageFilter on top of that would then, I assume, be straightforward.
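
Until a first-class filters concept exists, one runnable way to sketch that contract is to wrap the private Logger._emit method that log calls funnel through. This is a hack for illustration only, not a supported bunyan API, and may break across versions:

const bunyan = require('bunyan');

// Illustration only: _emit is a private bunyan method. Wrapping it
// approximates a filter that runs before any stream sees the record.
function addFilter(log, filterFn) {
    const origEmit = log._emit;
    log._emit = function (rec, noemit) {
        if (!filterFn(rec)) {
            return;  // filter rejected the record
        }
        return origEmit.call(this, rec, noemit);
    };
}

const log = bunyan.createLogger({name: 'app'});
let lastMsg = null;
addFilter(log, (rec) => {
    const isDup = rec.msg === lastMsg;
    lastMsg = rec.msg;
    return !isDup;  // drop consecutive duplicates
});

log.info('boom');  // emitted
log.info('boom');  // dropped by the filter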

bripkens commented 7 years ago

I'll try to add the behavior discussed above to bunyan :).