serilog-archive / serilog-sinks-loggly

A Serilog event sink that writes to Loggly
Apache License 2.0
27 stars 30 forks source link

Durrable Logger sends same message(s) repeatedly #25

Closed MiguelAlho closed 4 years ago

MiguelAlho commented 7 years ago

One behaviour that has popped up infrequently is related to the durable Logger implementation sending the same message frequently / repeatedly to Loggly. This eventually creates a massive accumulation of messages on Loggly and many times quickly maxes out any cap in place.

We've detected the problem through the following symptoms:

We suspect that it is the durable logger implementation we use that some how cannot update the buffer (read messages) correctly and in failing to move forward, stays and repeats the same message(s).

Looking at the buffer.bookmark file, there is a record for a byte pointer to the first buffer.json file (oldest) and many invalid-Error files in the folder.

I'll be looking in to this to fix it since it is causing problems for us, but if anyone has seen this before, any input is appreciated.

MiguelAlho commented 7 years ago

I believe I may have found the problem that causes the above behaviour - The repetition of events is related to constantly restarting the read of the buffer file. The following seems to occur:

It's somewhat hard to replicate, but is possible using the sample file and some buffer / book mark files from a production instance. On top of this, I may have fixed a "second message dropped" problem related to the BOM in the buffer file,

MiguelAlho commented 7 years ago

I also noticed that some batches may be dropped (and hence the invalid-Error files) when Loggly returns a 502 BadGateway. considering this a transient error, it would be usefull to make the lib retry the messages in this case.

MiguelAlho commented 7 years ago

5.1.1-dev00127 does not yet resolve the problem. The loop is still visible during execution.

MiguelAlho commented 5 years ago

After the change, we've been able to see this occure again in production.

I've narrowed it down to https://github.com/serilog/serilog-sinks-loggly/blob/dc5b6e2cfa968c31ce6165d6abc303b20a86467b/src/Serilog.Sinks.Loggly/Sinks/Loggly/FileBufferDataProvider.cs#L162

In the context where I caught this, the bookmark is on the second file in the fileset list; imagine a folder where there are some json files for 2018-11-13, 2018-11-14, 2018-11-15, etc. and for some reason the bookmark's initial position is on the second item (or further ahead in the list). When the library ships all the messages in that file, and tries to move to the next one, it will end up starting on the same one (or a previous) as that line assumes that the current file is the first in the fileset. In this situation (and not sure why yet) it is not, and therefore it get's caught in a loop.

I'm going to propose a defensive fix, for now, that checks the correct position in the list and moves forward, and tries to delete all previous files. This does not remove the need to understand why it's not deleting the previous files (assuming they are already read) in the first place .

CDargis commented 4 years ago

What version is this fix available in? This is killing our log volume

MiguelAlho commented 4 years ago

@CDargis I've been using the 5.4.0-dev-00175 version in prod for some time now and we have not seen the bad behaviour since.

Might be time to merge this one in.

MiguelAlho commented 4 years ago

No available in master / 5.4.0

https://www.nuget.org/packages/Serilog.Sinks.Loggly/5.4.0