Closed MiguelAlho closed 4 years ago
I believe I may have found the problem that causes the above behaviour - The repetition of events is related to constantly restarting the read of the buffer file. The following seems to occur:
It's somewhat hard to replicate, but is possible using the sample file and some buffer / book mark files from a production instance. On top of this, I may have fixed a "second message dropped" problem related to the BOM in the buffer file,
I also noticed that some batches may be dropped (and hence the invalid-Error files) when Loggly returns a 502 BadGateway. considering this a transient error, it would be usefull to make the lib retry the messages in this case.
5.1.1-dev00127 does not yet resolve the problem. The loop is still visible during execution.
After the change, we've been able to see this occure again in production.
I've narrowed it down to https://github.com/serilog/serilog-sinks-loggly/blob/dc5b6e2cfa968c31ce6165d6abc303b20a86467b/src/Serilog.Sinks.Loggly/Sinks/Loggly/FileBufferDataProvider.cs#L162
In the context where I caught this, the bookmark is on the second file in the fileset
list; imagine a folder where there are some json files for 2018-11-13, 2018-11-14, 2018-11-15, etc. and for some reason the bookmark's initial position is on the second item (or further ahead in the list). When the library ships all the messages in that file, and tries to move to the next one, it will end up starting on the same one (or a previous) as that line assumes that the current file is the first in the fileset. In this situation (and not sure why yet) it is not, and therefore it get's caught in a loop.
I'm going to propose a defensive fix, for now, that checks the correct position in the list and moves forward, and tries to delete all previous files. This does not remove the need to understand why it's not deleting the previous files (assuming they are already read) in the first place .
What version is this fix available in? This is killing our log volume
@CDargis I've been using the 5.4.0-dev-00175 version in prod for some time now and we have not seen the bad behaviour since.
Might be time to merge this one in.
No available in master / 5.4.0
One behaviour that has popped up infrequently is related to the durable Logger implementation sending the same message frequently / repeatedly to Loggly. This eventually creates a massive accumulation of messages on Loggly and many times quickly maxes out any cap in place.
We've detected the problem through the following symptoms:
We suspect that it is the durable logger implementation we use that some how cannot update the buffer (read messages) correctly and in failing to move forward, stays and repeats the same message(s).
Looking at the buffer.bookmark file, there is a record for a byte pointer to the first buffer.json file (oldest) and many invalid-Error files in the folder.
I'll be looking in to this to fix it since it is causing problems for us, but if anyone has seen this before, any input is appreciated.