serilog-archive / serilog-sinks-loggly

A Serilog event sink that writes to Loggly
Apache License 2.0
27 stars 30 forks source link

Drop invalid messages #17

Closed MiguelAlho closed 7 years ago

MiguelAlho commented 7 years ago

We noticed some miss behavior with the durable logging feature.

For some reason, we came across a situation in which the same set of messages where repeatedly set to Loggly. The exact cause has not been determined but we are sure it is related to the durable logging feature - the messages being sent where the same set in the first buffer file on the misbehaving machine. We also noticed there was a single "invalid-*" file with the same set of messages that we identified where repeated. This was making our account pass the daily message cap from our Loggly service plan.

During the debugging session, using the same set of files, an error was occurring when resuming the read operation of the buffer file: the position counter in the bookmark file was NOT at the start of the line that was to be read. It was a couple of bytes off (forward), which might have been related to a (misinterpreted) BOM marker. UTF8 is passed into the RollingFileSink for writing messages and UTF8 encoding is used to read the messages in the buffer file, though they are not the same instance of encoders. I suspected that, for some reason, one could be considering the BOM and the other operation not, so I decided to use the Encoding constructor and pass the instance into each of the classes (HttpLogShipper for reading and RollingFileSink for writing).

Even with this in place, it wasn't possible to recover from the error. Every time the read operation was run, there was an attempt to read an incomplete line from the file. Since the JSON contained in it was incomplete, it could not be deserialized. The error in the deserialization process bubbled up past the point where the bookmark would be updated, so at each cycle, there was an attempt to read the same line incorrectly.

To avoid this, the PR includes message dropping for incorrectly formatted JSON messages. If the line is read incorrectly, or fails the deserialization, the message is dropped and the marker moves forward to the correct position. The next line can then be read and deserialized correctly.

This PR also includes an argument to control the RollingFile retention limit, requested in #13

nblumhardt commented 7 years ago

Hi Miguel - this issue looks like https://github.com/serilog/serilog-sinks-seq/issues/28 - worth checking out the discussion and PR there, seems like it might be closer to the root cause.

MiguelAlho commented 7 years ago

from https://github.com/serilog/serilog-sinks-file/pull/10/commits/c723d182b651f45a4089dd4a033c76eaa0c110b9#diff-ef4cfbd08eaf68d8efda69b45c4da12d - the default encoding is removing the BOM by default, correct? This PR is forcing an encoding with the BOM marker on, but I'll change it to align with the FileSink and Seq sink and change it to disable the BOM marker.

nblumhardt commented 7 years ago

Sorry to not have much useful input on this - looks like the issue I was thinking of has already been avoided here. Stuck for time, so if the solution fixes the problem you've observed, +1 from me.