However, in a situation where there were a lot of log events across multiple streams in the same log group, this could cause some events to disappear.
The ultimate cause here is that the :interleaved => true parameter to @cloudwatch.filter_log_events is a best-effort request and is not guaranteed. As a result, if you have two streams' worth of messages in a single log group, they might not get interleaved completely.
e.g. given two streams:
stream 1 from 26/08/2020 12:03 -> 12:06
stream 2 from 26/08/2020 12:05 -> 12:09
It's possible that the first stream is loaded before the second. The group's high water mark then moves to 12:06, so the next query only returns events from 12:06 onwards, and we lose stream 2's events from 12:05 to 12:06 (a minute's worth of events).
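The loss can be reproduced with a small simulation. This is only a sketch: the event tuples, the minute-granularity timestamps and the variable names are all made up for illustration, and it assumes the worst case where the first response contains nothing from stream 2.

```ruby
# Hypothetical events as [stream, minutes-after-12:00] pairs.
stream1 = (3..6).map { |minute| ['stream1', minute] } # 12:03 -> 12:06
stream2 = (5..9).map { |minute| ['stream2', minute] } # 12:05 -> 12:09

# Worst case (assumed): no interleaving, so the first batch is stream 1 only.
first_batch = stream1

# A single group-level high water mark moves to the latest event seen (12:06).
group_mark = first_batch.map { |_, minute| minute }.max

# The next query starts at the group mark, so earlier stream 2 events are skipped.
second_batch = stream2.select { |_, minute| minute >= group_mark }

# Events that were never returned by either query.
lost = stream2 - second_batch
```

Here `lost` holds the stream 2 event at 12:05, which is the gap described above.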
This change was to widen @sincedb to also store a high water mark at the stream level
@sincedb = {
  # the last record for the group
  'logGroup1' => 12345,
  # the last record for the stream
  'logGroup1:streamA' => 12345,
  'logGroup1:streamB' => 12342,
  'logGroup2' => 45678,
  'logGroup2:streamC' => 45678,
}
And the algorithm was changed to:
- find all the log groups, then for each group:
  - get the *earliest* per-stream time for that group
    - if the group is new we used the existing logic for the default start time
  - @cloudwatch.filter_log_events for the group using this earliest time
  - for each event:
    - get the event's stream's high water mark
      - if the stream is new, then we fall back on the group start time
    - ignore the event if it's earlier than the stream's high water mark
    - otherwise: process the event as before
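The steps above could look roughly like the following. This is a sketch and not the plugin's actual code: `query_start`, `select_new_events` and the event hash shape (`:stream`, `:timestamp`) are hypothetical names, and the call to filter_log_events is left out so the example runs on its own. Letting an event whose timestamp equals the mark through is also an assumption.

```ruby
# Hypothetical helper: the start time to use when querying a group.
def query_start(sincedb, group, default_start)
  prefix = "#{group}:"
  stream_marks = sincedb.select { |key, _| key.start_with?(prefix) }.values
  # Earliest per-stream mark; else the group mark; else the default for a new group.
  stream_marks.min || sincedb.fetch(group, default_start)
end

# Hypothetical helper: drop events older than their stream's mark and
# advance the stream and group marks for the events that are kept.
def select_new_events(sincedb, group, events, group_start)
  events.select do |event|
    key = "#{group}:#{event[:stream]}"
    # A new stream falls back on the group start time.
    mark = sincedb.fetch(key, group_start)
    next false if event[:timestamp] < mark

    sincedb[key] = event[:timestamp]
    sincedb[group] = [sincedb.fetch(group, event[:timestamp]), event[:timestamp]].max
    true
  end
end

sincedb = {
  'logGroup1' => 6,
  'logGroup1:stream1' => 6,
  'logGroup1:stream2' => 4
}
start = query_start(sincedb, 'logGroup1', 0)
events = [
  { stream: 'stream1', timestamp: 5 }, # older than stream1's mark, ignored
  { stream: 'stream2', timestamp: 5 }, # newer than stream2's mark, kept
  { stream: 'stream3', timestamp: 4 }  # new stream, compared with the group start
]
kept = select_new_events(sincedb, 'logGroup1', events, start)
```

Querying from the earliest stream mark means some events come back more than once, which is why the per-stream check is needed to discard the ones already processed.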
The file format still has the old group position entries, so that the previous high water mark is preserved on an upgrade.
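To illustrate the upgrade path, here is a sketch of reading a file that only has group entries. The on-disk layout (one "key position" pair per line, separated by a space) is an assumption, and `parse_sincedb` and `high_water_mark` are hypothetical helpers rather than the plugin's own methods.

```ruby
# Hypothetical parser: one "key position" entry per line (layout is assumed).
def parse_sincedb(text)
  text.each_line.each_with_object({}) do |line, db|
    # Split on the last space, in case a key itself contains spaces.
    key, _, position = line.chomp.rpartition(' ')
    next if key.empty?

    db[key] = position.to_i
  end
end

# Stream mark if present, else the old group mark, else the default start.
def high_water_mark(db, group, stream, default_start)
  db.fetch("#{group}:#{stream}") { db.fetch(group, default_start) }
end

# A file written before the change: group entries only.
old_file = "logGroup1 12345\n"
db = parse_sincedb(old_file)
upgraded_mark = high_water_mark(db, 'logGroup1', 'streamA', 0)
```

With only the old group entry present, the lookup for streamA resolves to the group's position, so nothing before the previous high water mark is reprocessed after upgrading.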
A few changes to resolve #74
The issue was that the existing underlying model stored the high water mark only at the log group level.