lukewaite / logstash-input-cloudwatch-logs

Input plugin for Logstash to stream events from CloudWatch Logs

Feature/per stream high water #92

Closed daniel-bray-sonalake closed 3 years ago

daniel-bray-sonalake commented 3 years ago

A few changes to resolve #74

The issue was that the existing underlying model stored the high water mark at the log group level, e.g.

@sincedb = {
    'logGroup1' => 12345,
    'logGroup2' => 45678
}

However, when there were a lot of log events across multiple streams in the same log group, this could cause some events to be skipped.

The root cause is that the :interleaved => true parameter passed to @cloudwatch.filter_log_events is a best-effort request, not a guarantee. As a result, if a single log group contains two streams' worth of messages, they might not be completely interleaved.

e.g. given two streams

stream 1 from 26/08/2020 12:03 -> 12:06
stream 2 from 26/08/2020 12:05 -> 12:09

It's possible for the first stream to be loaded before the second, which moves the group's high water mark to 12:06. When the second stream is then queried, we only get events from 12:06 onwards (and so lose a minute's worth of events).
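The scenario above can be replayed with a small, self-contained Ruby sketch. Everything in it is hypothetical and for illustration only: the `Event` struct, the `query` helper (a stand-in for a group-level `filter_log_events` call) and the use of minutes past 12:00 instead of real timestamps are assumptions, not the plugin's code.

```ruby
Event = Struct.new(:stream, :timestamp)

# Minutes past 12:00 stand in for real timestamps.
STREAM_1 = (3..6).map { |m| Event.new('stream1', m) } # 12:03 -> 12:06
STREAM_2 = (5..9).map { |m| Event.new('stream2', m) } # 12:05 -> 12:09

# Hypothetical stand-in for a group-level query: return only the events
# at or after the given start time.
def query(events, start_time)
  events.select { |e| e.timestamp >= start_time }
end

# Replay the batches while tracking a single high water mark per group.
def replay_with_group_mark(batches)
  group_mark = 0
  processed = []
  batches.each do |batch|
    received = query(batch, group_mark)
    processed.concat(received)
    # The group mark moves to the newest timestamp seen so far.
    group_mark = [group_mark, *received.map(&:timestamp)].max
  end
  processed
end

# The two streams arrive one after the other instead of interleaved.
processed = replay_with_group_mark([STREAM_1, STREAM_2])
lost = (STREAM_1 + STREAM_2) - processed
lost.each { |e| puts "lost: #{e.stream} at 12:0#{e.timestamp}" }
```

In this simplified model the stream 2 event at 12:05 is never processed, because the group mark has already reached 12:06 by the time stream 2 is read.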

This change widens @sincedb to also store a high water mark at the stream level

@sincedb = {
    # the last record for the group
    'logGroup1' => 12345,
    # the last record for the stream
    'logGroup1:streamA' => 12345,
    'logGroup1:streamB' => 12342,
    'logGroup2' => 45678,
    'logGroup2:streamC' => 45678,
}

And the algorithm was changed to:

find all the log groups, then for each group:

    get the *earliest* per-stream time for that group
        if the group is new, use the existing logic
        for the default start time

    call @cloudwatch.filter_log_events for the group, using this earliest time

    for each event:
        get the high water mark for the event's stream
            if the stream is new, fall back on the group start time

        ignore the event if it's earlier than the stream's high water mark

        otherwise, process the event as before
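The steps above could be sketched in Ruby roughly as follows. This is a minimal illustration, not the code from this PR: `LogEvent`, `group_start_time` and `process_group` are hypothetical names, the `events` array stands in for the result of `@cloudwatch.filter_log_events`, and two details are assumptions of the sketch: marks are stored as `timestamp + 1` so an already processed event is not emitted again, and a group that only has an old group-level entry starts from that entry.

```ruby
LogEvent = Struct.new(:stream, :timestamp, :message)

# Earliest per-stream mark for the group. Assumption: with no stream
# entries we use the group-level entry, and only then the default start.
def group_start_time(sincedb, group, default_start)
  stream_marks = sincedb.select { |key, _| key.start_with?("#{group}:") }.values
  return stream_marks.min unless stream_marks.empty?

  sincedb.fetch(group, default_start)
end

# Process one poll of a group and return the events that were accepted.
def process_group(sincedb, group, events, default_start)
  start_time = group_start_time(sincedb, group, default_start)
  accepted = []

  # Stand-in for the filter_log_events call with this start time.
  events.select { |e| e.timestamp >= start_time }.each do |event|
    stream_key = "#{group}:#{event.stream}"
    # A stream we have not seen yet falls back on the group start time.
    stream_mark = sincedb.fetch(stream_key, start_time)
    next if event.timestamp < stream_mark

    accepted << event
    # Assumption: store timestamp + 1 so this event is skipped next time.
    sincedb[stream_key] = event.timestamp + 1
    sincedb[group] = [sincedb.fetch(group, 0), event.timestamp + 1].max
  end

  accepted
end

sincedb = {}
stream_a = (3..6).map { |m| LogEvent.new('streamA', m, "a#{m}") }
stream_b = (5..9).map { |m| LogEvent.new('streamB', m, "b#{m}") }

# First poll: all of streamA, but only the first event of streamB.
process_group(sincedb, 'logGroup1', stream_a + stream_b.first(1), 0)
# Second poll: everything is available.
second = process_group(sincedb, 'logGroup1', stream_a + stream_b, 0)
p second.map { |e| [e.stream, e.timestamp] }
```

Here the second poll starts from the earlier streamB mark rather than the group mark, so the remaining streamB events are picked up while the streamA events that were already processed are ignored. Note that a stream with no entry at all still falls back on the group start time, exactly as in the steps above.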

The file format still keeps the old group-level position entries, so that the previous high water mark is maintained on an upgrade.

daniel-bray-sonalake commented 3 years ago

I'm rejecting this in favour of a different approach: https://github.com/lukewaite/logstash-input-cloudwatch-logs/pull/96