segmentio / ecs-logs

Log forwarder for services run by ecs-agent.
MIT License
114 stars, 16 forks

CloudWatch: DataAlreadyAcceptedException #68

Closed bobzoller closed 5 years ago

bobzoller commented 6 years ago

Occasionally we get a DataAlreadyAcceptedException when writing to CloudWatch. When we see this error, it is followed by some number of "the writer was invalidated by another goroutine" errors, each of which represents some number of dropped log lines.

Until now, it was happening infrequently enough (we could go weeks without an occurrence) that we didn't worry about it. In the last 5 days, however, it's been happening consistently, 10-30 times per day in total across all our log streams, so now I'm digging into it.

Can anyone help me understand:

It looks like this error could potentially be handled gracefully in the same way InvalidSequenceTokenException is already, because the error message includes a new token:

DataAlreadyAcceptedException: The given batch of log events has already been accepted. The next batch can be sent with sequenceToken: 49586195678550359962035101545865519144132306138323654946
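
A minimal sketch of that idea, assuming the new token can be read straight out of the error message text shown above. The helper name nextSequenceToken is hypothetical and not part of ecs-logs, and the pattern assumes the token is a run of digits following "sequenceToken: ", as in this one sample message:

```go
package main

import (
	"fmt"
	"regexp"
)

// sequenceTokenPattern matches the token in messages shaped like the one above:
// "... The next batch can be sent with sequenceToken: 4958...".
// Assumption: the token is always a run of decimal digits.
var sequenceTokenPattern = regexp.MustCompile(`sequenceToken: (\d+)`)

// nextSequenceToken is a hypothetical helper (not part of ecs-logs) that pulls
// the next sequence token out of a CloudWatch Logs error message.
// It returns false when the message carries no token.
func nextSequenceToken(msg string) (string, bool) {
	m := sequenceTokenPattern.FindStringSubmatch(msg)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	msg := "DataAlreadyAcceptedException: The given batch of log events has already been accepted. " +
		"The next batch can be sent with sequenceToken: 49586195678550359962035101545865519144132306138323654946"

	if token, ok := nextSequenceToken(msg); ok {
		// The batch was already accepted, so a writer could keep this token
		// for the next batch instead of invalidating itself.
		fmt.Println("next token:", token)
	}
}
```

With a token recovered this way, the writer could treat the batch as delivered and carry on with the next one, rather than dropping the lines that follow.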

bobzoller commented 6 years ago

I should mention we're running 1b0bfee9b6c0357c902141673f3a2050d63dff90 (so we do not have the fixes in #67 or #65).

bobzoller commented 6 years ago

I think amekkawi/cwlogs-writable#10 confirmed my theory and I'm feeling like I will just take a pass at a PR.