logstash-plugins / logstash-output-influxdb

Apache License 2.0

Fix Issue #14: Flush retries send duplicates to InfluxDB #15

Closed wblakecaldwell closed 7 years ago

wblakecaldwell commented 9 years ago

The flush(events) method groups events that share the same series and column names. The grouping was implemented in a way that mutated the input collection. Because flush(events) is called every N seconds (default 1) with the same collection until delivery succeeds, this caused duplicate records to be delivered to InfluxDB: each retry appended all of the matching events' points to the first event with the same series and column names. By the time flush(events) succeeds - that is, once InfluxDB is back online - the request contains roughly N*(batch size) duplicate records. If the records don't have "sequence_number" columns, InfluxDB accepts each duplicate as its own data point, causing a huge spike at the moment delivery resumes.
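A minimal sketch of the buggy pattern described above (simplified, not the plugin's actual code; the event/hash shapes are illustrative assumptions). Because the grouping stores a reference to the first matching event and concatenates into its points array in place, every retried flush over the same events array grows that event:

```ruby
# Buggy grouping: mutates the input events, so retries accumulate duplicates.
def flush_buggy(events, event_collection)
  events.each do |event|
    key = [event["series"], event["columns"]]
    if (existing = event_collection[key])
      existing["points"].concat(event["points"]) # mutates the original event!
    else
      event_collection[key] = event # stores a reference, not a copy
    end
  end
end

events = [
  { "series" => "cpu", "columns" => %w[time value], "points" => [[1, 0.5]] },
  { "series" => "cpu", "columns" => %w[time value], "points" => [[2, 0.7]] },
]

# Simulate two flush attempts while InfluxDB is unreachable: the first
# event's points array grows on every retry.
2.times { flush_buggy(events, {}) }
```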

The above problem is now fixed: when adding to event_collection, rather than adding the original event, we add a shallow copy whose event['points'] array is a new instance, so appending grouped points no longer modifies the input events.
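The fix can be sketched like this (again simplified and hypothetical in its event shapes): the event hash and its points array are duplicated before grouping, so retries over the same events array are idempotent:

```ruby
# Fixed grouping: store a shallow copy with its own points array, so
# accumulating grouped points never mutates the input events.
def flush_fixed(events, event_collection)
  events.each do |event|
    key = [event["series"], event["columns"]]
    if (existing = event_collection[key])
      existing["points"].concat(event["points"])
    else
      copy = event.dup                     # shallow copy of the event hash
      copy["points"] = event["points"].dup # fresh array for grouped points
      event_collection[key] = copy
    end
  end
end

events = [
  { "series" => "cpu", "columns" => %w[time value], "points" => [[1, 0.5]] },
  { "series" => "cpu", "columns" => %w[time value], "points" => [[2, 0.7]] },
]

# Retrying flush any number of times leaves the input events untouched.
3.times { flush_fixed(events, {}) }
```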

wblakecaldwell commented 9 years ago

I just signed the CLA

elasticsearch-release commented 9 years ago

Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run; then say 'jenkins, test it'.

wblakecaldwell commented 9 years ago

Not missing the CLA - I signed it 3 months ago

suyograo commented 8 years ago

@wblakecaldwell apologies for the delay. Can you rebase this PR? Will merge after.

andrewvc commented 7 years ago

This should be fixed in v5.0.0 thanks to #55, so I'm going to close this. If you continue to encounter an issue, please open a new issue. Thanks :)