Closed wblakecaldwell closed 7 years ago
Fixed!
Here's the pull request: https://github.com/logstash-plugins/logstash-output-influxdb/pull/15
@wblakecaldwell Is it safe to assume that this issue was addressed in the commit by @contentfree on October 23, 2015? It appears that the event collection implementation was completely refactored in the aforementioned commit. If so, @suyograo we might want to also consider clearing #15, but @wblakecaldwell should confirm...
I'm not sure. I haven't been in this code in about a year. I don't have anything to test it on at this point. ¯\\\_(ツ)\_/¯
Understood :-) I have an environment where I can run some tests. Let me see if I can verify whether we still have an issue and report back.
:thumbsup: :)
This should be fixed in v5.0.0, thanks to #55. I'm going to close this issue given that. If you continue to encounter any issues, please open a new issue. Thanks :)
The flush(events) method groups events that share the same series and column names. It was implemented in a way that modified the input collection, which caused duplicate records to be delivered to InfluxDB: flush(events) is called every N seconds (default 1) with the same collection until delivery succeeds, and each call appended all of the matching events' points onto the first event with that series and column names. By the time flush(events) finally succeeded (that is, once InfluxDB was back online), the request contained a pile of duplicate records, on the order of N × (batch size). If the records don't have "sequence_number" columns, InfluxDB accepts each duplicate as its own data point, causing a huge spike at the moment delivery resumes.
https://github.com/logstash-plugins/logstash-output-influxdb/blob/master/lib/logstash/outputs/influxdb.rb#L175-L182
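To make the failure mode concrete, here's a minimal, hypothetical Ruby sketch (simplified stand-in code, not the plugin's actual implementation) contrasting grouping that mutates the input collection with an idempotent version that builds fresh arrays. Calling the mutating version repeatedly on the same batch, as a retry loop does, keeps growing the first event's points:

```ruby
# Hypothetical simplified event: not the plugin's real data structure.
Event = Struct.new(:series, :columns, :points)

# Buggy pattern: appends points onto the first matching event in-place,
# so re-running it on the same array (a retry) duplicates points.
def flush_mutating(events)
  grouped = {}
  events.each do |e|
    key = [e.series, e.columns]
    if grouped.key?(key)
      grouped[key].points.concat(e.points) # mutates the shared event!
    else
      grouped[key] = e
    end
  end
  grouped.values
end

# Safe pattern: accumulate into fresh arrays; `events` is left untouched,
# so retrying with the same batch produces the same payload every time.
def flush_idempotent(events)
  grouped = Hash.new { |h, k| h[k] = [] }
  events.each { |e| grouped[[e.series, e.columns]].concat(e.points) }
  grouped.map { |(series, columns), points| Event.new(series, columns, points) }
end

batch = [Event.new("cpu", ["v"], [[1]]), Event.new("cpu", ["v"], [[2]])]
2.times { flush_mutating(batch) }      # simulate two failed delivery retries
puts batch.first.points.length         # => 3 (grows on every retry)

fresh = [Event.new("cpu", ["v"], [[1]]), Event.new("cpu", ["v"], [[2]])]
2.times { flush_idempotent(fresh) }
puts fresh.first.points.length         # => 1 (input never modified)
```

The fix in the linked pull request amounts to the second pattern: group into new structures rather than editing the events the retry loop will hand back.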