Closed wblakecaldwell closed 7 years ago
Fixed!
Here's the pull request: https://github.com/logstash-plugins/logstash-output-influxdb/pull/15
@wblakecaldwell Is it safe to assume that this issue was addressed in the commit by @contentfree on October 23, 2015? It appears that the event collection implementation was completely refactored in the aforementioned commit. If so, @suyograo we might want to also consider clearing #15, but @wblakecaldwell should confirm...
I'm not sure. I haven't been in this code in about a year. I don't have anything to test it on at this point. ¯\\\_(ツ)\_/¯
Understood :-) I have an environment where I can run some tests. Let me see if I can verify whether we still have an issue and report back.
:thumbsup: :)
This should be fixed in v5.0.0, thanks to #55. I'm going to close this issue given that. If you continue to encounter any issues, please open a new issue. Thanks :)
The flush(events) method groups events that share the same series and column names. It was implemented in a way that modified the input collection, which caused duplicate records to be delivered to InfluxDB: flush(events) is called every N seconds (default 1) with the same collection until delivery succeeds, and each call appended all of the matching events' points onto the first event with that series and column names. By the time flush(events) finally succeeded (that is, once InfluxDB was back online), the request contained a pile of duplicate records, on the order of N × (batch size). If the records don't have "sequence_number" columns, InfluxDB accepts each duplicate as its own data point, causing a huge spike at the moment delivery resumes.
https://github.com/logstash-plugins/logstash-output-influxdb/blob/master/lib/logstash/outputs/influxdb.rb#L175-L182
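To make the failure mode concrete, here's a minimal, hypothetical Ruby sketch (simplified stand-in code, not the plugin's actual implementation) contrasting grouping that mutates the input collection with an idempotent version that builds fresh arrays. Calling the mutating version repeatedly on the same batch, as a retry loop does, keeps growing the first event's points:

```ruby
# Hypothetical simplified event: not the plugin's real data structure.
Event = Struct.new(:series, :columns, :points)

# Buggy pattern: appends points onto the first matching event in-place,
# so re-running it on the same array (a retry) duplicates points.
def flush_mutating(events)
  grouped = {}
  events.each do |e|
    key = [e.series, e.columns]
    if grouped.key?(key)
      grouped[key].points.concat(e.points) # mutates the shared event!
    else
      grouped[key] = e
    end
  end
  grouped.values
end

# Safe pattern: accumulate into fresh arrays; `events` is left untouched,
# so retrying with the same batch produces the same payload every time.
def flush_idempotent(events)
  grouped = Hash.new { |h, k| h[k] = [] }
  events.each { |e| grouped[[e.series, e.columns]].concat(e.points) }
  grouped.map { |(series, columns), points| Event.new(series, columns, points) }
end

batch = [Event.new("cpu", ["v"], [[1]]), Event.new("cpu", ["v"], [[2]])]
2.times { flush_mutating(batch) }      # simulate two failed delivery retries
puts batch.first.points.length         # => 3 (grows on every retry)

fresh = [Event.new("cpu", ["v"], [[1]]), Event.new("cpu", ["v"], [[2]])]
2.times { flush_idempotent(fresh) }
puts fresh.first.points.length         # => 1 (input never modified)
```

The fix in the linked pull request amounts to the second pattern: group into new structures rather than editing the events the retry loop will hand back.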