Closed: wblakecaldwell closed this 7 years ago
I just signed the CLA
Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run; then say 'jenkins, test it'.
Not missing the CLA - I signed it 3 months ago
@wblakecaldwell apologies for the delay. Can you rebase this PR? We'll merge after.
This should be fixed in v5.0.0 thanks to #55, so I'm going to close this issue. If you continue to encounter any issues, please open a new issue. Thanks :)
The flush(events) method groups events that share the same series and column names, but it was implemented in a way that modified the input collection. This caused duplicate records to be delivered to InfluxDB, because flush(events) is called every N seconds (default 1) with the same collection until delivery succeeds. On each attempt, the grouping loop appended all of the events' points to the first event with the matching series and column names. By the time flush(events) finally succeeds (that is, InfluxDB is back online), that request contains a pile of duplicate records, on the order of N * (batch size), with one extra copy of the batch accumulated per failed attempt. If the records don't have "sequence_number" columns, InfluxDB accepts each duplicate as its own data point, causing a huge spike at the moment delivery resumes.
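To make the failure mode concrete, here is a minimal Ruby sketch of the buggy grouping logic. It is illustrative only, not the plugin's actual code: the hash layout (`'series'`, `'columns'`, `'points'`) assumes the InfluxDB 0.8-style write format, and `post_to_influxdb` is a hypothetical stand-in for the real delivery call.

```ruby
# Sketch of the bug: grouping mutates the caller's input events.
def flush(events)
  event_collection = []
  seen = {} # [series, columns] => the grouped event in event_collection

  events.each do |event|
    key = [event['series'], event['columns']]
    if seen.key?(key)
      # BUG: seen[key] is still an object inside the caller's `events`
      # collection. concat appends points in place, so when delivery
      # fails and flush is retried with the same `events`, the same
      # points are appended again on every attempt.
      seen[key]['points'].concat(event['points'])
    else
      seen[key] = event
      event_collection << event
    end
  end

  # Fails (and gets retried with the now-mutated events) while
  # InfluxDB is unreachable.
  post_to_influxdb(event_collection)
end
```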
The above problem is now fixed: when adding to event_collection, rather than adding the original event, we add a shallow copy, so the copy's event['points'] collection isn't the same instance as the original event's, and appending to it doesn't modify the input events.
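A minimal sketch of the fix, under the same assumptions as above (plain-hash events; the copying details may differ from the merged change): duplicate the event and give the copy its own points array, so appends never touch the caller's objects.

```ruby
events.each do |event|
  key = [event['series'], event['columns']]
  if seen.key?(key)
    # Safe now: seen[key] is our private copy, not a caller-owned event.
    seen[key]['points'].concat(event['points'])
  else
    # FIX: dup the event hash and shallow-copy its points array, so the
    # concat above mutates only the copy, never the input event. Retrying
    # flush with the same `events` now rebuilds the batch from scratch.
    copy = event.dup
    copy['points'] = event['points'].dup
    seen[key] = copy
    event_collection << copy
  end
end
```

Note that a shallow copy is enough here: the individual points are never mutated, only the points array itself is appended to, so the copies can safely share the point objects with the originals.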