influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Rate limit multiple batch sends #14802

Status: Open. Opened by powersj 8 months ago.

powersj commented 8 months ago

Use Case

When an output goes down for a while, metrics accumulate in the output's buffer. Once connectivity to the output is restored, Telegraf sends the buffered batches back to back instead of pacing them, so the output can effectively be hit by a small denial-of-service (DoS) burst from Telegraf.

Expected behavior

An opt-in configuration option that enables rate limiting of batch writes to an output.

Actual behavior

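Many batches are sent at once. In the following log from an old issue, nine batches (12,500 metrics in total) are written within roughly one second until the buffer is empty: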

2021-09-21T20:10:30Z D! [outputs.influxdb_v2] Wrote batch of 1500 metrics in 40.980832ms
2021-09-21T20:10:30Z D! [outputs.influxdb_v2] Wrote batch of 1500 metrics in 28.129692ms
2021-09-21T20:10:30Z D! [outputs.influxdb_v2] Wrote batch of 1500 metrics in 25.840982ms
2021-09-21T20:10:30Z D! [outputs.influxdb_v2] Wrote batch of 1500 metrics in 30.648302ms
2021-09-21T20:10:31Z D! [outputs.influxdb_v2] Wrote batch of 1500 metrics in 35.806018ms
2021-09-21T20:10:31Z D! [outputs.influxdb_v2] Wrote batch of 1500 metrics in 68.213211ms
2021-09-21T20:10:31Z D! [outputs.influxdb_v2] Wrote batch of 1500 metrics in 22.580247ms
2021-09-21T20:10:31Z D! [outputs.influxdb_v2] Wrote batch of 1500 metrics in 20.719449ms
2021-09-21T20:10:31Z D! [outputs.influxdb_v2] Wrote batch of 500 metrics in 33.120372ms
2021-09-21T20:10:31Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 50000 metrics

Additional info

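The Go project maintains a rate-limiting package, golang.org/x/time/rate (https://pkg.go.dev/golang.org/x/time/rate), which we could use in the flushLoop or flushBatch functions. Note that it lives in the golang.org/x supplementary repositories rather than the standard library.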

powersj commented 7 months ago

Next steps: look into this.