JeffAshton opened this issue 3 years ago
@JeffAshton,
It looks like you completed a PR via #8817, which says it is part 1. Was there an additional PR that went in such that this is now resolved?
Thanks!
I actually ended up just writing my own output plugin:
https://github.com/Brightspace/telegraf/tree/master/plugins/outputs/d2l_kinesis
I wanted to gzip multiple metrics into a single Kinesis record. That let us reduce our Kinesis shard count drastically and saved a fair bit of money. Maintaining backwards compatibility with the existing options, such as the partition method, would have complicated things beyond what I needed.
In my experience, writing a single metric per Kinesis record is actually cost-prohibitive.
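For readers who want the idea without digging through the plugin, the batching trick can be sketched in Go (the language Telegraf is written in). This is not the d2l_kinesis code: it is a minimal sketch that assumes metrics are already serialized one per line (for example in InfluxDB line protocol), and the names `packMetrics` and `unpackMetrics` are hypothetical. Each gzipped payload would become the data of one Kinesis record instead of one record per metric.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// packMetrics gzips many serialized metrics (one per line) into a single
// payload that could be sent as the data of one Kinesis record.
// Hypothetical helper, not part of Telegraf or the AWS SDK.
func packMetrics(lines []string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	for _, line := range lines {
		if _, err := io.WriteString(zw, line+"\n"); err != nil {
			return nil, err
		}
	}
	// Close flushes any buffered compressed data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unpackMetrics reverses packMetrics, as a consumer of the stream would.
func unpackMetrics(payload []byte) ([]string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	raw, err := io.ReadAll(zr)
	if err != nil {
		return nil, err
	}
	return strings.Split(strings.TrimSuffix(string(raw), "\n"), "\n"), nil
}

func main() {
	// 500 similar-looking metrics, which compress well together.
	lines := make([]string, 500)
	for i := range lines {
		lines[i] = fmt.Sprintf("cpu,host=web%02d usage_idle=%d 1700000000000000000", i%10, i%100)
	}
	payload, err := packMetrics(lines)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d metrics -> 1 record of %d bytes\n", len(lines), len(payload))
}
```

A real plugin would also have to keep each payload under the Kinesis per-record size limit and split batches accordingly; that is left out here.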
Thanks for the update and it is interesting to hear about the costs.
In terms of this bug, and silently dropping metrics, was this at least resolved?
I addressed it in my output plugin.
The costs balloon because of this ingestion limit (I'm paraphrasing):
A single shard can ingest up to 1,000 records per second for writes.
https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html
So if you're trying to write, for example, 20,000 metrics a second, you'll need at least 20 shards (probably double that in practice if you don't want metrics dropped constantly). At $0.04 per shard hour:
30 days × 24 hours × $0.04 × 20 shards = $576 per month
By compressing many metrics into a single record, I can comfortably get away with a single shard:
30 days × 24 hours × $0.04 × 1 shard = $28.80 per month
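The arithmetic above can be reproduced with a small Go sketch. It assumes the 1,000 records/second per-shard write limit and the $0.04 shard-hour price quoted in this thread (actual pricing varies by region and over time), and it ignores both the extra headroom suggested above and the separate per-shard bytes-per-second limit. `minShards` and `monthlyCost` are hypothetical helper names.

```go
package main

import "fmt"

// recordsPerShardPerSecond is the per-shard write limit quoted in this thread.
const recordsPerShardPerSecond = 1000

// minShards returns the smallest shard count whose combined record limit
// covers the given write rate (records per second), rounding up.
func minShards(recordsPerSecond int) int {
	n := (recordsPerSecond + recordsPerShardPerSecond - 1) / recordsPerShardPerSecond
	if n < 1 {
		return 1 // a stream always has at least one shard
	}
	return n
}

// monthlyCost returns the shard-hour cost of a 30-day month.
func monthlyCost(shards int, pricePerShardHour float64) float64 {
	return 30 * 24 * pricePerShardHour * float64(shards)
}

func main() {
	const price = 0.04 // assumed price per shard hour, as quoted above
	perMetric := minShards(20000)
	fmt.Printf("one metric per record: %d shards, $%.2f/month\n", perMetric, monthlyCost(perMetric, price))
	fmt.Printf("compressed batches:    1 shard,   $%.2f/month\n", monthlyCost(1, price))
}
```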
Steps to reproduce:
Use Kinesis normally. The PutRecords API by design may only be partially successful.
It's possible for the request itself to succeed even though none of the records were actually processed.
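To illustrate the partial-success behavior: a PutRecords response carries a FailedRecordCount and one result entry per request record, in the same order, and each rejected entry has an ErrorCode such as ProvisionedThroughputExceededException or InternalFailure. The sketch below uses simplified, hypothetical stand-in types rather than the AWS SDK's own (the real SDK types use pointer fields) so that it runs on its own; `failedRecords` is likewise a hypothetical name.

```go
package main

import "fmt"

// resultEntry is a simplified, hypothetical stand-in for one entry of the
// PutRecords response. A non-empty ErrorCode marks a record that was
// rejected even though the API call as a whole succeeded.
type resultEntry struct {
	ErrorCode    string
	ErrorMessage string
}

// putRecordsResult is a hypothetical stand-in for the PutRecords response.
type putRecordsResult struct {
	FailedRecordCount int
	Records           []resultEntry // same length and order as the request
}

// failedRecords returns the request payloads that Kinesis did not accept,
// relying on response entries lining up index-for-index with the request.
func failedRecords(request [][]byte, resp putRecordsResult) [][]byte {
	if resp.FailedRecordCount == 0 {
		return nil
	}
	var failed [][]byte
	for i, entry := range resp.Records {
		if entry.ErrorCode != "" && i < len(request) {
			failed = append(failed, request[i])
		}
	}
	return failed
}

func main() {
	request := [][]byte{[]byte("m1"), []byte("m2"), []byte("m3")}
	// A successful call in which only the middle record was actually written.
	resp := putRecordsResult{
		FailedRecordCount: 2,
		Records: []resultEntry{
			{ErrorCode: "ProvisionedThroughputExceededException"},
			{},
			{ErrorCode: "ProvisionedThroughputExceededException"},
		},
	}
	failed := failedRecords(request, resp)
	fmt.Printf("request succeeded, but %d of %d records were rejected\n", len(failed), len(request))
}
```

Checking only the error returned by the call, and never inspecting these per-record entries, is how records end up dropped without any signal.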
Expected behavior:
At a minimum, communicate that metrics are being dropped. Ideally this would be captured by `telegraf_internal_write`. That said, I'm going to work towards adding retry support.
Actual behavior:
Metrics get silently dropped.
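One possible shape for the retry support mentioned under expected behavior: resend only the rejected records with exponential backoff, and if some are still rejected after the last attempt, report how many were dropped instead of discarding them quietly. This is a sketch, not Telegraf's API: `putFunc` and `writeWithRetry` are hypothetical names, and how the dropped count would be wired into Telegraf's internal write statistics is left open.

```go
package main

import (
	"fmt"
	"time"
)

// putFunc is a hypothetical hook around a single PutRecords call: it returns
// the subset of records Kinesis rejected, or an error if the whole call failed.
type putFunc func(records [][]byte) (rejected [][]byte, err error)

// writeWithRetry resends only the rejected records, doubling the delay between
// attempts. Anything still pending after maxAttempts is reported as dropped,
// both as a count and as an error, rather than being discarded silently.
func writeWithRetry(put putFunc, records [][]byte, maxAttempts int, baseDelay time.Duration) (int, error) {
	pending := records
	delay := baseDelay
	for attempt := 1; attempt <= maxAttempts && len(pending) > 0; attempt++ {
		rejected, err := put(pending)
		if err != nil {
			rejected = pending // the whole request failed; retry everything
		}
		pending = rejected
		if len(pending) > 0 && attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2 // exponential backoff
		}
	}
	if len(pending) > 0 {
		return len(pending), fmt.Errorf("kinesis: dropped %d of %d records after %d attempts",
			len(pending), len(records), maxAttempts)
	}
	return 0, nil
}

func main() {
	calls := 0
	// Fake sender: throttles every record on the first call, accepts them on the second.
	flaky := func(records [][]byte) ([][]byte, error) {
		calls++
		if calls == 1 {
			return records, nil
		}
		return nil, nil
	}
	dropped, err := writeWithRetry(flaky, [][]byte{[]byte("m1"), []byte("m2")}, 3, 10*time.Millisecond)
	fmt.Println("dropped:", dropped, "err:", err, "calls:", calls)
}
```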