redbaron opened this issue 4 months ago
Hi @redbaron,
Today we have two time-based config options, `period` and `delay`:
```toml
## Requested CloudWatch aggregation Period (required)
## Must be a multiple of 60s.
period = "5m"

## Collection Delay (required)
## Must account for metrics availability via CloudWatch API
delay = "5m"
```
Correct me if I am wrong, but your proposal would change how additional windows are calculated. Rather than basing them on the previous interval's end, you want to always go from start minus `period`, like we do for the first interval?
I'm looking at the code for `updateWindow`.
My proposal is to always have `windowEnd = $now`. I don't fully understand how `windowStart` is calculated there, but it should be roughly `$now - delay` at every gather.
Ah, OK, thanks for clarifying. I assume we could have a `window_end_mode` option, with your new `"now"` mode and a default `"delay"` mode?
Something you would be willing to put up a PR for?
Hi, unfortunately I won't be able to dedicate time to it at the moment.
Use Case
CloudWatch can be very unreliable in delivering metrics on time: delays range from minutes to half an hour in some cases. To avoid gaps in the collected metrics, the `delay` parameter has to be set far enough back in time to cover possible delays. Telegraf then calculates a delayed timestamp as `$delay = $now - delay` and queries metrics in the range `$delay:$delay+period` every `interval`. As a result, metrics are always delayed, even if CloudWatch already has fresh data. In other words, `delay` is set to cover the worst case, but it penalizes the best case by doing so.
Expected behavior
It would be good if the cloudwatch plugin could be configured to fetch metrics in the `$delay:$now` range. This would allow `$delay` to be set far enough back in time to cover occasional late metric delivery, while still returning the freshest possible data when CloudWatch has it.
Obviously, the same CloudWatch datapoint will be fetched multiple times, which can incur costs, but that is a tradeoff Telegraf users might be willing to take.
Actual behavior
Telegraf queries a single datapoint in the `$delay:$delay+period` range, thus missing fresher data even if it exists.
Additional info
No response