Closed woodsaj closed 8 years ago
I think there are 2 things needed to correct this,
1) this needs to be adjusted to be on the interval boundary and not the exact second. https://github.com/raintank/grafana/blob/master/pkg/alerting/scheduler.go#L74
2) The "start_time" reported in graphite also needs to be adjusted to be the first interval boundary after start. https://github.com/raintank/graphite-raintank/blob/master/graphite_raintank.py#L195
@Dieterbe thoughts?
carbon/whisper quantizes the data (i.e. adjusts a ts of 57 to 60, etc) at storage time. MT/graphite-raintank doesn't IIRC, and i wonder if we should (e.g. to combine data with small ts offsets through graphite's processing functions); that would probably solve this as well? i'm not quite sure what the time_info and boundary stuff is about, let's do a hangout to discuss.
by interval boundary i just mean the quantized value. If the interval is 10 seconds, then the interval boundary would be at second 0, 10, 20, 30, 40 or 50 of each minute.
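For concreteness, here is a minimal sketch of what quantizing to an interval boundary could look like. The function names are made up for illustration and are not taken from grafana or graphite-raintank. Whether to round down or up is a separate decision, so both directions are shown.

```python
def boundary_at_or_before(ts: int, interval: int) -> int:
    """Round ts down to the nearest multiple of interval."""
    return ts - (ts % interval)


def boundary_at_or_after(ts: int, interval: int) -> int:
    """Round ts up to the nearest multiple of interval."""
    return ((ts + interval - 1) // interval) * interval


if __name__ == "__main__":
    # With a 10 second interval, 57 sits between the boundaries 50 and 60.
    for ts in (50, 57, 60):
        print(ts, boundary_at_or_before(ts, 10), boundary_at_or_after(ts, 10))
```

A timestamp that is already on a boundary (50 or 60 here) is left unchanged by both functions.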
item 2 has been fixed in https://github.com/raintank/graphite-raintank/issues/15
working on item 1 now.
Hold on re 1. See my question about quantizing in MT. Wouldn't that solve that better?
it won't make a difference. whether we quantize at ingestion or on read, the data being fed back to grafana alerting is the same, and grafana currently doesn't like it.
Hmm if grafana alerting does like what graphite+kairos returned, why does it not like what graphite+mt returns? Shouldn't it be the exact same output, esp if we add quantization? (With on read do you mean reading mt into graphite api?)
graphite-api uses a finder to get data. This finder just returns a datapoints array of values and a start_time, end_time and step, it does not return the TS for each value. Graphite-api assigns the "start_time" to the first value in the datapoints array, then just adds "step" to get the TS for subsequent values.
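The timestamp assignment described above can be sketched roughly like this. It is an illustration of the behaviour, not graphite-api's actual code, and `assign_timestamps` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def assign_timestamps(
    start_time: int, step: int, values: List[Optional[float]]
) -> List[Tuple[int, Optional[float]]]:
    """Pair each value with a timestamp derived only from start_time and step.

    The first value gets start_time, every following value gets the previous
    timestamp plus step. The finder never supplies per-value timestamps.
    """
    return [(start_time + i * step, value) for i, value in enumerate(values)]


if __name__ == "__main__":
    # A start_time that is off-boundary shifts every point by the same offset.
    print(assign_timestamps(1457, 10, [1.0, None, 3.0]))
```

The consequence is that whatever offset start_time carries is inherited by every point in the series.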
when using the graphite-kairos finder, we set the start_time to be the exact TS of the first value in the datapoints array. We were able to do this as there was no aggregation/consolidation happening, so we knew the exact TS. Though this approach had lots of issues when dealing with series that had different steps and different start_times.
with graphite-raintank/NMT the exact TS of a datapoint is now fuzzy, as the data being returned may have been consolidated. As of raintank/graphite-raintank#15, graphite-raintank is now using the quantized value based on the step and requested start_time (previously it was just using the requested start_time + 1).
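To illustrate the difference between the two start_time choices, here is a rough sketch. It assumes the quantized value is the first boundary at or after the requested start (as in item 2 above); the actual change in raintank/graphite-raintank#15 may differ in detail, and all function names here are hypothetical.

```python
from typing import List


def start_time_plus_one(requested_start: int) -> int:
    """Old behaviour as described in this thread: requested start + 1."""
    return requested_start + 1


def quantized_start_time(requested_start: int, step: int) -> int:
    """Assumed new behaviour: first multiple of step at or after the start."""
    return ((requested_start + step - 1) // step) * step


def timestamps(start_time: int, step: int, count: int) -> List[int]:
    """Timestamps that get attached to count values, given start_time and step."""
    return [start_time + i * step for i in range(count)]


if __name__ == "__main__":
    requested, step = 1457, 10
    # Old: points land on 1458, 1468, 1478 (off-boundary).
    print(timestamps(start_time_plus_one(requested), step, 3))
    # New: points land on 1460, 1470, 1480 (on the 10 second boundaries).
    print(timestamps(quantized_start_time(requested, step), step, 3))
```

With the quantized start_time, the reported timestamps depend only on the step and not on the exact second at which the request was made.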
So grafana alerting is expecting the TS of the values returned to match exactly the TS the values were written at. But we can no longer provide that.
Submitted PR #541. My testing shows that this resolves the issues with alerting. @Dieterbe any reason you see not to merge?
pretty much all requests fail due to badStart, badStep errors.