alerting does not work with latest graphite-raintank/metric-tank

woodsaj commented 8 years ago

pretty much all requests fail due to badStart, badStep errors.

woodsaj commented 8 years ago

I think there are 2 things needed to correct this:

1) this needs to be adjusted to be on the interval boundary and not the exact second. https://github.com/raintank/grafana/blob/master/pkg/alerting/scheduler.go#L74

2) The "start_time" reported in graphite also needs to be adjusted to be the first interval boundary after start. https://github.com/raintank/graphite-raintank/blob/master/graphite_raintank.py#L195

@Dieterbe thoughts?

Dieterbe commented 8 years ago

carbon/whisper quantizes the data (i.e. adjusts a ts of 57 to 60, etc) at storage time. MT/graphite-raintank doesn't IIRC, and i wonder if we should (e.g. to combine data with small ts offsets through graphite's processing functions). that would probably solve this as well? i'm not quite sure what the time_info and boundary stuff is about, let's do a hangout to discuss.

woodsaj commented 8 years ago

by interval boundary i just mean the quantized value. If the interval is 10 seconds, then the interval boundary would be 0, 10, 20, 30, 40 or 50.
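
A rough sketch of what "interval boundary" means, in case it helps. The function names are made up for illustration and are not taken from grafana or graphite-raintank; rounding down is just one possible choice.

```python
def is_on_boundary(ts: int, interval: int) -> bool:
    """True when ts (unix seconds) is an exact multiple of interval."""
    return ts % interval == 0


def boundary_at_or_before(ts: int, interval: int) -> int:
    """Round ts down to the nearest interval boundary (hypothetical helper)."""
    return ts - (ts % interval)


# With a 10 second interval, second 57 sits between boundaries 50 and 60.
print(is_on_boundary(57, 10))         # False
print(boundary_at_or_before(57, 10))  # 50
```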

woodsaj commented 8 years ago

item 2 has been fixed in https://github.com/raintank/graphite-raintank/issues/15

working on item 1 now.

Dieterbe commented 8 years ago

Hold on re 1. See my question about quantizing in MT. Wouldn't that solve that better?

woodsaj commented 8 years ago

it won't make a difference. whether we quantize at ingestion or on read, the data being fed back to grafana alerting is the same, and grafana currently doesn't like it.

Dieterbe commented 8 years ago

Hmm if grafana alerting does like what graphite+kairos returned, why does it not like what graphite+mt returns? Shouldn't it be the exact same output, esp if we add quantization? (With on read do you mean reading mt into graphite api?)

woodsaj commented 8 years ago

graphite-api uses a finder to get data. This finder just returns a datapoints array of values and a start_time, end_time and step, it does not return the TS for each value. Graphite-api assigns the "start_time" to the first value in the datapoints array, then just adds "step" to get the TS for subsequent values.
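
To illustrate the timestamp assignment described above, here is a minimal sketch. `assign_timestamps` is a hypothetical name, not graphite-api's actual code; it only mirrors the "start_time plus N times step" logic.

```python
from typing import List, Optional, Tuple


def assign_timestamps(
    start_time: int, step: int, datapoints: List[Optional[float]]
) -> List[Tuple[int, Optional[float]]]:
    """Pair each value with a timestamp derived only from start_time and step."""
    return [(start_time + i * step, value) for i, value in enumerate(datapoints)]


# The finder never says when each value was actually written; the timestamps
# are fully determined by start_time and step. None stands for a missing point.
print(assign_timestamps(1000, 10, [1.0, None, 3.0]))
# [(1000, 1.0), (1010, None), (1020, 3.0)]
```

So if start_time is off by even one second, every timestamp in the series is off by the same amount.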

when using the graphite-kairos finder, we set the start_time to be the exact TS of the first value in the datapoints array. We were able to do this as there was no aggregation/consolidation happening, so we knew the exact TS. Though this approach had lots of issues when dealing with series that had different steps and different start_times.

with graphite-raintank/NMT the exact TS of a datapoint is now fuzzy, as the data being returned may have been consolidated. As of raintank/graphite-raintank#15 graphite-raintank is now using the quantized value based on the step and requested start_time (previously it was just using requested start_time + 1).
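
Roughly, the quantized start_time can be sketched like this. This is an illustration under assumptions, not the code from raintank/graphite-raintank#15: `quantized_start` is a made-up name, and leaving a start that already sits on a boundary unchanged is a guess.

```python
def quantized_start(requested_start: int, step: int) -> int:
    """First step boundary at or after requested_start (assumed behaviour)."""
    remainder = requested_start % step
    if remainder == 0:
        # Already on a boundary; assumption: keep it as-is.
        return requested_start
    return requested_start + (step - remainder)


# With step=10, a request starting at 1003 reports 1010 as start_time,
# where the older "requested start_time + 1" logic would have reported 1004.
print(quantized_start(1003, 10))  # 1010
print(quantized_start(1010, 10))  # 1010
```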

So grafana alerting is expecting the TS of the values returned to match exactly the TS the values were written at. But we can no longer provide that.

woodsaj commented 8 years ago

Submitted PR #541. My testing shows that this resolves the issues with alerting. @Dieterbe any reason you see not to merge?

raintank / grafana

alerting does not work with latest graphite-raintank/metric-tank #540