raintank / grafana

Grafana - A Graphite & InfluxDB Dashboard and Graph Editor
http://grafana.org

why do a bunch of our graphs have a lower min of 1 instead of 0? #495

Closed Dieterbe closed 8 years ago

Dieterbe commented 9 years ago

I got into a very confusing situation with the devstack.

The lower two graphs look like they have no data, but on closer inspection all their points are 0 and the graphs have a left Y minimum of 1.

Dieterbe commented 9 years ago

I think the JSON key is leftMin, so I suggest updating these values to 0:

dieter@dieter-m6800 litmus ack 'leftMin.*1'
rt-endpoint-summary.json
608:            "leftMin": 1,
683:            "leftMin": 1,
757:            "leftMin": 1,
832:            "leftMin": 1,

rt-collector-summary.json
531:            "leftMin": 1,
597:            "leftMin": 1,
662:            "leftMin": 1,
727:            "leftMin": 1,
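
Concretely, the fix is just changing each of those lines so the panel's lower bound starts at zero, i.e. (a sketch, assuming nothing else in the panels' grid settings needs to change):

            "leftMin": 0,
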
Dieterbe commented 9 years ago

@nopzor1200 @torkelo do you know why we have them set to 1 now? Seems like 0 would be better.

torkelo commented 9 years ago

Maybe because they are using logarithmic scales, which start at 1.

nopzor1200 commented 9 years ago

Because if we set it to 0, we waste a whole tick mark between 0 and 1; for global latency measurements, anything less than 1ms is generally an error anyway.

P.S. check out the new dashboards; search for "OCT20" in the Raintank Master Account.

nopzor1200 commented 9 years ago

BTW, how can you have zero latency? That can't be right. Maybe 0 < x < 1 I'd believe, but zero would generally be an error, right?

Dieterbe commented 9 years ago

The collector is reporting 0ms latency (confirmed with nsq_metrics_to_stdout), which doesn't seem impossible: I'm monitoring localhost from localhost, and rounding to integers seems sane in this case (i.e. no 0.5ms).

For the record, I now see what you mean about wasting a tick mark (screenshot: tickmark-waste).

My scenario is probably not a common one, but it would be nice for the data in such a case to show up somehow instead of being invisible. Perhaps the simplest solution is for the probe to round the measurements up? Or, if the measurements are consistently > 0.5ms, we could just round to nearest (which we probably already do?). @woodsaj what do you think?
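
For illustration, here is a minimal Go sketch of the "round up" idea; the function and values are hypothetical, not the actual probe code:

    package main

    import (
        "fmt"
        "math"
        "time"
    )

    // reportLatencyMs converts a measured round-trip time to whole milliseconds.
    // Rounding up means a successful sub-millisecond check reports 1ms rather
    // than 0ms, so it stays visible on a graph whose Y axis starts at 1.
    func reportLatencyMs(rtt time.Duration) int64 {
        return int64(math.Ceil(float64(rtt) / float64(time.Millisecond)))
    }

    func main() {
        fmt.Println(reportLatencyMs(300 * time.Microsecond))  // 1, not 0
        fmt.Println(reportLatencyMs(2500 * time.Microsecond)) // 3
    }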

nopzor1200 commented 9 years ago

I think that when 0 latency represents a failed check, we need to be using NULLs, not setting 0.

Under no valid circumstances should a network litmus probe check be reporting 0.00ms latency. I suspect it's just happening because the check is failing and setting 0.00.

(alerting for things like average global or regional latency will get fucked up by zeros)

woodsaj commented 9 years ago

https://github.com/raintank/raintank-metric/issues/36

However, the latest raintank-collector deployment to production includes a change to simply not send null metrics, so there should no longer be 0.0ms latency measurements (unless a check actually took 0.0ms).
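
As a rough sketch of what "just not send null metrics" means in practice (hypothetical names and metric path, not the actual raintank-collector code): a failed check simply omits the datapoint instead of emitting 0, and graphite later fills that slot with null.

    package main

    import (
        "fmt"
        "time"
    )

    // checkResult is a hypothetical shape for one litmus check run.
    type checkResult struct {
        endpoint  string
        ok        bool
        latencyMs float64
    }

    // toGraphiteLines emits one plaintext-protocol line per successful check
    // and skips failed checks entirely, rather than sending 0.0ms.
    func toGraphiteLines(results []checkResult, ts time.Time) []string {
        var lines []string
        for _, r := range results {
            if !r.ok {
                continue // no datapoint at all for a failed check
            }
            lines = append(lines, fmt.Sprintf("litmus.%s.latency %.2f %d",
                r.endpoint, r.latencyMs, ts.Unix()))
        }
        return lines
    }

    func main() {
        results := []checkResult{
            {"example_com", true, 42.17},
            {"example_org", false, 0}, // failed check: omitted entirely
        }
        for _, line := range toGraphiteLines(results, time.Now()) {
            fmt.Println(line)
        }
    }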

Dieterbe commented 9 years ago

Could we perhaps just not hard-configure an explicit lower Y? That way the lower bound would be reasonable for the data, but then scales/axes would not be consistent over time or across different panels, which is annoying too :(

nopzor1200 commented 9 years ago

@woodsaj that's good that we aren't setting 0.0ms (unless 0.0ms actually happens, which I view as extremely unlikely, especially with the public probes).

BTW, even if we aren't sending null, does graphite still return 'null' to grafana when there's no datapoint?

@Dieterbe that's a good point, definitely worth further discussion.

Dieterbe commented 9 years ago

> BTW, even if we aren't sending null, does graphite still return 'null' to grafana when there's no datapoint?

Yes, that's a property of graphite that evolved out of whisper: whisper just has empty slots for those points and returns nulls (in fact, I don't think you can even explicitly send nulls over the graphite protocol). IIRC this property also leaked into the graphite-web/graphite-api code (pretty sure a bunch of the processing functions assume evenly spaced points, with all nulls explicitly present rather than implied by absence). That's why, when implementing other backends (influxdb, tank, ...) that only store real points, we should always fill in the nulls.
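
To make that last point concrete, "filling in the nulls" for a backend that only stores real points looks roughly like this (illustrative Go, not actual grafana/graphite code): expand the sparse stored points into an evenly spaced series where missing slots are explicit nulls.

    package main

    import "fmt"

    // fillNulls expands sparse stored points into an evenly spaced series from
    // start (inclusive) to end (exclusive), one slot per interval. Slots with
    // no stored point become nil, mirroring how whisper/graphite-web represent
    // missing data as explicit nulls.
    func fillNulls(points map[int64]float64, start, end, interval int64) []*float64 {
        var series []*float64
        for ts := start; ts < end; ts += interval {
            if v, ok := points[ts]; ok {
                series = append(series, &v)
            } else {
                series = append(series, nil)
            }
        }
        return series
    }

    func main() {
        // The backend only stored points at t=10 and t=40; the slots in
        // between and after come back as nulls.
        stored := map[int64]float64{10: 1.5, 40: 2.0}
        for i, p := range fillNulls(stored, 10, 60, 10) {
            if p == nil {
                fmt.Println(i, "null")
            } else {
                fmt.Println(i, *p)
            }
        }
    }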

woodsaj commented 9 years ago

@nopzor1200 yes, graphite always fills in gaps with Nulls.

nopzor1200 commented 8 years ago

I think this can be closed.