Closed: Dieterbe closed this issue 8 years ago
I think the JSON key is leftMin, so I suggest updating these values to 0:
dieter@dieter-m6800 litmus ack 'leftMin.*1'
rt-endpoint-summary.json
608: "leftMin": 1,
683: "leftMin": 1,
757: "leftMin": 1,
832: "leftMin": 1,
rt-collector-summary.json
531: "leftMin": 1,
597: "leftMin": 1,
662: "leftMin": 1,
727: "leftMin": 1,
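For illustration, here is a minimal sketch (a hypothetical one-off script, not something in the repo) of updating those values in bulk; it assumes the two file names from the ack output above and simply resets any leftMin of 1 to 0:

```python
# Hypothetical one-off script: reset every "leftMin": 1 to 0 in the two
# dashboard JSON files listed by the ack search above.
import json

def reset_left_min(node):
    # Walk the dashboard recursively, since leftMin sits inside nested panels.
    if isinstance(node, dict):
        if node.get("leftMin") == 1:
            node["leftMin"] = 0
        for value in node.values():
            reset_left_min(value)
    elif isinstance(node, list):
        for item in node:
            reset_left_min(item)

for path in ("rt-endpoint-summary.json", "rt-collector-summary.json"):
    with open(path) as f:
        dashboard = json.load(f)
    reset_left_min(dashboard)
    with open(path, "w") as f:
        json.dump(dashboard, f, indent=2)
```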
@nopzor1200 @torkelo do you know why we have them set to 1 now? Seems like 0 would be the better choice.
Maybe because they are using logarithmic scales that start at 1?
Because if we set it to 0 then we waste a whole tick mark between 0 and 1; for global latency measurements anything less than 1ms is generally an error anyway.
PS: check out the new dashboards; search for "OCT20" in the Raintank Master Account.
BTW, how can you have zero latency? That can't be right. I'd believe 0 < x < 1 ms, but zero would generally be an error, right?
The collector is reporting 0ms latency (confirmed with nsq_metrics_to_stdout), which doesn't seem impossible: I'm monitoring localhost from localhost. Rounding to ints seems sane in this case (i.e. no 0.5 ms).
For the record, I now see what you mean about wasting a tick mark:
My scenario is probably not a common one, but it would be nice for the data in such a case to show up somehow instead of being invisible. Perhaps the simplest solution is for the probe to round the measurements up? Or, if measurements are consistently > 0.5ms, we could just round to nearest (which we probably already do?). @woodsaj what do you think?
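To make the trade-off concrete, here is a minimal sketch (hypothetical helper names, not actual probe code) of the two options mentioned above: rounding to the nearest integer collapses sub-0.5ms measurements to 0, while rounding up keeps them visible as 1ms:

```python
import math

def round_nearest_ms(latency_ms):
    # Presumably what happens today: 0.3 ms becomes 0, which a leftMin of 1 then hides.
    return int(round(latency_ms))

def round_up_ms(latency_ms):
    # The "round up" alternative: any successful measurement is at least 1 ms.
    return max(1, math.ceil(latency_ms))

print(round_nearest_ms(0.3), round_up_ms(0.3))  # 0 1
print(round_nearest_ms(0.7), round_up_ms(0.7))  # 1 1
```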
I think that when 0 latency represents a failed check, we need to be using NULLs, not setting 0.
Under no valid circumstances should a network litmus probe check be reporting 0.00ms latency. I suspect it's just happening because the check is failing and setting 0.00.
(alerting for things like average global or regional latency will get fucked up by zeros)
https://github.com/raintank/raintank-metric/issues/36
However, the latest raintank-collector deployment to Production includes a change to just not send null metrics, so there should no longer be 0.0ms latency measurements (unless it actually took 0.0ms).
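As a rough sketch of that behaviour (illustrative Python only; the collector itself is not written in Python, and the names are made up), a failed check simply produces no latency datapoint instead of a 0.0ms one:

```python
from collections import namedtuple

CheckResult = namedtuple("CheckResult", "success latency_ms timestamp")

def metrics_for_check(result, metric_name="example.latency"):
    # A failed check yields no datapoint at all, rather than a 0.0 ms value
    # that would drag down averages and trip latency alerts.
    if not result.success or result.latency_ms is None:
        return []  # send nothing; the gap shows up as null in graphite
    return [(metric_name, result.latency_ms, result.timestamp)]

print(metrics_for_check(CheckResult(False, None, 1445299200)))   # []
print(metrics_for_check(CheckResult(True, 23.4, 1445299200)))    # one datapoint
```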
Could we perhaps just not hard-configure an explicit lower Y? That way the lower Y would be reasonable for the data. But then scales/axes would not be consistent over time or across different panels, which is annoying too :(
@woodsaj that's good that we aren't setting 0.0ms (unless 0.0ms actually happens, which I view as extremely unlikely, especially with the public probes).
BTW, even if we aren't sending null, does graphite still return 'null' to grafana if there's no datapoint?
@Dieterbe that's a good point, definitely worth further discussion.
> BTW, even if we aren't sending null, does graphite still return 'null' to grafana if there's no datapoint?
Yes, that's a property of graphite that evolved out of whisper: whisper just has empty slots for those points and returns nulls (in fact, I don't think you can even explicitly send nulls over the graphite protocol). IIRC this property also leaked into the graphite-web/graphite-api code (pretty sure a bunch of the processing functions assume evenly spaced points, with all nulls explicit rather than implied by absence). That's why, when implementing other backends (influxdb, tank, ...) that only store real points, we should always fill in the nulls.
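As a rough sketch of that "fill in the nulls" step (illustrative names, not actual backend code): given the sparse points a backend actually stores, produce one value, or None, per interval so the rendering code sees an evenly spaced series with explicit nulls:

```python
def fill_nulls(points, start, end, step):
    # points: {timestamp: value} for the datapoints that actually exist.
    # Returns one value (or None) per step in [start, end), which is the
    # evenly spaced, explicit-null series the graphite rendering code expects.
    return [points.get(ts) for ts in range(start, end, step)]

stored = {1445299200: 12.0, 1445299260: 0.0}            # sparse: only real points
print(fill_nulls(stored, 1445299200, 1445299380, 60))   # [12.0, 0.0, None]
```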
@nopzor1200 yes, graphite always fills in gaps with Nulls.
I think this can be closed.
I got into a very confusing situation with the devstack.
The lower two graphs look like they don't have data, but after further investigation, it's just that all their points are 0 and the graphs have a left Y min of 1.