Yes, my assumption (upon reading your tweet) is that this is due to the condensing of data necessary to fit a fixed number of points within the rendered image, and not "exactly the problem" that you alluded to when dealing with percentiles.
My suggestion would be to either widen your graph to a point where consolidation is unnecessary, or use the `consolidateBy` function with the `max` or `min` arguments to preserve your peaks.
If you're curious about the consolidation code, you can find the respective bits in HEAD here.
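For example, something along these lines should keep the spike visible at any width (untested sketch, reusing the metric from the original report):

```
.../render?target=consolidateBy(stats.app-1.timers.gunicorn.request.duration.upper, 'max')&from=-6h
```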
Thank you for the link to `consolidateBy` -- that looks like exactly what I was looking for.
And btw, sorry if my tweets annoyed you. I definitely wasn't trying to point the finger here -- I was just confused about what Graphite's solution was to a real and general problem with plotting lots of points efficiently, and I wasn't able to find `consolidateBy` in the documentation. FWIW I'm pretty sure it is exactly the problem discussed by Heinrich in the post linked -- the percentile aggregation in Circonus seems to solve the same problem as `consolidateBy`.
Oh I wasn't annoyed so much, I just get frustrated with conjecture... especially with a lack of details. Anyways, I'm glad that we narrowed down the problem and gave you a suitable fix. 👍
I have plans to move `carbonapi` over to using https://github.com/dgryski/go-lttb as the default aggregation method.
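For reference, a minimal sketch of what that kind of downsampling looks like with go-lttb. The `Point` type and `LTTB` call are from my reading of that repo's README, so the signatures are worth double-checking:

```go
package main

import (
	"fmt"

	lttb "github.com/dgryski/go-lttb"
)

func main() {
	// Build a dense series with a single sharp spike, similar to the
	// request-duration peak that was disappearing from the 6h render.
	series := make([]lttb.Point, 0, 10000)
	for i := 0; i < 10000; i++ {
		y := 1.0
		if i == 5000 {
			y = 570.0
		}
		series = append(series, lttb.Point{X: float64(i), Y: y})
	}

	// Reduce to ~800 points (roughly one per pixel column). LTTB keeps the
	// points that contribute most to the visual shape, so local extremes
	// like this spike tend to survive, unlike plain averaging.
	reduced := lttb.LTTB(series, 800)
	fmt.Println(len(reduced), "points after downsampling")
}
```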
Note that `consolidateBy` only applies to runtime consolidation. If you're loading data from historical archives that are aggregated, you depend on what the store gives you. Whisper in particular only supports one consolidation function at a time, on a per-series basis, which may not be what you're asking for with `consolidateBy`. In fact, you may be getting data that is off when you combine two different aggregation functions (one set by Whisper, one via `consolidateBy` at runtime). See http://dieter.plaetinck.be/post/25-graphite-grafana-statsd-gotchas/#runtime.consolidation
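To make that concrete: for `.upper` timers like the one in this issue, the archives themselves would also need a max-style rollup, e.g. a stanza along these lines in `storage-aggregation.conf` (a generic statsd-style example, not necessarily what's deployed here):

```
[max]
pattern = \.upper(_\d+)?$
xFilesFactor = 0.1
aggregationMethod = max
```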
@Dieterbe As @nickstenning made clear early on, he was experiencing datapoint consolidation within the same archive, not rollups.
Hi there, I'm trying to work out what my options are for dealing with what seems to me to be a spike erosion (AKA peak erasure) issue with the default Graphite renderer. Here are two plots displayed on two different timescales. First, the past hour:
.../render?target=aliasByNode(stats.app-1.timers.gunicorn.request.duration.upper, 1, -1)&from=-1h
and second, the past six hours:
.../render?target=aliasByNode(stats.app-1.timers.gunicorn.request.duration.upper, 1, -1)&from=-6h
I've taken the liberty of annotating the plots to show the issue I'm struggling with. Namely, that the spike visible in the 1h plot at about 10:00, with a value of ~570, is not visible on the 6h plot at all. Indeed, the scale of the 6h plot does not reflect the max/min values of the data actually stored by Graphite.
I've confirmed that this isn't an aggregation issue. Switching the renderer to `type=json`, I can find the spike in both the `from=-1h` and `from=-6h` outputs:

For the sake of completeness, however, here are the relevant extracts from `storage-schemas.conf` and `storage-aggregation.conf`:

`storage-schemas.conf`

`storage-aggregation.conf`
I'm guessing (and it is just a guess) that this is a side-effect of sampling (and possibly averaging) done by the renderer when the number of datapoints for a time range is too high to fit in a plot of a given size. This hypothesis seems to be supported by the observation that providing larger `width` parameters results in a different vertical scale: the plot displays more variance within the data as the width increases.

I don't think the default behaviour is inappropriate in general, but for certain metrics (such as those representing maximum or minimum values), it would be nice if there were a way to ensure that the sampling didn't "erase" peaks in the underlying data.
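A quick way to test that hypothesis (sketch only, URLs adapted from the ones above) is to render the same six-hour window at two widths and compare the vertical scales:

```
.../render?target=aliasByNode(stats.app-1.timers.gunicorn.request.duration.upper, 1, -1)&from=-6h&width=330
.../render?target=aliasByNode(stats.app-1.timers.gunicorn.request.duration.upper, 1, -1)&from=-6h&width=3000
```

If the spike reappears at the larger width, that points at per-pixel consolidation in the renderer rather than at anything in the stored data.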