grafana / grafana

The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.
https://grafana.com
GNU Affero General Public License v3.0
65.11k stars 12.15k forks source link

feature request: add consolidateBy() selected by user if maxDataPoints used. #914

Closed toni-moreno closed 8 years ago

toni-moreno commented 10 years ago

I found that grafana uses maxDataPoints ( a great workarround to speed up plotting with time series with thousands of datapoints) .

But dashboard designer should be able to select how to consolidate each metric if finally consolidations is done. ( by default averaged consolidations done)

I suggest to add a combo selector for each metric ( the default should be 'average')

image

if selected other than default grafana could add consolidateby("metric","selected_by_user") function by itself.

what do you think about?

torkelo commented 10 years ago

not sure, I mean you can just add the consolidateBy function manually today with the function selector. But maybe this needs to be exposed in a simpler way. I know users who are confused with the grafana calculated total for counters which can be inaccurate due to the default consolidation function is avg.

torkelo commented 10 years ago

I think the hardest thing would be how to make this simple and not complicate the graphite query editor to much. The simplest thing would be to add a "consolidate" submenu to the cog menu (that only has a duplicate option right now) where you can select consolidate function, it would add/update the function (so would be visible as a normal function after the metric expression.

toni-moreno commented 10 years ago

I understand your point of view but , IMHO advanced users should have "consolidateBy" in the function selector but also a "explicit" consolidation option in the graph editor should be "automatically" added when making the metric request if we are using "maxDataPoint"

This "explicit" options remember advanced dashboard's designers that sometimes what we see are not exactly the same data than original , and we must be careful on this.

IMHO Is mandatory for this kind of software to be careful with that ( I'm member of a specialized systems performance engineering group and is something we take care on each dashboard we design -- now we use a dashboard client we have made ourselves that we would like to migrate to grafana in the future--)

I have two suggestions:

A) I think that by adding a small combo at the middle of the "letter" and the "eye" on each metric row won't complicate much more than is now, and the size of the combo will be quite small (only 3 letters, max,min,avg,sum).

B) You can add a new tab that will have to show the same metric list than the metrics tab without editing metric capability option and a combo at the right where "graphana_target_controls" are now placed.

image

in the same way you don't show the maxDataPoints nowhere. I think is better not to add "explicity" the consolidateBy() in the metric editor only on request time ( only if different than avg) . Of course that grafana documentation an also in grafana "tips" should notify users about this behaviour.
, A consolidateBy() wrapping every metric won't add performance penalty, And in most cases the consolidation option will be "avg" so you won't add anything to the metric.

toni-moreno commented 10 years ago

If you choose any of the 2 options and you guide me on the files related to change and any other hint you consider , I can try to build a new PR with this new feature.

torkelo commented 10 years ago

of the option A or B I prefer A, but I am not sure if it should be next the the query letter or the in the cog menu. You are welcome to try and implement it, but no guarantee I will merge it. But I think you are right that how maxDataPoints is implicitly added and it AVG consolidation logic hidden some graphs can be misleading (especially the Total legend value). Will definitely add a point consolidation help box bellow the queries (where I have help boxes for templating, stacking, shorted legend names etc)

toni-moreno commented 10 years ago

Hi @torkelo

I've finally done a mix between A and what you suggest ( in the cog menu).

image.

I think is really explicit, easy to understand by any user not weird or ugly and easy to understand.

I've also added a tip letting user get fast infor about this new control.

image.

After in the graphiteDatasource.js I've added and hidden to the user the consolidateBy() function:

    if(target.consolidationFunc && target.consolidationFunc !== 'avg') {
          targetValue= 'aliasSub(consolidateBy(' + targetValue +
                       ',"' + target.consolidationFunc +'"),"consolidateBy\\((.*),.'+
                       target.consolidationFunc+'.\\)$","\\1")';
        }

what do you think about this?

toni-moreno commented 10 years ago

I have the PR ready only one important thing to fix before.

consolidateBy() appears in graphite 0.9.11 version, so in anyway I should query graphite version to enable or not this new control.

Have you queried and stored the graphite version in any local variable ?

torkelo commented 10 years ago

I would not worry so much about 0.9.11 requirement, grafana and graphite older than 0.9.12 + maxDataPoints parameter is close to useless (only works for short time periods), However I am concerned by the horizontal space requirements for your solution, many queries would require two rows if so much space is taken for the consolidation selection

toni-moreno commented 10 years ago

2 things:

a) About versions and maxDataPoints

I've working with our dashboard front-end with graphite versions 0.9.9 and 0.9.10 without maxDataPoints by adding to every target a summarize() with time resolution depending on the complete window time range requested, before doing the request.

By example: if we request last 1 hour we'll summarize with 1 minute resolution. summarize('metric','1minute',"consolidationFunc") if last 6 hours we'll summarize '5 minute'. if last 24 hours we'll summarize by '10minute'.

is really easy to code and is working fine.

this data reduction method has also other benefits, when a time serie has a lot of variations over the time and you request a large window time , one year by example, without summarization you only will see a lot of lines over the graph but not general behaviour over the time, with summarization is easy to see the general behaviour over time , if you need more resolution you should only choose a smaller window time.

This method is also used in proprietary software like CA APM Introscope.

b) About selector solution.

I've removed "On consolidation do: " and added "C[ ] " like a function

better now?

image

toni-moreno commented 10 years ago

Hi @torkelo , while waiting you can merge #931, I would like to show you how this reduction data method by previously consolidation seems a de facto standard in the annalist and graphing data tools, so I took a screenshot on the time selectors for the CA Introscope software.

image

As you can see you have 2 selectors , one for window time ( Range) another for interval of consolidation/summarization which is indeed de resolution.

There are a mapping between each windows time and each resolution time that changes when you change range but this software enables you to force another resolution ( sometimes it implies more processing time at server side and also at client side).

graphing data by first selecting its resolution seems a de facto standard in the time series world , ( and indeed I love this feature in CA Introscope) . And is exactly what is requesting @FrankyGT in this issue https://github.com/grafana/grafana/issues/904.

All timeseries databases has capability to query data by a resolution.

 summarize("serie","resolution_time", "consolidation_mode").
select percentile(value, 95) from response_times group by time(1m)
m=avg:10m-avg:rate:proc.stat.cpu{type=idle} 

Downsample the datapoints using the 10-minute average, and show the average number of idle CPU cycles across the board.

If you consider implement it on grafana (something I hope) I would like to help you.

torkelo commented 10 years ago

Yes, why not use the summarize function? I mean what you are talking about already exists in Grafana, not sure what you are suggesting.

The summarize function and a interval variable will get you exactly what you want

torkelo commented 10 years ago

As you said it already exist for graphite and InfluxDB, only Opentsdb that is left out

toni-moreno commented 10 years ago

Sorry my English is so poor

I haven't explained myself as good as I would like.

I'm not a OpenTSDB expert, but I've investigated a little about it:

http://opentsdb.net/docs/build/html/user_guide/query/index.html

And in my previous note , I have informed you that also OpenTSDB has this functionality but with another name "downsampling". the previous example shows how you can query for averaged data with 10 minutes of resolution ( m=avg:10m-avg:rate:proc.stat.cpu{type=idle} ).

What I suggest Is perhaps grafana in future releases could enable (if dashboard designer wants ) a second selector ( like CA Introscope does) with time resolution and grafana by itself should add to the metrics , previously to the request, the sumarize() method in the case of graphite group by() clause in the case of influxdb and :Xm-function: option in the case of OpenTSDB.

If done you will support also graphite versions previous to 0.9.12

What do you think about that ?

torkelo commented 10 years ago

Would be hard to add summarize function automatically and get the aggregation/rollup function selection right. Would need to be done manually, and only makes sense in some specific dashboards. I think the option to create a time interval variable that you can use in summarize function and then quickly change (or set to auto) to be a great option for this.

I am also not sure about how OpenTSDB works.

FrankyGT commented 10 years ago

A dynamic time interval variable that changes when the time range changes should be sufficient I guess, if it is usable as a downsample parameter. ie : 1m, 10m, 30m

torkelo commented 10 years ago

@FrankyGT that already exists

toni-moreno commented 10 years ago

IMHO I think it is possible and highly recommended to add some automation features to grafana to simplify dashboard design

I have designed a lot of dashboards gathering metrics from graphite and we always visualize a subset of them by summarizing data ,right now we can choose intervals with a range/resolution selector (as discussed before).

Now we have plans to migrate our dashboards to grafana, and we can really do the same we need to do 2 steps.

1.- use a "interval" variable. 2.- wrap by hand all our metrics with summarize()., a bored and heavy work that dashboard engine could do for me. ( the only difference on each metric is the consolidation function based on the nature of the metric )

For the dashboard designer this is a common behavior for all graphs for all dashboards, and could be done by the dashboard engine with independence of the source time series database.

I would like to add this feature (, first with graphite datasources,).

To add it we only need:

phase 1

1.-Enable an easy way to define the consolidation function to each metric ( as in https://github.com/grafana/grafana/pull/931)

image

2.-Add in the dashboadd timepicker settings a new resolution selector ( only for graphite by now)

image

3.- If enabled resolution selector I will add on each metric the summarize() function in a transparent way to the user and dashboard designer.

This option of course could be enabled/disabled.

phase 2

For influxdb and openTSDB , the only we need is add in the editor.html of each datasource , the consolidation funcions they have, and add in the influxDatasource.js and openTSDBDataource.js some logic to add the consolidation funcion.

but first of all could be good to merge https://github.com/grafana/grafana/issues/914, ( this is 33% of the work ). for the phase 1

torkelo commented 10 years ago

I agree that adding automation and simplifying dashboard design and graph composition is what grafana is all about. But has to be handled carefully in order not complicate things or have it general enough so that most users benefit instead of only being useful in some cases and ads complexity in the majority of cases. So I agree, but I also think that features like this needs to be handled with care and not added "on a whim". I need to be sure that this would be useful for many users and try to limit complications and complexity in the UI usability.

So a very broad general feature like a dashboard global resolution/groupby/summarize by selection that would modify all queries without causing unwanted behavior (some queries/graphs might need to be excepted or break) sounds tricky at this point and would require a lot of work to make sure it works properly. I think using a interval variable gives more flexibility and explicit behavior. Modifying all queriers by wrapping consolidateBy and summarize and aliasSub function sounds like a recipie for strange bugs and unhappy users not understanding what is going on behind the seams.

torkelo commented 10 years ago

So at this point I would prefer maybe a built in resolution variable, make it easier to change/add consolidateBy. You would still need to add summarize function to the panels/queries you want but you would not need to enable templating and create an interval variable.

torkelo commented 8 years ago

closing this for lack of interest