grafana / grafana

The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.
https://grafana.com
GNU Affero General Public License v3.0

Expressions referencing Cloudwatch Query returns NoData #70139

Closed ChevronTango closed 3 weeks ago

ChevronTango commented 1 year ago

What went wrong?

What happened:

I would like to use expressions to resample some of my CloudWatch billing data. At present, though, expressions behave inconsistently when they reference a CloudWatch query that is code based. The ones using the builder seem to be fine, but the code ones all force the graph to return NoData.

Looking through the grafana logs I can see

level=warn msg="Can't find response by refID. Return nodata" responseRefIds=[]

I'm not sure how that's possible, as https://github.com/grafana/grafana/blob/main/pkg/expr/nodes.go#L256 doesn't look like it can produce that error and also have an empty array for the responseRefIds.
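To make the puzzle concrete, here is a minimal Go sketch of a lookup by refID. It is not the code in `nodes.go`; the `frame` type and the `lookupByRefID` helper are hypothetical names used only for illustration. It shows one way the warning could appear with an empty `responseRefIds` list: the response map itself is empty, so no refID can match.

```go
package main

import (
	"fmt"
	"sort"
)

// frame is a hypothetical stand-in for a data frame returned by a datasource.
type frame struct {
	Name string
}

// lookupByRefID is an illustrative helper, not Grafana's implementation.
// It returns the frames stored under refID, the sorted list of refIDs that
// were present in the response, and whether refID was found.
func lookupByRefID(responses map[string][]frame, refID string) ([]frame, []string, bool) {
	present := make([]string, 0, len(responses))
	for id := range responses {
		present = append(present, id)
	}
	sort.Strings(present)
	frames, ok := responses[refID]
	return frames, present, ok
}

func main() {
	// Assumption: the datasource returned no responses at all, so the map
	// is empty and the list of known refIDs is empty too.
	_, present, ok := lookupByRefID(map[string][]frame{}, "C")
	if !ok {
		fmt.Printf("Can't find response by refID. Return nodata responseRefIds=%v\n", present)
	}
}
```

Under that assumption, the miss is caused by the query for the referenced refID never producing a response, rather than by the lookup itself.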

What did you expect to happen:

The expression to either evaluate correctly, allowing me to resample the cloudwatch data, or to produce an error I can easily debug.

How do we reproduce it?

Step 1:

Step 2:

Step 3:

What Grafana version are you using?

9.5.3

Optional Questions:

Is the bug inside a Dashboard Panel?

Key Value
Panel graph @ 10.0.0
Grafana 10.0.0 (81d85ce802) // Enterprise
Panel debug snapshot dashboard ```json { "panels": [ { "datasource": { "type": "grafana", "uid": "grafana" }, "aliasColors": {}, "dashLength": 10, "editable": true, "fieldConfig": { "defaults": { "links": [] }, "overrides": [] }, "fill": 1, "grid": {}, "gridPos": { "h": 13, "w": 15, "x": 0, "y": 0 }, "id": 2, "legend": { "alignAsTable": true, "avg": true, "current": true, "hideEmpty": false, "hideZero": false, "max": true, "min": true, "show": true, "sort": "current", "sortDesc": true, "total": false, "values": true }, "lines": true, "linewidth": 2, "links": [], "nullPointMode": "connected", "options": { "alertThreshold": true }, "pluginVersion": "10.0.0", "pointradius": 5, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "targets": [ { "refId": "A", "datasource": { "type": "grafana", "uid": "grafana" }, "queryType": "snapshot", "snapshot": [] } ], "thresholds": [], "timeRegions": [], "title": "Reproduced with embedded data", "tooltip": { "msResolution": false, "shared": true, "sort": 2, "value_type": "cumulative" }, "type": "graph", "xaxis": { "mode": "time", "show": true, "values": [], "name": null, "buckets": null }, "yaxes": [ { "$$hashKey": "object:75", "format": "currencyUSD", "logBase": 1, "min": 0, "show": true }, { "$$hashKey": "object:76", "format": "short", "logBase": 1, "show": false } ], "yaxis": { "align": false }, "bars": false, "dashes": false, "error": false, "fillGradient": 0, "hiddenSeries": false, "percentage": false, "points": false, "stack": false, "steppedLine": false, "timeFrom": null, "timeShift": null }, { "gridPos": { "h": 7, "w": 9, "x": 15, "y": 0 }, "id": 5, "options": { "content": "\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Panelgraph @ 10.0.0
QueriesA[undefined], B[undefined], C[undefined], Not-Working[__expr__], Working[__expr__]
DataError 0 frames, 0 fields, 0 rows
Grafana10.0.0 (81d85ce802) // Enterprise
", "mode": "html" }, "title": "Debug info", "type": "text" }, { "id": 6, "title": "Original Panel JSON", "type": "text", "gridPos": { "h": 13, "w": 9, "x": 15, "y": 7 }, "options": { "content": "{\n \"datasource\": {\n \"uid\": \"$datasource\",\n \"type\": \"cloudwatch\"\n },\n \"aliasColors\": {},\n \"dashLength\": 10,\n \"editable\": true,\n \"fieldConfig\": {\n \"defaults\": {\n \"links\": []\n },\n \"overrides\": []\n },\n \"fill\": 1,\n \"grid\": {},\n \"gridPos\": {\n \"h\": 10,\n \"w\": 24,\n \"x\": 0,\n \"y\": 15\n },\n \"id\": 3,\n \"legend\": {\n \"alignAsTable\": true,\n \"avg\": true,\n \"current\": true,\n \"hideEmpty\": false,\n \"hideZero\": false,\n \"max\": true,\n \"min\": true,\n \"show\": true,\n \"sort\": \"current\",\n \"sortDesc\": true,\n \"total\": false,\n \"values\": true\n },\n \"lines\": true,\n \"linewidth\": 2,\n \"links\": [],\n \"nullPointMode\": \"connected\",\n \"options\": {\n \"alertThreshold\": true\n },\n \"pluginVersion\": \"10.0.0\",\n \"pointradius\": 5,\n \"renderer\": \"flot\",\n \"seriesOverrides\": [],\n \"spaceLength\": 10,\n \"targets\": [\n {\n \"datasource\": {\n \"uid\": \"$datasource\"\n },\n \"alias\": \"\",\n \"application\": {\n \"filter\": \"\"\n },\n \"dimensions\": {\n \"Currency\": \"USD\"\n },\n \"expression\": \"\",\n \"functions\": [],\n \"group\": {\n \"filter\": \"\"\n },\n \"hide\": true,\n \"highResolution\": false,\n \"host\": {\n \"filter\": \"\"\n },\n \"id\": \"m1\",\n \"item\": {\n \"filter\": \"\"\n },\n \"matchExact\": true,\n \"metricEditorMode\": 0,\n \"metricName\": \"EstimatedCharges\",\n \"metricQueryType\": 0,\n \"mode\": 0,\n \"namespace\": \"AWS/Billing\",\n \"options\": {\n \"showDisabledItems\": false\n },\n \"period\": \"86400\",\n \"refId\": \"A\",\n \"region\": \"us-east-1\",\n \"returnData\": false,\n \"statistic\": \"Average\",\n \"label\": \"\"\n },\n {\n \"datasource\": {\n \"uid\": \"$datasource\"\n },\n \"alias\": \"\",\n \"application\": {\n \"filter\": \"\"\n },\n 
\"dimensions\": {\n \"Currency\": \"USD\"\n },\n \"expression\": \"RATE(m1) * PERIOD(m1)\",\n \"functions\": [],\n \"group\": {\n \"filter\": \"\"\n },\n \"hide\": true,\n \"highResolution\": false,\n \"host\": {\n \"filter\": \"\"\n },\n \"id\": \"m2\",\n \"item\": {\n \"filter\": \"\"\n },\n \"matchExact\": true,\n \"metricEditorMode\": 1,\n \"metricName\": \"EstimatedCharges\",\n \"metricQueryType\": 0,\n \"mode\": 0,\n \"namespace\": \"AWS/Billing\",\n \"options\": {\n \"showDisabledItems\": false\n },\n \"period\": \"86400\",\n \"refId\": \"B\",\n \"region\": \"us-east-1\",\n \"returnData\": false,\n \"statistic\": \"Average\",\n \"label\": \"\"\n },\n {\n \"datasource\": {\n \"uid\": \"$datasource\"\n },\n \"alias\": \"Total estimated daily charge\",\n \"application\": {\n \"filter\": \"\"\n },\n \"dimensions\": {\n \"Currency\": \"USD\"\n },\n \"expression\": \"IF(m2>0, m2)\",\n \"functions\": [],\n \"group\": {\n \"filter\": \"\"\n },\n \"hide\": false,\n \"highResolution\": false,\n \"host\": {\n \"filter\": \"\"\n },\n \"id\": \"m3\",\n \"item\": {\n \"filter\": \"\"\n },\n \"matchExact\": true,\n \"metricEditorMode\": 1,\n \"metricName\": \"EstimatedCharges\",\n \"metricQueryType\": 0,\n \"mode\": 0,\n \"namespace\": \"AWS/Billing\",\n \"options\": {\n \"showDisabledItems\": false\n },\n \"period\": \"86400\",\n \"refId\": \"C\",\n \"region\": \"us-east-1\",\n \"returnData\": false,\n \"statistic\": \"Average\",\n \"label\": \"Total estimated daily charge\"\n },\n {\n \"datasource\": {\n \"type\": \"__expr__\",\n \"uid\": \"__expr__\",\n \"name\": \"Expression\"\n },\n \"refId\": \"Not-Working\",\n \"type\": \"math\",\n \"hide\": false,\n \"expression\": \"$m3\"\n },\n {\n \"refId\": \"Working\",\n \"datasource\": {\n \"type\": \"__expr__\",\n \"uid\": \"__expr__\",\n \"name\": \"Expression\"\n },\n \"type\": \"math\",\n \"hide\": false,\n \"expression\": \"$m1\"\n }\n ],\n \"thresholds\": [],\n \"timeRegions\": [],\n \"title\": \"Estimated daily 
charges\",\n \"tooltip\": {\n \"msResolution\": false,\n \"shared\": true,\n \"sort\": 2,\n \"value_type\": \"cumulative\"\n },\n \"type\": \"graph\",\n \"xaxis\": {\n \"mode\": \"time\",\n \"show\": true,\n \"values\": [],\n \"name\": null,\n \"buckets\": null\n },\n \"yaxes\": [\n {\n \"$$hashKey\": \"object:75\",\n \"format\": \"currencyUSD\",\n \"logBase\": 1,\n \"min\": 0,\n \"show\": true\n },\n {\n \"$$hashKey\": \"object:76\",\n \"format\": \"short\",\n \"logBase\": 1,\n \"show\": false\n }\n ],\n \"yaxis\": {\n \"align\": false\n },\n \"bars\": false,\n \"dashes\": false,\n \"error\": false,\n \"fillGradient\": 0,\n \"hiddenSeries\": false,\n \"percentage\": false,\n \"points\": false,\n \"stack\": false,\n \"steppedLine\": false,\n \"timeFrom\": null,\n \"timeShift\": null\n}", "mode": "code", "code": { "language": "json", "showLineNumbers": true, "showMiniMap": true } } }, { "id": 3, "title": "Data from panel above", "type": "table", "datasource": { "type": "datasource", "uid": "-- Dashboard --" }, "gridPos": { "h": 7, "w": 15, "x": 0, "y": 13 }, "options": { "showTypeIcons": true }, "targets": [ { "datasource": { "type": "datasource", "uid": "-- Dashboard --" }, "panelId": 2, "withTransforms": true, "refId": "A" } ] } ], "schemaVersion": 37, "title": "Debug: Estimated daily charges // 2023-06-15 11:13:42", "tags": [ "debug", "debug-graph" ], "time": { "from": "2023-05-16T10:13:42.138Z", "to": "2023-06-15T10:13:42.138Z" } } ```

Grafana Platform?

Kubernetes

User's OS?

No response

User's Browser?

No response

Is this a Regression?

No

Are Datasources involved?

Cloudwatch

Anything else to add?

Possibly related to https://github.com/grafana/grafana/issues/66647 though that seemed to only be for alerting, not a panel

zuchka commented 1 year ago

You should also check out Grafana's built-in transformations, which can do a lot of powerful SQL-like transforms. Just remember that they all run on the frontend (expressions are server-side), so they are not nearly as performant as handling transforms inside the originating DB.

https://grafana.com/docs/grafana/latest/panels/transform-data/

ChevronTango commented 1 year ago

Thanks @zuchka. I did look at that, however the transformation I need is the resample, which doesn't exist outside of expressions.

sarahzinger commented 1 year ago

Hey @ChevronTango I was able to reproduce your bug when I imported that dashboard. So I took a look at that dashboard and it seems like it was developed for Grafana 7.4.1 https://github.com/monitoringartist/grafana-aws-cloudwatch-dashboards/blob/master/aws-billing/aws-billing.json#L14 which was quite a while ago.

So I tried recreating the same queries myself and it loaded successfully! I wonder if perhaps we have some sort of migration bug from 7.4.1 -> 10.0.0. Would you mind trying again, making the same queries from scratch in a new dashboard, and seeing if that fixes your issue? If so, I think we can confirm that the issue is a migration issue between versions.

ChevronTango commented 1 year ago

Thanks @sarahzinger. I just did a fresh deploy of Grafana to our account using the helm chart 6.57.4, which deploys version v9.5.5, and I still get the same issue. I am able to reference $A in an expression, but not $C. It still causes the whole query to return nothing. Maybe it's fixed in v10, but I'm skeptical. A brand new dashboard with the queries entered fresh still has the same blank result.

Without any error messages it's difficult to debug, or else I'd be more specific. Even if the expression returns a result (i.e. from querying $A), the other 2 queries stop returning any data. I definitely don't think it should be possible for the expression to cause the other queries to also return nothing. At the very least I still expected to see the values of $C in my graph, even if the expression failed or was referencing something unrelated.

ChevronTango commented 1 year ago

Can you confirm, @sarahzinger, that you are able to do a resample of the output of $C using v10? If that is the case, do you also know when we are likely to see v10 released via helm?

sarahzinger commented 1 year ago

@ChevronTango apologies! I took another look today and I am able to recreate this even in 10.0.0; I just must have missed it before. I think I may have missed it because I wasn't hiding queries A, B, and C, so I was seeing data come up for A and mistakenly thinking I couldn't reproduce. Either that or we have some kind of odd race condition where it worked yesterday on my machine and today it doesn't haha.

Interestingly, adding any expression reproduces the bug, not just resample (for example, a Math expression of $C+10). I wonder if we have made some kind of odd assumption in our code somewhere that if we add an expression we only need to return the first query results, or something else similarly odd.

We will try to get this prioritized by the team, thank you so much for bringing this to our attention and for all of your help in repro-ing this!

Regarding helm, my understanding is that Grafana does not maintain helm, but it should be possible to change the appVersion to 10.0.1 in values.yaml if you'd like. Although as I said I can repro this issue in 10.0.0 so I'm afraid upgrading won't help this particular issue.

iwysiu commented 1 year ago

Moved to waiting as it may be fixed in https://github.com/grafana/grafana/pull/72935

idastambuk commented 1 year ago

Hi again @ChevronTango, the fix for this was just merged and will be released in Grafana 10.2.0. Keep in mind that you will have to enable the 'sseGroupByDatasource' feature toggle for the time being, in order to use CW metric math alongside Expressions like in the panel.

Let us know if that worked for you so we can close the ticket!
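The toggle name suggests that queries are grouped by datasource before they are sent. The thread does not describe the mechanism, so the following Go sketch is only an assumption about why grouping would matter for metric math: in the panel JSON, `m3` is defined as `IF(m2>0, m2)` and `m2` as `RATE(m1) * PERIOD(m1)`, so each depends on a sibling query id that has to travel in the same request. The `query` type, the `groupByDatasource` function, and the `"cloudwatch"` UID are hypothetical and not Grafana's actual code.

```go
package main

import "fmt"

// query is a hypothetical, simplified panel query.
type query struct {
	RefID         string
	DatasourceUID string
	ID            string // CloudWatch query id, e.g. "m2"
	Expression    string // metric math, e.g. "IF(m2>0, m2)"
}

// groupByDatasource batches queries that share a datasource UID. Groups
// appear in order of first occurrence, and queries keep their original
// order inside each group.
func groupByDatasource(queries []query) [][]query {
	index := map[string]int{}
	var groups [][]query
	for _, q := range queries {
		i, seen := index[q.DatasourceUID]
		if !seen {
			i = len(groups)
			index[q.DatasourceUID] = i
			groups = append(groups, nil)
		}
		groups[i] = append(groups[i], q)
	}
	return groups
}

func main() {
	// The three CloudWatch queries from the panel; "cloudwatch" is a
	// placeholder UID (the panel uses the "$datasource" variable).
	queries := []query{
		{RefID: "A", DatasourceUID: "cloudwatch", ID: "m1"},
		{RefID: "B", DatasourceUID: "cloudwatch", ID: "m2", Expression: "RATE(m1) * PERIOD(m1)"},
		{RefID: "C", DatasourceUID: "cloudwatch", ID: "m3", Expression: "IF(m2>0, m2)"},
	}
	for _, g := range groupByDatasource(queries) {
		fmt.Printf("one request with %d queries to %s\n", len(g), g[0].DatasourceUID)
	}
}
```

If each query were instead sent on its own, a request containing only `m3` would have no `m2` to resolve, which would be consistent with the empty response seen in the original report.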

ChevronTango commented 10 months ago

I can confirm that with the sseGroupByDatasource feature flag enabled, and running on 10.2.0, this bug is now resolved. Thank you @idastambuk @sarahzinger and @iwysiu for your help in resolving this. I look forward to this being a general release feature in the future.

idastambuk commented 10 months ago

Hi @ChevronTango thanks for letting us know, closing the ticket!

samjewell commented 3 weeks ago

Hi @ChevronTango Are you actively using this feature? And if so, can you explain whether it is of particular importance to you - what the impact is of having this feature? We've disabled it by default, as we never got it to play nicely with other datasources as far as I know. And we're not aware of anyone else using it. I don't know if you are a software-engineer/developer yourself, but at this stage we have a feature branch in the code which is barely used, and which hinders our ability to reason about and maintain this code. As a result, we're considering whether we have an option to remove that code (and with it the feature you are or were using), in order to accelerate our velocity in this part of the codebase, and also reduce the tendency for this code to attract bugs.

Looking forward to your thoughts on this, thanks 🙏