grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

avg_over_time might trigger `aggregation operator '"sum"' without grouping` error at specified intervals #13287

Open ghost opened 4 months ago

ghost commented 4 months ago

Describe the bug An avg_over_time query fails when run over certain time ranges. Other _over_time functions seem to behave properly. The job does not need to exist, and no data is needed to reproduce the error. The problem is independent of the producer, collector, or log format.

To Reproduce Steps to reproduce the behavior:

  1. Run avg_over_time({job="nojob/nojob"} | json | keep http_route, response_time | unwrap response_time [$__auto])
  2. Alternative query from @slim-bean that exhibits the same behaviour: avg by (http_route) (avg_over_time({job="nojob/nojob"} |json | keep http_route, response_time | unwrap response_time [$__auto]))
  3. In the Grafana UI, set the time range to the last 5 minutes. The query returns an empty result (or values, if data exists). Payload:
    {
    "queries": [
    {
      "refId": "A",
      "expr": "avg_over_time({job=\"pi-logs-custom3/pi-logs-custom3\"} |json | keep http_route, response_time | unwrap response_time [$__auto])",
      "queryType": "range",
      "datasource": {
        "type": "loki",
        "uid": "grafanacloud-logs"
      },
      "editorMode": "code",
      "legendFormat": "",
      "datasourceId": 7,
      "intervalMs": 200,
      "maxDataPoints": 1343
    }
    ],
    "from": "1719009561800",
    "to": "1719009861802"
    }
  4. In the Grafana UI, set the time range to the last 6 hours. The query fails with a 500 error: aggregation operator '"sum"' without grouping. Payload, followed by the response:
    {
    "queries": [
    {
      "refId": "A",
      "expr": "avg_over_time({job=\"pi-logs-custom3/pi-logs-custom3\"} |json | keep http_route, response_time | unwrap response_time [$__auto])",
      "queryType": "range",
      "datasource": {
        "type": "loki",
        "uid": "grafanacloud-logs"
      },
      "editorMode": "code",
      "legendFormat": "",
      "datasourceId": 7,
      "intervalMs": 15000,
      "maxDataPoints": 1343
    }
    ],
    "from": "1718988150000",
    "to": "1719009750941"
    }
    {
    "results": {
        "A": {
            "error": "aggregation operator '\"sum\"' without grouping",
            "errorSource": "downstream",
            "status": 500
        }
    }
    }
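The two payloads above carry the same expr and maxDataPoints and differ only in from, to, and intervalMs. As a rough aid for comparing them, the sketch below works out the time span and the approximate number of evaluation steps for each request. The values are copied from the payloads; the helper name `summarize` is made up for illustration, and the sketch says nothing about why the longer range fails.

```python
from datetime import timedelta

# Values copied from the two Grafana payloads in this report.
PAYLOADS = {
    "last 5 min (works)": {
        "from": 1719009561800, "to": 1719009861802, "intervalMs": 200,
    },
    "last 6 hours (fails)": {
        "from": 1718988150000, "to": 1719009750941, "intervalMs": 15000,
    },
}


def summarize(payload):
    """Return (time span, whole number of intervalMs steps in that span)."""
    span_ms = payload["to"] - payload["from"]
    steps = span_ms // payload["intervalMs"]
    return timedelta(milliseconds=span_ms), steps


for name, payload in PAYLOADS.items():
    span, steps = summarize(payload)
    print(f"{name}: span={span}, intervalMs={payload['intervalMs']}, steps={steps}")
```

Both requests ask for a similar number of points (1500 and 1440), so the visible difference between the working and the failing request is the length of the range and the step width, not the point count.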

Expected behavior I expect values to be returned, or an empty result if no data is found.

Environment:

Screenshots, Promtail config, or terminal output n/a

ghost commented 4 months ago

Related https://github.com/grafana/loki/pull/12176

Kamilcuk commented 2 months ago

Hi. I hit the same error. For now, adding an explicit grouping, as in avg_over_time(....) by (this, list, of, labels), solved the problem for me.
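For anyone who wants to try this workaround outside Grafana, here is a minimal sketch that appends a `by (...)` clause to the query from the report and builds a URL for Loki's `/loki/api/v1/query_range` endpoint. Several things are assumptions: the base URL `http://localhost:3100`, the helper names `with_grouping` and `query_range_url`, the choice of `http_route` as the grouping label, and the fixed `[15s]` range that replaces `$__auto` (a Grafana variable that Loki itself does not understand). The timestamps are the ones from the failing payload, converted from milliseconds to nanoseconds.

```python
from urllib.parse import urlencode

# Query from the report, with the Grafana-only $__auto replaced by an assumed 15s range.
QUERY = (
    'avg_over_time({job="nojob/nojob"} | json | keep http_route, response_time '
    "| unwrap response_time [15s])"
)


def with_grouping(range_aggregation, labels):
    """Append an explicit `by (...)` grouping to a LogQL range aggregation."""
    return f"{range_aggregation} by ({', '.join(labels)})"


def query_range_url(base_url, query, start_ms, end_ms, step_seconds):
    """Build a query_range URL; Loki accepts start/end as nanosecond Unix epochs."""
    params = {
        "query": query,
        "start": start_ms * 1_000_000,
        "end": end_ms * 1_000_000,
        "step": step_seconds,
    }
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


grouped = with_grouping(QUERY, ["http_route"])
# Base URL is hypothetical; the time range is the 6-hour window from the failing payload.
url = query_range_url("http://localhost:3100", grouped, 1718988150000, 1719009750941, 15)
print(grouped)
print(url)
```

The URL is only printed, not requested, so the sketch runs without a Loki instance. Fetching it with any HTTP client against a real instance would show whether the grouped form avoids the error there.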