grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

avg_over_time might trigger `aggregation operator '"sum"' without grouping` error at specified intervals #13287

Open nickgrafana opened 1 week ago

nickgrafana commented 1 week ago

Describe the bug An avg_over_time query fails when run over certain time ranges. Other *_over_time functions seem to behave properly and do not fail. The job does not need to exist, and no data is needed to reproduce the error. The problem is independent of the producer, collector, or log format.

To Reproduce Steps to reproduce the behavior:

  1. Run avg_over_time({job="nojob/nojob"} | json | keep http_route, response_time | unwrap response_time [$__auto])
  2. Alternative query from @slim-bean that exhibits the same behaviour: avg by (http_route) (avg_over_time({job="nojob/nojob"} |json | keep http_route, response_time | unwrap response_time [$__auto]))
  3. In the Grafana UI, set the time range to the last 5 minutes. The query succeeds and returns an empty result (or values). Payload:
    {
    "queries": [
    {
      "refId": "A",
      "expr": "avg_over_time({job=\"pi-logs-custom3/pi-logs-custom3\"} |json | keep http_route, response_time | unwrap response_time [$__auto])",
      "queryType": "range",
      "datasource": {
        "type": "loki",
        "uid": "grafanacloud-logs"
      },
      "editorMode": "code",
      "legendFormat": "",
      "datasourceId": 7,
      "intervalMs": 200,
      "maxDataPoints": 1343
    }
    ],
    "from": "1719009561800",
    "to": "1719009861802"
    }
  4. In the Grafana UI, set the time range to the last 6 hours. The query fails with a 500 error: aggregation operator '"sum"' without grouping. Payload:
    {
    "queries": [
    {
      "refId": "A",
      "expr": "avg_over_time({job=\"pi-logs-custom3/pi-logs-custom3\"} |json | keep http_route, response_time | unwrap response_time [$__auto])",
      "queryType": "range",
      "datasource": {
        "type": "loki",
        "uid": "grafanacloud-logs"
      },
      "editorMode": "code",
      "legendFormat": "",
      "datasourceId": 7,
      "intervalMs": 15000,
      "maxDataPoints": 1343
    }
    ],
    "from": "1718988150000",
    "to": "1719009750941"
    }
    Response:
    {
    "results": {
        "A": {
            "error": "aggregation operator '\"sum\"' without grouping",
            "errorSource": "downstream",
            "status": 500
        }
    }
    }
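
To take Grafana out of the picture, the two requests above can be approximated against Loki's `/loki/api/v1/query_range` endpoint directly. The snippet below is only a sketch of that idea, not a confirmed reproduction: the base URL `http://localhost:3100` and the helper name `build_query_range_url` are hypothetical, and replacing `$__auto` with the payload's `intervalMs` is an assumption, since Grafana's actual interpolation of `$__auto` may differ.

```python
from urllib.parse import urlencode

# Assumption: a Loki instance reachable at this address; adjust as needed.
LOKI_BASE_URL = "http://localhost:3100"

# Query from step 1, with $__auto replaced by an explicit range placeholder.
QUERY_TEMPLATE = (
    'avg_over_time({{job="nojob/nojob"}} | json '
    "| keep http_route, response_time | unwrap response_time [{range}])"
)


def build_query_range_url(from_ms, to_ms, interval_ms, base_url=LOKI_BASE_URL):
    """Build a query_range URL from the values seen in the Grafana payloads.

    Assumption: the range selector is set to the payload's intervalMs,
    which is only a guess at what $__auto resolves to.
    """
    params = {
        "query": QUERY_TEMPLATE.format(range=f"{interval_ms}ms"),
        "start": from_ms * 1_000_000,  # Loki accepts nanosecond Unix epochs
        "end": to_ms * 1_000_000,
        "step": interval_ms / 1000,  # step as a float number of seconds
    }
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


# Values copied from the two payloads above.
ok_url = build_query_range_url(1719009561800, 1719009861802, 200)
failing_url = build_query_range_url(1718988150000, 1719009750941, 15000)
```

Fetching `ok_url` and `failing_url` (for example with `curl`) should show whether the error depends only on the time range and step, or on something Grafana adds to the request.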

Expected behavior I expect values to be returned, or an empty result if no data is found.

Environment:

Screenshots, Promtail config, or terminal output n/a

nickgrafana commented 1 week ago

Related https://github.com/grafana/loki/pull/12176