grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0
23.48k stars 3.4k forks source link

loki breaks escaped quotes text values in JSON logs #5645

Closed hoegge closed 2 years ago

hoegge commented 2 years ago

Describe the bug If you post JSON data to loki, where some values contain double quotes inside the text, and they are correctly escaped, then loki destroys this by only escaping the outer JSON string quotes when storing the data. That means when the JSON data is reconstructed (retreived from loki), the JSON is no longer valid.

To Reproduce Steps to reproduce the behavior:

  1. Started Loki (Loki version 2.4.2 (branch: HEAD, revision: 525040a32)
  2. log data like shown below
  3. Retrieve the data again

Expected behavior Expect that the original JSON stored is intact, when you get it from loki. Escaped double quotes in string values in inputted JSON, should be double-escaped when you escape the JSON format quotes, otherwise the original JSON logged is destroyed, and not parsable, e.g., in Grafana. See JSON parsing problem / Prettify #46545

Environment:

Screenshots, Promtail config, or terminal output Original JSON posted to loki, where the "www-authenticate" value is the "problem". It contain escaped double quotes.

{
  "streams": [
    {
      "stream": {
        "unit": "node-red",
        "flow": "myuplink",
        "section": "http-response"
      },
      "values": [
        [
          "1647411980096000000",
          {
            "payload": "",
            "topic": "",
            "_msgid": "272365580705bdd1",
            "headers": {
              "date": "Wed, 16 Mar 2022 06:26:20 GMT",
              "content-length": "0",
              "connection": "close",
              "www-authenticate": "Bearer error=\"invalid_token\", error_description=\"The token expired at '03/16/2022 06:21:20'\"",
              "request-context": "appId=cid-v1:88a4c4fd-f195-45de-afe8-9a5f73e378a8",
              "strict-transport-security": "max-age=2592000",
              "content-security-policy": "default-src 'none'; frame-ancestors 'none'",
              "traceparent": "00-9325e931fc521b49bec6fd31a0bc6e2d-a06220efef65ec4b-00",
              "request-id": "9325e931fc521b49bec6fd31a0bc6e2d",
              "x-rate-limit-limit": "1h",
              "x-rate-limit-remaining": "9999999932",
              "x-rate-limit-reset": "2022-03-16T06:54:30.0306134Z",
              "x-node-red-request-node": "42b64f18"
            },
            "statusCode": 401,
            "responseUrl": "https://api.myuplink.com/v2/devices/20162037-2020-4-21-11-55-27/points",
            "redirectList": [],
            "retry": 0
          }
        ]
      ]
    }
  ]
}

Data retrieved from loki - the json string stores in values will not be valid, when "unpacked" / unescaped to create valid json:

{
  "status": "success",
  "data": {
    "resultType": "streams",
    "result": [
      {
        "stream": {
          "section": "http-response",
          "unit": "node-red",
          "flow": "myuplink"
        },
        "values": [
          [
            "1647447081164000000",
            "{\"payload\":\"\",\"topic\":\"\",\"_msgid\":\"f318345c04af074a\",\"headers\":{\"date\":\"Wed, 16 Mar 2022 16:11:21 GMT\",\"content-length\":\"0\",\"connection\":\"close\",\"www-authenticate\":\"Bearer error=\"invalid_token\", error_description=\"The token expired at '03/16/2022 16:06:21'\"\",\"request-context\":\"appId=cid-v1:88a4c4fd-f195-45de-afe8-9a5f73e378a8\",\"strict-transport-security\":\"max-age=2592000\",\"content-security-policy\":\"default-src 'none'; frame-ancestors 'none'\",\"traceparent\":\"00-cf0cbac4fba93f4b8f38f233a31ec5e0-5e3cbec1166dfb42-00\",\"request-id\":\"cf0cbac4fba93f4b8f38f233a31ec5e0\",\"x-rate-limit-limit\":\"1h\",\"x-rate-limit-remaining\":\"9999999947\",\"x-rate-limit-reset\":\"2022-03-16T16:32:51.0513238Z\",\"x-node-red-request-node\":\"4c781dc0\"},\"statusCode\":401,\"responseUrl\":\"https://api.myuplink.com/v2/devices/20162037-2020-4-21-11-55-27/points\",\"redirectList\":[],\"retry\":0}"
          ]
        ]
      }
    ],
    "stats": {
      "summary": {
        "bytesProcessedPerSecond": 664766,
        "linesProcessedPerSecond": 762,
        "totalBytesProcessed": 872,
        "totalLinesProcessed": 1,
        "execTime": 0.001311739
      },
      "store": {
        "totalChunksRef": 0,
        "totalChunksDownloaded": 0,
        "chunksDownloadTime": 0,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 0,
        "decompressedLines": 0,
        "compressedBytes": 0,
        "totalDuplicates": 0
      },
      "ingester": {
        "totalReached": 1,
        "totalChunksMatched": 0,
        "totalBatches": 1,
        "totalLinesSent": 1,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 872,
        "decompressedLines": 1,
        "compressedBytes": 586,
        "totalDuplicates": 0
      }
    }
  }
}

As you can see the quotes inside the part that contains "Bearer error=\"invalid_token\", in the value part is not "double escaped" and hence the escaping of the double quotes will disappear when you unescape / convert the json string to json.

chaudum commented 2 years ago

Hey @hoegge

Thanks for reporting. I tried to reproduce the issue, I could verify that Loki preserves JSON correctly.

$ http http://christian@localhost:3100/loki/api/v1/push @request.json Content-Type:application/json X-Scope-OrgID:christian
HTTP/1.1 204 No Content
Date: Thu, 17 Mar 2022 21:00:52 GMT

Where request contains the payload of the HTTP request.

{
  "streams": [
    {
      "stream": {
        "source": "local",
        "job": "github"
      },
      "values": [
        [
          "1647550779000000000",
          "{\"streams\":[{\"stream\":{\"unit\":\"node-red\",\"flow\":\"myuplink\",\"section\":\"http-response\"},\"values\":[[\"1647411980096000000\",{\"payload\":\"\",\"topic\":\"\",\"_msgid\":\"272365580705bdd1\",\"headers\":{\"date\":\"Wed, 16 Mar 2022 06:26:20 GMT\",\"content-length\":\"0\",\"connection\":\"close\",\"www-authenticate\":\"Bearer error=\\\"invalid_token\\\", error_description=\\\"The token expired at '03/16/2022 06:21:20'\\\"\",\"request-context\":\"appId=cid-v1:88a4c4fd-f195-45de-afe8-9a5f73e378a8\",\"strict-transport-security\":\"max-age=2592000\",\"content-security-policy\":\"default-src 'none'; frame-ancestors 'none'\",\"traceparent\":\"00-9325e931fc521b49bec6fd31a0bc6e2d-a06220efef65ec4b-00\",\"request-id\":\"9325e931fc521b49bec6fd31a0bc6e2d\",\"x-rate-limit-limit\":\"1h\",\"x-rate-limit-remaining\":\"9999999932\",\"x-rate-limit-reset\":\"2022-03-16T06:54:30.0306134Z\",\"x-node-red-request-node\":\"42b64f18\"},\"statusCode\":401,\"responseUrl\":\"https://api.myuplink.com/v2/devices/20162037-2020-4-21-11-55-27/points\",\"redirectList\":[],\"retry\":0}]]}]}"
        ]
      ]
    }
  ]
}
$ curl http://localhost:3100/loki/api/v1/query_range\?direction\=BACKWARD\&end\=1647550963853762006\&limit\=30\&query\=%7Bjob%3D%22github%22%7D\&start\=1647547363853762006 | jq '.data.result[0].values[0][1]' -r | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1952  100  1952    0     0   518k      0 --:--:-- --:--:-- --:--:--  635k
{
  "streams": [
    {
      "stream": {
        "unit": "node-red",
        "flow": "myuplink",
        "section": "http-response"
      },
      "values": [
        [
          "1647411980096000000",
          {
            "payload": "",
            "topic": "",
            "_msgid": "272365580705bdd1",
            "headers": {
              "date": "Wed, 16 Mar 2022 06:26:20 GMT",
              "content-length": "0",
              "connection": "close",
              "www-authenticate": "Bearer error=\"invalid_token\", error_description=\"The token expired at '03/16/2022 06:21:20'\"",
              "request-context": "appId=cid-v1:88a4c4fd-f195-45de-afe8-9a5f73e378a8",
              "strict-transport-security": "max-age=2592000",
              "content-security-policy": "default-src 'none'; frame-ancestors 'none'",
              "traceparent": "00-9325e931fc521b49bec6fd31a0bc6e2d-a06220efef65ec4b-00",
              "request-id": "9325e931fc521b49bec6fd31a0bc6e2d",
              "x-rate-limit-limit": "1h",
              "x-rate-limit-remaining": "9999999932",
              "x-rate-limit-reset": "2022-03-16T06:54:30.0306134Z",
              "x-node-red-request-node": "42b64f18"
            },
            "statusCode": 401,
            "responseUrl": "https://api.myuplink.com/v2/devices/20162037-2020-4-21-11-55-27/points",
            "redirectList": [],
            "retry": 0
          }
        ]
      ]
    }
  ]
}

What client do you use to push the logs to Loki? I assume there is some problem with the JSON encoding on the client side then.

hoegge commented 2 years ago

Thanks for taking a look. I can see your value after the timestamp is a stringifyed JSON and not JSON, i.e., qoutes are double-escaped. In my JSON request, the value log lines are JSON object. That is not supported then I suppose, but the push does not fail and the result is malformed JSON in the loki database.

hoegge commented 2 years ago

Just tried and created the payload with the logline in the value arrays as a stringified JSON instead of JSON, and then everything works. So apparently loki DOES accept JSON as the log line part of an entry without complaining, but only works if the log line part of the value is a single stringified JSON.

Would be good to

chaudum commented 2 years ago

Would be good to

* add to documentation https://grafana.com/docs/loki/latest/api/#post-apiprompush

I think it's quite clear from the docs that the line should be a string based on the example given.

"values": [
          [ "<unix epoch in nanoseconds>", "<log line>" ],
          [ "<unix epoch in nanoseconds>", "<log line>" ]
      ]
* and reject the request with a malformed request error or similar when the log line is not a single string, number (or boolean?)

However, I agree, that it should validate that the value is a string, and reject wrongly formatted payloads. I opened a PR that fixes the incorrect behaviour: #5685

hoegge commented 2 years ago

Great - thanks (PR). Since a log line can contain JSON, it COULD be legal with JSON, since JSON is also a string, right? So I don't fully agree. But rejecting "real" JSON and only accepting stringified JSON will solve the issue.

The documentation on push: actually also is a bit confusing, since "label" and "tag" should have the format "<label>" and "<tag>", shouldn't it? Since labes and tags should be replaced whereas, e.g., "streams", "stream" and "values" should not - but how should one know? Maybe how to push logs to loki deserves a real section in the documentation and not just a part of the API definition. It is to many probably as or more important than "just" using the agent to push logs.

But thanks again. Think loki is a great solution and much simpler and leaner than other similar solutions.

chaudum commented 2 years ago

The documentation on push: actually also is a bit confusing, since "label" and "tag" should have the format "<label>" and "<tag>", shouldn't it? Since labes and tags should be replaced whereas, e.g., "streams", "stream" and "values" should not - but how should one know?

You are right, that should be changed.

Maybe how to push logs to loki deserves a real section in the documentation and not just a part of the API definition. It is to many probably as or more important than "just" using the agent to push logs.

Usually, the user should not need to worry about how to use the push API, but rather be able to use an official client. However, having a dedicated section about how to push logs using the plain HTTP API is something that can also be useful to build your own log pushing clients.

But thanks again. Think loki is a great solution and much simpler and leaner than other similar solutions.

Glad to hear that you like Loki :)