Closed tcborg closed 1 week ago
@tcborg can you provide a complete raw log line from the pod logs?
@bentonam
This is pretty much it
[11-06-2024 02:57:50] INFO (1): request completed
req: {
"id": 19256,
"method": "GET",
"url": "/api/v3/feed-by-id?contentType=video¤tDateTime=06-10-2024-22&feedId=zO3KRoos",
"query": {
"contentType": "video",
"currentDateTime": "06-10-2024-22",
"feedId": "zO3KRoos"
},
"params": {},
"headers": {
"x-forwarded-for": "70.17.106.134",
"x-forwarded-proto": "https",
"x-forwarded-port": "443",
"host": "api.mysite.com",
"x-amzn-trace-id": "Root=1-6667bd2e-53f0a50e72c3454803129de6",
"devicetype": "ios",
"buildnumber": "3.02",
"authorization": "[Redacted]",
"accept": "*/*",
"authsource": "ios",
"userid": "HpqsEKs2uvV7RFt395zpJHCTqH62",
"contenttype": "video",
"accept-language": "en-US;q=1.0",
"accept-encoding": "br;q=1.0, gzip;q=0.9, deflate;q=0.8",
"anonymoususerid": "e2b4rBuiVUauna6mURDPqT",
"user-agent": "iOS",
"feedid": "zO3KRoos",
"cookie": "<redacted>"
},
"remoteAddress": "::ffff:10.133.31.185",
"remotePort": 63618
}
res: {
"statusCode": 200,
"headers": {
"x-powered-by": "Express",
"access-control-allow-origin": "*",
"content-type": "application/json; charset=utf-8",
"content-length": "1541",
"etag": "W/\"605-4irVFR88+TK3HnJKbrREhgDSyLU\"",
"cache-control": "max-age=300"
}
}
responseTime: 6
@tcborg I'm not that familiar with pino, but according to their documentation https://github.com/pinojs/pino/blob/main/docs/pretty.md pino uses NDJSON (newline-delimited JSON) by default. What you posted appears to be pretty-printed JSON, which spans multiple lines. Typically, newlines should be avoided in production logs if at all possible, as they lead to unnecessarily complex log parsing. However, Alloy/Promtail does support multiline logs; you just have to provide a firstline selector.
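To illustrate why NDJSON is preferred: with one complete JSON object per line, a consumer can parse the stream line by line with no multiline assembly. A small Python sketch with made-up records (the field names are just pino-style examples, not taken from your logs):

```python
import json

# Hypothetical NDJSON, the shape pino emits by default:
# one complete JSON object per line.
ndjson = (
    '{"level":30,"time":1717630670000,"msg":"request completed","responseTime":6}\n'
    '{"level":50,"time":1717630671000,"msg":"request failed","responseTime":12}\n'
)

# Parsing is trivial: split on newlines, decode each line independently.
events = [json.loads(line) for line in ndjson.splitlines()]
print(len(events))        # 2
print(events[0]["msg"])   # request completed
```

Pretty-printed output, by contrast, forces the collector to reassemble each event before it can parse anything, which is exactly what the multiline stage is for.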
If my understanding is correct, you want:
- level to be used as a label
- message and responseTime to be extracted and set as structured metadata, so that they are still part of the log but not part of the log line (they are not good label candidates)

Here is an example Promtail configuration file; the reason I'm using Promtail here is that it can read from stdin:
Promtail Config:
---
clients:
- url: https://loki.example.com/loki/api/v1/push
basic_auth:
username: test
password: test
scrape_configs:
- job_name: pino
static_configs:
- targets:
- localhost
labels:
__path__: pino.log
pipeline_stages:
# strip ascii color codes
- decolorize:
# RE2 regular expression, if matched will start a new multiline block.
- multiline:
firstline: '^\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\]\s+\w+(?: \(\d+\))?:'
max_wait_time: 3s
# assign a default log-level of unknown
- static_labels:
level: unknown
# extract the timestamp, level, req and responseTime
- regex:
expression: '\[(?P<timestamp>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] (?P<level>\w+)(?: \(\d+\))?:\s+(?P<message>(?:\n|.)+)req:\n*\s*(?P<json_log>(?:.|\n)+)\n*\s*(?P<responseTime>responseTime:\s+\d+)\n*$'
# set the extracted level to be a label
- labels:
level:
# set the message and responseTime as structured metadata
- structured_metadata:
message:
responseTime:
# set the metadata timestamp to be that of the extracted timestamp from the log line
- timestamp:
source: timestamp
format: "02-01-2006 15:04:05"
# remove all new lines and whitespace from the req
- replace:
expression: '(\n\s+)'
source: json_log
replace: ''
# reset the log line to be that of just the json object
- output:
source: json_log
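Before running Promtail, the two regular expressions above can be sanity-checked locally. This is a hypothetical Python sketch, not part of the pipeline; it works because Python's re module accepts the (?P<name>...) named-group syntax that Go's RE2 engine uses, and these patterns avoid engine-specific features. The sample event is abbreviated from the log in this issue:

```python
import json
import re

# Hypothetical sample: one event as Promtail's multiline stage would
# assemble it (indentation mirrors the pino-pretty output in the issue).
sample = (
    "[11-06-2024 02:57:50] INFO (1): request completed\n"
    "req: {\n"
    '      "id": 19256,\n'
    '      "method": "GET"\n'
    "    }\n"
    "responseTime: 6\n"
)

# The same patterns as in the Promtail config above.
firstline = re.compile(
    r"^\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\]\s+\w+(?: \(\d+\))?:"
)
extract = re.compile(
    r"\[(?P<timestamp>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] "
    r"(?P<level>\w+)(?: \(\d+\))?:\s+(?P<message>(?:\n|.)+)"
    r"req:\n*\s*(?P<json_log>(?:.|\n)+)\n*\s*"
    r"(?P<responseTime>responseTime:\s+\d+)\n*$"
)

assert firstline.match(sample)      # this line starts a new multiline block
m = extract.match(sample)
assert m is not None
print(m.group("level"))             # INFO
print(m.group("responseTime"))      # responseTime: 6

# The replace stage removes "\n<indent>" runs, flattening json_log
# to a single line that decodes as JSON.
flattened = re.sub(r"(\n\s+)", "", m.group("json_log"))
print(json.loads(flattened))        # {'id': 19256, 'method': 'GET'}
```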
Command:
cat pino.log | promtail -stdin -config.expand-env -config.file promtail.yaml -dry-run
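One detail worth double-checking is the timestamp stage: the Go reference-time layout 02-01-2006 15:04:05 means day-month-year, so 11-06-2024 is parsed as 11 June 2024. A quick Python equivalent (strptime is only used here for illustration, it plays no part in the pipeline):

```python
from datetime import datetime

# Go's reference-time layout "02-01-2006 15:04:05" corresponds to
# "%d-%m-%Y %H:%M:%S" in strptime notation: day first, then month.
ts = datetime.strptime("11-06-2024 02:57:50", "%d-%m-%Y %H:%M:%S")
print(ts.isoformat())  # 2024-06-11T02:57:50
```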
Example Input:
[11-06-2024 02:57:50] INFO (1): request completed
req: {
"id": 19256,
"method": "GET",
"url": "/api/v3/feed-by-id?contentType=video¤tDateTime=06-10-2024-22&feedId=zO3KRoos",
"query": {
"contentType": "video",
"currentDateTime": "06-10-2024-22",
"feedId": "zO3KRoos"
},
"params": {},
"headers": {
"x-forwarded-for": "70.17.106.134",
"x-forwarded-proto": "https",
"x-forwarded-port": "443",
"host": "api.mysite.com",
"x-amzn-trace-id": "Root=1-6667bd2e-53f0a50e72c3454803129de6",
"devicetype": "ios",
"buildnumber": "3.02",
"authorization": "[Redacted]",
"accept": "*/*",
"authsource": "ios",
"userid": "HpqsEKs2uvV7RFt395zpJHCTqH62",
"contenttype": "video",
"accept-language": "en-US;q=1.0",
"accept-encoding": "br;q=1.0, gzip;q=0.9, deflate;q=0.8",
"anonymoususerid": "e2b4rBuiVUauna6mURDPqT",
"user-agent": "iOS",
"feedid": "zO3KRoos",
"cookie": "<redacted>"
},
"remoteAddress": "::ffff:10.133.31.185",
"remotePort": 63618
}
res: {
"statusCode": 200,
"headers": {
"x-powered-by": "Express",
"access-control-allow-origin": "*",
"content-type": "application/json; charset=utf-8",
"content-length": "1541",
"etag": "W/\"605-4irVFR88+TK3HnJKbrREhgDSyLU\"",
"cache-control": "max-age=300"
}
}
responseTime: 6
Example Promtail Output:
{"id": 19256,"method": "GET","url": "/api/v3/feed-by-id?contentType=video¤tDateTime=06-10-2024-22&feedId=zO3KRoos","query": {"contentType": "video","currentDateTime": "06-10-2024-22","feedId": "zO3KRoos"},"params": {},"headers": {"x-forwarded-for": "70.17.106.134","x-forwarded-proto": "https","x-forwarded-port": "443","host": "api.mysite.com","x-amzn-trace-id": "Root=1-6667bd2e-53f0a50e72c3454803129de6","devicetype": "ios","buildnumber": "3.02","authorization": "[Redacted]","accept": "*/*","authsource": "ios","userid": "HpqsEKs2uvV7RFt395zpJHCTqH62","contenttype": "video","accept-language": "en-US;q=1.0","accept-encoding": "br;q=1.0, gzip;q=0.9, deflate;q=0.8","anonymoususerid": "e2b4rBuiVUauna6mURDPqT","user-agent": "iOS","feedid": "zO3KRoos","cookie": "<redacted>"},"remoteAddress": "::ffff:10.133.31.185","remotePort": 63618}res: {"statusCode": 200,"headers": {"x-powered-by": "Express","access-control-allow-origin": "*","content-type": "application/json; charset=utf-8","content-length": "1541","etag": "W/\"605-4irVFR88+TK3HnJKbrREhgDSyLU\"","cache-control": "max-age=300"}}
As you can see, this works when accounting for multiline log messages. Note that Promtail itself won't help you with the k8s-monitoring Helm chart, since it is not supported there; Alloy is used instead. Alloy does, however, provide a conversion command, so if we convert the Promtail file to Alloy, the equivalent stages are generated for you.
Alloy Convert:
alloy convert --output pino.alloy --source-format promtail promtail.yaml
local.file_match "pino" {
path_targets = [{
__address__ = "localhost",
__path__ = "pino.log",
}]
}
loki.process "pino" {
forward_to = [loki.write.default.receiver]
stage.decolorize { }
stage.multiline {
firstline = "^\\[\\d{2}-\\d{2}-\\d{4} \\d{2}:\\d{2}:\\d{2}\\]\\s+\\w+(?: \\(\\d+\\))?:"
max_lines = 0
}
stage.static_labels {
values = {
level = "unknown",
}
}
stage.regex {
expression = "\\[(?P<timestamp>\\d{2}-\\d{2}-\\d{4} \\d{2}:\\d{2}:\\d{2})\\] (?P<level>\\w+)(?: \\(\\d+\\))?:\\s+(?P<message>(?:\\n|.)+)req:\\n*\\s*(?P<json_log>(?:.|\\n)+)\\n*\\s*(?P<responseTime>responseTime:\\s+\\d+)\\n*$"
}
stage.labels {
values = {
level = null,
}
}
stage.structured_metadata {
values = {
message = null,
responseTime = null,
}
}
stage.timestamp {
source = "timestamp"
format = "02-01-2006 15:04:05"
}
stage.replace {
expression = "(\\n\\s+)"
source = "json_log"
}
stage.output {
source = "json_log"
}
}
loki.source.file "pino" {
targets = local.file_match.pino.targets
forward_to = [loki.process.pino.receiver]
legacy_positions_file = "/var/log/positions.yaml"
}
loki.write "default" {
endpoint {
url = "https://loki.example.com/loki/api/v1/push"
basic_auth {
username = "test"
password = "test"
}
}
external_labels = {}
}
I hope this is helpful and solves your issue.
Download Files: pino-example.zip
This has been answered, so closing.
Hey folks,
I have a Node.js backend producing JSON logs with pino, but when I check the logs in Grafana using the Loki datasource, the JSON is split across multiple lines. I've tried a lot of different combinations of the json stage, but couldn't make it work. This is my last attempt:
stage.json { expressions = { output="log"} }
When I check the pod logs, I see the JSON above, and I wanted the req piece to appear whole in the Grafana entry.