grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Documentation feedback: /docs/sources/reference/components/loki.process.md #1056

Open karlrwjohnson opened 3 months ago

karlrwjohnson commented 3 months ago

It's unclear to me how to use the replace argument of a stage.replace block.

My initial assumption was that it works against the entire matched expression, like GNU sed: the replace argument would replace the whole string matched by expression. After several re-readings and a couple of tests, I'm beginning to suspect my assumption was wrong.

The problem I'm trying to solve is that the NGINX error_log format cannot be customized, so I intend to use stage.replace to convert each line to logfmt and then add a stage.logfmt block to parse it.

For example, NGINX logs this when it starts up:

2024/06/15 19:39:46 [notice] 1#1: using the "epoll" event method
2024/06/15 19:39:46 [notice] 1#1: nginx/1.27.0
2024/06/15 19:39:46 [notice] 1#1: built by gcc 12.2.0 (Debian 12.2.0-14) 
2024/06/15 19:39:46 [notice] 1#1: OS: Linux 5.15.0-112-generic
2024/06/15 19:39:46 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 65536:65536
2024/06/15 19:39:46 [notice] 1#1: start worker processes
2024/06/15 19:39:46 [notice] 1#1: start worker process 29

This is how I'm attempting to use this block to reformat the log:

loki.process "logfmt" {
  forward_to = [loki.write.local_loki.receiver]

  // NGINX: Parse error_log lines, which cannot be reformatted to logfmt (see https://stackoverflow.com/questions/4246756/is-it-possible-to-specify-custom-error-log-format-on-nginx)
  stage.match {
    selector = "{instance=~\"production/nginx-.+\"}"
    stage.replace {
      expression = "(?P<year>\\d{4})/(?P<month>\\d{2})/(?P<day>\\d{2}) (?P<hour>\\d{2}):(?P<minute>\\d{2}):(?P<second>\\d{2}) \\[(?P<level>[a-z]+)\\] (?P<processId>\\d+)#(?P<threadId>\\d+): (?P<message>.+)"
      replace = "time={{.Values.year}}-{{.Values.month}}-{{.Values.day}}T{{.Values.hour}}:{{.Values.minute}}:{{.Values.second}}z level={{.Values.level}} processId={{.Values.processId}} processId={{.Values.processId}} message={{.Values.message}}"
    }
  }
}

My initial question is how to use the named capture groups. I'm unfamiliar with Go so I can only guess at the template language syntax. Should I be using top-level values like {{.year}} or should I use the "Values" prefix {{.Values.year}}?

In either case, I get log lines like this:

time=<no value>-<no value>-<no value>T<no value>:<no value>:<no value>z level=<no value> processId=<no value> processId=<no value> message=<no value>/time=<no value>-<no value>-<no value>T<no value>:<no value>:<no value>z level=<no value> processId=<no value> processId=<no value> message=<no value>/time=<no value>-<no value>-<no value>T<no value>:<no value>:<no value>z level=<no value> processId=<no value> processId=<no value> message=<no value> time=<no value>-<no value>-<no value>T<no value>:<no value>:<no value>z level=<no value> processId=<no value> processId=<no value> message=<no value>:time=<no value>-<no value>-<no value>T<no value>:<no value>:<no value>z level=<no value> processId=<no value> processId=<no value> message=<no value>:time=<no value>-<no value>-<no value>T<no value>:<no value>:<no value>z level=<no value> processId=<no value> processId=<no value> message=<no value> [time=<no value>-<no value>-<no value>T<no value>:<no value>:<no value>z level=<no value> processId=<no value> processId=<no value> message=<no value>] time=<no value>-<no value>-<no value>T<no value>:<no value>:<no value>z level=<no value> processId=<no value> processId=<no value> message=<no value>#time=<no value>-<no value>-<no value>T<no value>:<no value>:<no value>z level=<no value> processId=<no value> processId=<no value> message=<no value>: time=<no value>-<no value>-<no value>T<no value>:<no value>:<no value>z level=<no value> processId=<no value> processId=<no value> message=<no value>

Heyyyyy wait a minute...

After copying this text from Grafana to an editor with line wraps turned on, I finally see that the replace argument seems to be applied to each capture group individually. My template fields resolve to <no value> because the named groups are never exposed as fields; instead, .Value itself takes on the text of each capture group in turn. My expectation was incorrect.

Maybe I'm trying to shove square pegs into round holes here. Maybe I'm supposed to use stage.regex to parse these capture groups into labels. But doesn't Loki want me not to create too many labels? I don't know. I was hoping to at least get all my logs looking the same before I figured out what else I'm supposed to do with them.

karlrwjohnson commented 3 months ago

Also: It would be useful to have a link to documentation for the functions supported in the replace argument.

The following list contains available functions with examples of more complex replace fields.

ToLower, ToUpper, Replace, Trim, TrimLeft, TrimRight, TrimPrefix, TrimSuffix, TrimSpace, Hash, Sha2Hash, regexReplaceAll, regexReplaceAllLiteral

"{{ if eq .Value \"200\" }}{{ Replace .Value \"200\" \"HttpStatusOk\" -1 }}{{ else }}{{ .Value | ToUpper }}{{ end }}"
"*IP4*{{ .Value | Hash \"salt\" }}*"

Is this what they're referring to? https://coveooss.github.io/gotemplate/docs/functions_reference/sprig-regex/
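For readers who, like me, don't know Go: the first example above reads as ordinary Go text/template syntax with custom functions registered. The sketch below is only an illustration of that syntax. It assumes ToUpper and Replace behave like strings.ToUpper and strings.Replace from Go's standard library, which matches the argument order in the example but is not confirmed against Alloy's implementation. The names funcs and render are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// funcs is an assumption for illustration: it maps two of the listed names
// onto Go's strings package. Alloy's real implementations may differ.
var funcs = template.FuncMap{
	"ToUpper": strings.ToUpper,
	"Replace": strings.Replace,
}

// render executes a replace-style template with .Value set to value.
func render(tmplText, value string) (string, error) {
	tmpl, err := template.New("replace").Funcs(funcs).Parse(tmplText)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	err = tmpl.Execute(&out, map[string]string{"Value": value})
	return out.String(), err
}

func main() {
	const tmplText = `{{ if eq .Value "200" }}{{ Replace .Value "200" "HttpStatusOk" -1 }}{{ else }}{{ .Value | ToUpper }}{{ end }}`
	for _, v := range []string{"200", "get"} {
		s, err := render(tmplText, v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q -> %q\n", v, s)
	}
	// prints:
	// "200" -> "HttpStatusOk"
	// "get" -> "GET"
}
```

The pipe form ({{ .Value | ToUpper }}) passes .Value as the last argument of the function, which is why both call styles appear in the documented examples.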

karlrwjohnson commented 3 months ago

I found the unholy abomination that accomplishes what I was looking to do:

loki.process "logfmt" {
  forward_to = [loki.write.local_loki.receiver]

  // NGINX: Parse error_log lines, which cannot be reformatted to logfmt (see https://stackoverflow.com/questions/4246756/is-it-possible-to-specify-custom-error-log-format-on-nginx)
  stage.match {
    selector = "{instance=~\"production/nginx-.+\"}"
    stage.replace {
      // Regex is processed with the RE2 library (https://github.com/google/re2/wiki/Syntax)
      // Must match the entire line with one capture group for the `replace` argument to rewrite it
      expression = "(^\\d{4}/\\d{2}/\\d{2} \\d{2}:\\d{2}:\\d{2} \\[[a-z]+\\] \\d+#\\d+: .+)"

      // Use `regexReplaceAll` to rewrite the log line in logfmt format
      // Docs (?): https://coveooss.github.io/gotemplate/docs/functions_reference/sprig-regex/
      replace = "{{ regexReplaceAll \"(\\\\d{4})/(\\\\d{2})/(\\\\d{2}) (\\\\d{2}):(\\\\d{2}):(\\\\d{2}) \\\\[([a-z]+)\\\\] (\\\\d+)#(\\\\d+): (.+)\" .Value \"time=${1}-${2}-${3}T${4}:${5}:${6}z level=${7} processId=${8} threadId=${9} message=${10}\" }}"
    }
  }
}

It has double-nested strings, and the regular expression appears twice.
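One way to iterate on the inner pattern without the doubled escaping is to try it in a small Go program first. This sketch assumes regexReplaceAll behaves like Go's regexp.ReplaceAllString, which I have not confirmed from the Alloy source. The names nginxErr, logfmtRepl, and toLogfmt are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// A raw string literal needs only one level of escaping, unlike the nested
// strings in the Alloy config.
var nginxErr = regexp.MustCompile(`(\d{4})/(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2}) \[([a-z]+)\] (\d+)#(\d+): (.+)`)

// logfmtRepl is the same replacement text used in the config above.
const logfmtRepl = "time=${1}-${2}-${3}T${4}:${5}:${6}z level=${7} processId=${8} threadId=${9} message=${10}"

// toLogfmt rewrites one NGINX error_log line; lines that do not match are
// returned unchanged.
func toLogfmt(line string) string {
	return nginxErr.ReplaceAllString(line, logfmtRepl)
}

func main() {
	fmt.Println(toLogfmt("2024/06/15 19:39:46 [notice] 1#1: start worker process 29"))
	// prints: time=2024-06-15T19:39:46z level=notice processId=1 threadId=1 message=start worker process 29
}
```

Note that the message value contains unquoted spaces, which a logfmt parser would probably split into separate keys, so the message field may need quoting before stage.logfmt can read it as one value.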

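I also discovered that the $ replacements must be wrapped in curly braces when they are followed by a letter or digit, because otherwise "$3T" gets interpreted as a reference to a group named "3T" (which doesn't exist, so it expands to nothing) rather than as $3 followed by a literal T.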
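The brace rule comes from Go's regexp package, which takes the name after $ to be the longest possible run of letters, digits, and underscores, and expands unknown names to an empty string. The sketch below shows the difference, again assuming regexReplaceAll delegates to regexp.ReplaceAllString. The names date and expand are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var date = regexp.MustCompile(`(\d{4})/(\d{2})/(\d{2}) (\d{2})`)

// expand applies a replacement template to a fixed sample timestamp.
func expand(repl string) string {
	return date.ReplaceAllString("2024/06/15 19", repl)
}

func main() {
	// "$3T" is read as a group named "3T", which does not exist, so both the
	// day and the literal T vanish.
	fmt.Println(expand("$1-$2-$3T$4")) // prints: 2024-06-19

	// Braces end the name explicitly, so the T is kept as literal text.
	fmt.Println(expand("$1-$2-${3}T${4}")) // prints: 2024-06-15T19
}
```

The same rule explains why ${10} is safest written with braces whenever more text follows it.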
