grafana / helm-charts

Apache License 2.0

[promtail] Regex issue with the pipelineStages YAML value #2428

Open adesprez opened 1 year ago

adesprez commented 1 year ago

Given the following values for .Values.config.snippets.pipelineStages:

config:
[...]
    pipelineStages:
    - multiline:
        firstline: '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}'
        max_wait_time: 3s

Expected result for the promtail Secret:

scrape_configs:
  # See also https://github.com/grafana/loki/blob/master/production/ksonnet/promtail/scrape_config.libsonnet for reference
  - job_name: kubernetes-pods
    pipeline_stages:
      - cri: {}
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}'
          max_wait_time: 3s

Actual result for the promtail Secret:

scrape_configs:
  # See also https://github.com/grafana/loki/blob/master/production/ksonnet/promtail/scrape_config.libsonnet for reference
  - job_name: kubernetes-pods
    pipeline_stages:
      - cri: {}
      - multiline:
          firstline: ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}
          max_wait_time: 3s

(no single quotes ' enclosing the regex).

As a consequence, Promtail is not processing that firstline as a regex. Single quotes are mandatory here.

Code responsible for that, in charts/promtail/values.yaml:

    scrapeConfigs: |
      # See also https://github.com/grafana/loki/blob/master/production/ksonnet/promtail/scrape_config.libsonnet for reference
      - job_name: kubernetes-pods
        pipeline_stages:
          {{- toYaml .Values.config.snippets.pipelineStages | nindent 4 }}

The toYaml function is stripping the single quotes.

One possible fix is to render pipelineStages with the tpl function, as suggested by tiagodj here: https://github.com/grafana/helm-charts/issues/2412

xyfleet commented 1 year ago

I have the same issue here. Any progress on this one? @adesprez Have you fixed this issue?

adesprez commented 1 year ago

@xyfleet, I haven't found a workaround. I tried multiple syntaxes to escape characters and/or protect the single quotes. No luck. I have parked this since then.

chrizel commented 1 year ago

Having the same problem, I can confirm that the tpl function can be used to work around this problem by moving the regex to its own custom values variable and then using it in the pipelineStages definition:

  - match:
      selector: '{name="foobar"}'
      stages:
      - regex:
          expression: {{ .Values.mySpecialRegex }}

I have created PR #2575 that makes the tpl modification in the promtail helm chart.
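
A minimal sketch of why this workaround depends on template rendering, assuming Go's standard text/template package as a stand-in for Helm's tpl function (it is not the chart's code). The helper name renderSnippet and the sample regex value are hypothetical. Because the substitution is textual, whatever the variable holds, quotes included, lands verbatim in the rendered snippet.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderSnippet is a hypothetical stand-in for Helm's tpl: it parses the
// snippet as a Go template and executes it against the given values.
func renderSnippet(snippet string, values map[string]interface{}) (string, error) {
	t, err := template.New("snippet").Parse(snippet)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	// Helm exposes user values under .Values, so the data mirrors that shape.
	data := map[string]interface{}{"Values": values}
	if err := t.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	snippet := "expression: {{ .Values.mySpecialRegex }}"
	// Assumed value: the single quotes are part of the string itself.
	values := map[string]interface{}{"mySpecialRegex": `'^\d{4}-\d{2}-\d{2}'`}

	rendered, err := renderSnippet(snippet, values)
	if err != nil {
		panic(err)
	}
	fmt.Println(rendered)
}
```

The same mechanism also evaluates any other `{{ ... }}` action found in the snippet, which matters for stages that carry Go template syntax of their own.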

xyfleet commented 1 year ago

@chrizel Thanks for your update. The new code has been released. I tried updating to the new release, 6.14.1, and found the secret still did not have the single quotes. It looks like tpl does not work as expected. Is there something wrong with my settings?

image

I am using the same setting as described before.

snippets:
    pipelineStages:
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}'
          max_lines: 128
          max_wait_time: 3s

zanhsieh commented 1 year ago

Folks, can somebody trace down to configmap? Install promtail -> go to promtail container -> cat /etc/promtail/promtail.yaml

https://github.com/grafana/helm-charts/blob/d8a64071970b5928cf990dce480925ff6b8edb56/charts/promtail/templates/_pod.tpl#L47
https://github.com/grafana/helm-charts/blob/d8a64071970b5928cf990dce480925ff6b8edb56/charts/promtail/templates/secret.yaml#L17

andrejshapal commented 1 year ago

Hello, Guys, you broke everything? Because previously everything was working fine and not anymore. I see exactly all '' are removed from my config after bumping to promtail-6.14.1

My secret without tpl:

      - template:
          source: studioLocation
          template: '{{if eq .prefix "1"}}Riga{{else if eq .prefix "2"}}Bucharest{{else
            if eq .prefix "3"}}Tbilisi{{else}}Unknown{{end}}'

With tpl:

      - template:
          source: studioLocation
          template: 'Unknown'

Obviously this is not the only stage that is completely broken.

As a consequence, Promtail is not processing that firstline as a regex. Single quotes are mandatory here.

Not sure. Do you have any errors with this?

I have a multiline config and it does not produce any errors, which it would if something were wrong. Anyway, this issue should be fixed in Promtail, not in the Helm chart.

chrizel commented 1 year ago

Hi andrejshapal,

sorry for the problem. I made this change only to allow us to be able to use the regex stage in promtail, and this suggestion looked like a way to make it work (at least it works for my use case, but I'm only using regex). Now it seems that the tpl change creates this conflict with the template stage which itself uses Go template syntax. You could work around this problem by putting your template strings in variables and then referencing it in the template: ... part, but obviously that's ugly too.

To be fair, I'm not really satisfied with either solution. The template stage now has a problem with tpl, but without it we can't really use regex. Does anybody have an idea how we can make both work?
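
The conflict with the template stage can be sketched with Go's text/template package, which Helm's templating is built on. Rendering the stage's template string at chart time evaluates the `{{if ...}}` actions immediately instead of leaving them for Promtail to evaluate per log line. The helper renderEagerly is hypothetical, and the data map is an assumption for illustration: it sets prefix to an empty string to stand in for a chart context that carries no meaningful .prefix.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// stageTemplate is the Promtail template-stage string from the report above,
// joined onto one line.
const stageTemplate = `{{if eq .prefix "1"}}Riga{{else if eq .prefix "2"}}Bucharest{{else if eq .prefix "3"}}Tbilisi{{else}}Unknown{{end}}`

// renderEagerly (hypothetical helper) evaluates s as a Go template right away,
// which is roughly what happens when a values snippet is passed through a
// template function before it is written to the Secret.
func renderEagerly(s string, data map[string]string) (string, error) {
	t, err := template.New("stage").Parse(s)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Assumption: at chart render time .prefix has no useful value.
	out, err := renderEagerly(stageTemplate, map[string]string{"prefix": ""})
	if err != nil {
		panic(err)
	}
	// Only the selected branch survives; the conditional logic is gone.
	fmt.Println(out)
}
```

Once rendered this way, the Secret contains only the branch that was selected at render time, so Promtail has no conditional left to evaluate.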

As a consequence, Promtail is not processing that firstline as a regex. Single quotes are mandatory here. Not sure. Do you have any errors with this?

The problem is that these regexes didn't work at all because of the missing quotes. Longer regexes even spanned multiple lines because of toYaml.

andrejshapal commented 1 year ago

@chrizel I fixed it by rolling back (setting the old config manually in my values.yaml snippet). I looked into the promtail source code and I see that YAML parsing and mapping are used. Also, there are errors for every case where decoding fails. If something is wrong with '', there is a bug. Values spanning several lines should not be a problem either.

andrejshapal commented 1 year ago

@chrizel I ran tests to confirm the issue and did not find anything. Here are the logs I am producing:

image

Here is my pipeline config:

        snippets:
          pipelineStages:
            - docker: {}
            - multiline:
                firstline: '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\\n'
                max_wait_time: 3s
            - regex:
                expression: '^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) (?P<message>(?s:.*))$'

The rest is default from chart version targetRevision: 6.6.1.

My secret content:

server:
  log_level: info
  http_listen_port: 3101

clients:
  - tenant_id: 1
    url: http://loki-gateway.logging/loki/api/v1/push

positions:
  filename: /run/promtail/positions.yaml
scrape_configs:
  # See also https://github.com/grafana/loki/blob/master/production/ksonnet/promtail/scrape_config.libsonnet for reference
  - job_name: kubernetes-pods
    pipeline_stages:
      - docker: {}
      - multiline:
          firstline: ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\\n
          max_wait_time: 3s
      - regex:
          expression: ^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) (?P<message>(?s:.*))$
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_pod_controller_name
        regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
        action: replace
        target_label: __tmp_controller_name
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_name
          - __meta_kubernetes_pod_label_app
          - __tmp_controller_name
          - __meta_kubernetes_pod_name
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: app
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_instance
          - __meta_kubernetes_pod_label_release
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: instance
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_component
          - __meta_kubernetes_pod_label_component
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: component
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: node_name
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        replacement: $1
        separator: /
        source_labels:
        - namespace
        - app
        target_label: job
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_container_name
        target_label: container
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
        - __meta_kubernetes_pod_uid
        - __meta_kubernetes_pod_container_name
        target_label: __path__
      - action: replace
        regex: true/(.*)
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
        - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
        - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
        - __meta_kubernetes_pod_container_name
        target_label: __path__

limits_config:

What I see in grafana:

image

Please send me your full promtail config (the one where you see an issue). Show how your logs look in Grafana. Mention your helm chart version and promtail version.

chrizel commented 1 year ago

@andrejshapal I will look into my regex problem when I have more time, but it was in line with what @adesprez wrote.

In the meantime, I have created PR #2584 which reverts the tpl change. Sorry for the inconvenience.

andrejshapal commented 1 year ago

@chrizel Don't worry. It is ok. Thank you for reverting.

xyfleet commented 1 year ago

Thanks everyone. Do we know which older version does not have this issue, so that we can roll back to it? Or is there any other idea for fixing this issue?

andrejshapal commented 1 year ago

@xyfleet Which issue? We have two.

xyfleet commented 1 year ago

@andrejshapal My issue is that promtail cannot combine multiline logs into one entry. As a result, an entire log is split into multiple lines in Grafana. Based on the previous discussion, this issue was caused by the missing '' in the firstline: '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}'

andrejshapal commented 1 year ago

@xyfleet I don't think this is your issue. A few posts above I showed a test, and '' is not needed. Please send me your full promtail config (the one where you see an issue). Show how your logs look in Grafana. Mention your helm chart version and promtail version.

xyfleet commented 1 year ago

@andrejshapal my loki setting: (I only update the config part, very simple.) Helm chart: 6.14.1

config:
  clients:
    - url: http://loki:3100/loki/api/v1/push
  snippets:
    pipelineStages:
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}'
          max_lines: 128
          max_wait_time: 3s

The same log in Pod and in Grafana. Log from Pod: Screenshot2023_08_11_094345

Log in Grafana: From the screenshot, the log in Grafana is divided into several lines, each with its own timestamp.

Screenshot2023_08_11_095011

andrejshapal commented 1 year ago

@xyfleet Unfortunately, on the second screenshot (from Grafana) I can't see exactly the first line that the regex should catch. It is rather important. From what I can see without it:

  1. '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}' In the log, the timestamp looks like dddd-dd-dd dd:dd:dd, but your regex tries to match dddd-dd-ddTdd:dd:dd. This will never work.
  2. Additionally, as said in https://grafana.com/docs/loki/latest/clients/promtail/stages/multiline/ :
    # Flag (?s:.*) needs to be set for regex stage to capture full traceback log in the extracted map.

    So, basically (if I understand correctly), the regex stage is compulsory to get the time from the multiline entry. You have to add:

    - regex:
        expression: '^(?P<time>\[\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}\]) (?P<message>(?s:.*))$'
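
As a rough illustration of what the (?s:.*) flag changes, the sketch below uses Go's regexp package (the regex dialect Promtail is written against) with the 'T'-separated expression posted earlier in this thread. The helper extract, the variant without the flag, and the sample log entry are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Expression from the thread: (?s:.*) lets '.' match newlines as well.
	withFlag = regexp.MustCompile(`^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) (?P<message>(?s:.*))$`)
	// Hypothetical variant without the flag, for comparison only.
	withoutFlag = regexp.MustCompile(`^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) (?P<message>.*)$`)
)

// extract returns the named capture groups of re in s, or nil when re does
// not match at all.
func extract(re *regexp.Regexp, s string) map[string]string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	groups := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" {
			groups[name] = m[i]
		}
	}
	return groups
}

func main() {
	// Made-up multi-line entry, as it would look after the multiline stage.
	entry := "2023-08-11T18:07:16 java.lang.IllegalStateException: boom\n\tat com.example.Main.run(Main.java:42)"

	fmt.Printf("with flag:    %q\n", extract(withFlag, entry)["message"])
	fmt.Printf("without flag: matched=%v\n", extract(withoutFlag, entry) != nil)
}
```

Without the flag, '.' stops at the first newline and the trailing $ can no longer reach the end of the entry, so the expression does not match a combined multi-line entry at all.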

xyfleet commented 1 year ago

@andrejshapal

Based on your suggestion, I updated the Promtail config. But it still does not work (the logs in Grafana are still split across multiple lines). Can you check?

 pipelineStages:
      - multiline:
          firstline: '^\[\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}\]'
          max_lines: 128
          max_wait_time: 3s
      - regex:
          expression: '^(?P<time>\[\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}\]) (?P<message>(?s:.*))$'

Please see the new screenshots.

log from pod: Screenshot2023_08_11_121051

log from grafana: Screenshot2023_08_11_121205

andrejshapal commented 1 year ago

@xyfleet Not exactly. Your time 2023-08-11 18:07:16.189 is not enclosed in []. Therefore your config should be:

pipelineStages:
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}' # https://regex101.com/r/DRyksF/1
          max_lines: 128
          max_wait_time: 3s
      - regex:
          expression: '^(?P<time>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2})(?P<message>(?s:.*))$' # https://regex101.com/r/0a3gUP/1

Additionally, I have a suspicion you may need parser before doing multiline:

pipelineStages:
      - cri: {}
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}'
          max_lines: 128
          max_wait_time: 3s
      - regex:
          expression: '^(?P<time>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2})(?P<message>(?s:.*))$'

I am not really sure. Just think so looking on your latest screenshot. Don't know why, but they have different format with the first one.
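
A quick way to check which firstline pattern fits a given log line is to run the candidates through Go's regexp package. This is only a sketch: the three patterns and the timestamp are taken from this thread, while the helper matchingPatterns, the labels, and the rest of the sample line are invented for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// candidates holds the three firstline patterns tried in the thread.
var candidates = []struct{ name, pattern string }{
	{"T separator", `^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}`},
	{"bracketed", `^\[\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}\]`},
	{"space separator", `^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}`},
}

// matchingPatterns returns the names of the candidates that match line.
func matchingPatterns(line string) []string {
	var names []string
	for _, c := range candidates {
		if regexp.MustCompile(c.pattern).MatchString(line) {
			names = append(names, c.name)
		}
	}
	return names
}

func main() {
	// Timestamp copied from the thread; the rest of the line is made up.
	line := "2023-08-11 18:07:16.189 INFO starting worker"
	fmt.Println(matchingPatterns(line)) // prints "[space separator]"
}
```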

xyfleet commented 1 year ago

@andrejshapal Back to your question "Don't know why, but they have different format with the first one." There are two formats of time in my previous posts.

Screenshot2023_08_11_110125 This format comes from my original setting without any multiline and regex.

Screenshot2023_08_11_150448 This format comes from the updated setting with multiline and regex

Based on the configuration below, the message in Grafana still got divided into multi-lines as you see before. I thought this is caused by the missing '' in the secret. Isn't it?

pipelineStages:
     - cri: {}
     - multiline:
           firstline: '^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}'
           max_lines: 128
           max_wait_time: 3s
     - regex:
          expression: '^(?P<time>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2})(?P<message>(?s:.*))$'

Screenshot2023_08_11_172519

Additionally, I have a suspicion you may need parser before doing multiline

What do you mean?

andrejshapal commented 1 year ago

I thought this is caused by the missing '' in the secret. Isn't it?

There is no issue with missing '' because nothing is missing.

This format comes from the updated setting with multiline and regex

I don't think this is possible.

  1. Screenshot - the logs are parsed from cri format
  2. Screenshot - logs are not parsed from cri format and regex stage applied
  3. Screenshot - logs are not parsed from cri format and regex stage applied

This makes me think that for the first Grafana screenshot you did not post the full config of your pipeline stages, or you executed the query with some parsers.

When executing the query and taking a screenshot of the Grafana logs, make sure your UI settings are the following:

image

What do you mean?

On the last 2 screenshots your logs are in cri format. The cri format consists of: "2019-04-30T02:12:41.8443515Z stdout F message". Your goal is to match message with the multiline regex, then attach the messages from the following lines to it, and apply the regex stage to the combined message in order to retrieve the time and keep only the combined message as the log.

pipelineStages:
      - cri: {} # parse log
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}' # catch first line which starts with timestamp
          max_lines: 128
          max_wait_time: 3s
      - regex:
          expression: '^(?P<time>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2})(?P<message>(?s:.*))$' # after logs combined in one line, retrieve timestamp

I don't think you had added - cri: {} when taking the last screenshot.

Also, after applying your config in helm, please, post config from generated secret.
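
The order of the stages can be sketched in Go as a simplified simulation. This is not Promtail's implementation: stripCRI and groupMultiline are hypothetical helpers, max_lines and max_wait_time are ignored, the sample lines are made up, and passing unmatched lines through one by one before any firstline match is an assumption chosen to mirror what the screenshots in this thread were described as showing.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// firstline is the multiline pattern from the config above.
var firstline = regexp.MustCompile(`^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}`)

// stripCRI drops the "<timestamp> <stream> <flag> " prefix of a CRI log line
// and returns the message. Lines with fewer than four fields are returned
// unchanged.
func stripCRI(line string) string {
	parts := strings.SplitN(line, " ", 4)
	if len(parts) < 4 {
		return line
	}
	return parts[3]
}

// groupMultiline starts a new entry at every line matching firstline and
// appends the following non-matching lines to it. Lines seen before any
// match are emitted on their own (assumption, see lead-in).
func groupMultiline(lines []string) []string {
	var entries []string
	inBlock := false
	for _, l := range lines {
		switch {
		case firstline.MatchString(l):
			entries = append(entries, l)
			inBlock = true
		case inBlock:
			entries[len(entries)-1] += "\n" + l
		default:
			entries = append(entries, l)
		}
	}
	return entries
}

func main() {
	raw := []string{
		"2023-08-11T18:07:16.189Z stdout F 2023-08-11 18:07:16.189 ERROR request failed",
		"2023-08-11T18:07:16.190Z stdout F java.lang.IllegalStateException: boom",
		"2023-08-11T18:07:16.190Z stdout F     at com.example.Main.run(Main.java:42)",
		"2023-08-11T18:07:17.001Z stdout F 2023-08-11 18:07:17.001 INFO recovered",
	}

	// Without the cri step, every line starts with the runtime's 'T'
	// timestamp, so firstline never matches and nothing is combined.
	fmt.Println("entries without cri:", len(groupMultiline(raw)))

	// With the cri step first, firstline sees the application's timestamp.
	var messages []string
	for _, l := range raw {
		messages = append(messages, stripCRI(l))
	}
	fmt.Println("entries with cri:   ", len(groupMultiline(messages)))
}
```

In this sketch the four raw lines stay four separate entries without the cri step and collapse into two entries with it, which is the difference between the last two configurations discussed here.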

xyfleet commented 1 year ago

@andrejshapal Thank you so much. After testing with the configuration, I can see the log correctly in Grafana.

pipelineStages:
      - cri: {} # parse log
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}' # catch first line which starts with timestamp
          max_lines: 128
          max_wait_time: 3s
      - regex:
          expression: '^(?P<time>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2})(?P<message>(?s:.*))$' # after logs combined in one line, retrieve timestamp

Screenshot2023_08_13_103554

In my previous tests, I did not add - cri: {} to my configuration directly, but this is in the values.yaml file Screenshot2023_08_13_103716

So I assumed that - cri: {} would be added to my configuration by default, which is why I did not put it in my own configuration. But anyway, it works as expected. Thanks.

luizbafilho commented 1 year ago

My use case for this is data masking, as per the docs: https://grafana.com/docs/loki/latest/clients/promtail/stages/replace/#with-replace-value-in-template-format-with-hashing-for-obfuscating-data

andrejshapal commented 1 year ago

@luizbafilho Hello, What is the problem?