saada opened this issue 2 years ago
any progress?
That feature would be great to have! Any ETA?
I have just configured Loki + Alertmanager for alerting. It took a lot of patience :) I think the Loki team should add more examples and documentation; it's really hard to understand how to configure all this stuff. At some point I even looked through the Loki sources without any result. @storm1kk Do you need example configs?
@RainM yes, please
<JSONLayout compact="true" eventEol="true" properties="true" stacktraceAsString="true" includeTimeMillis="true" />
...
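(That JSONLayout is log4j2's JSON appender layout. Judging by the Promtail pipeline below, Fluent Bit wraps each of these lines in its own record, adding @timestamp, path and service_name, with the raw log4j line kept in the log field. A line produced by the layout looks roughly like the following; the values are illustrative.)

{"timeMillis":1670000000000,"thread":"main","level":"INFO","loggerName":"com.example.App","message":"Application started","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":1,"threadPriority":5}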
Fluent Bit output (shipping the logs to Kafka):

[OUTPUT]
    Name                                  kafka
    Brokers                               ...
    Topics                                logs
    Match                                 logs
    Retry_Limit                           False
    rdkafka.request.required.acks         1
    rdkafka.delivery.timeout.ms           5000
    rdkafka.compression.codec             gzip
    rdkafka.queue.buffering.max.ms        1000
    rdkafka.queue.buffering.max.messages  10000
    storage.total_limit_size              100M
....
Promtail config:

scrape_configs:
  - job_name: kafka
    kafka:
      brokers:
        - .....
      topics:
        - logs
      labels:
        job: kafka_logs
    relabel_configs:
      - action: replace
        source_labels:
          - __meta_kafka_topic
        target_label: topic
    pipeline_stages:
      - json:
          expressions:
            fluentbit_timestamp: '"@timestamp"'
            log_file_name: path
            service_name: service_name
            log:
      - timestamp:
          source: fluentbit_timestamp
          format: Unix
      - json:
          expressions:
            app_timestamp: timeMillis
            thread: thread
            level: level
            logger_name: loggerName
            message: message
            logger_full_name: loggerFqcn
            thread_id: threadId
            thread_priority: threadPriority
          source: log
      - timestamp:
          source: app_timestamp
          format: UnixMs
      - template:
          source: message
          template: '{{ .app_timestamp }} | {{ ToUpper .level }} | {{ .thread }}@{{ .thread_id }} | {{ .message }}'
      - labels:
          log_file_name: log_file_name
          thread: thread
          level: level
          logger_name: logger_name
          logger_full_name: logger_full_name
          thread_priority: thread_priority
          thread_id: thread_id
          service_name: service_name
      - output:
          source: message
In the 'template' stage we build a particular message from the extracted tags (service name, timestamp, etc.). This is the message displayed in Grafana when querying the logs.
The Loki config itself is quite typical; the only hint here is to add a 'fake' folder if you use a single-tenant setup. Details here: https://stackoverflow.com/questions/66780585/grafana-loki-does-not-trigger-or-push-alert-on-alertmanager
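For reference, a minimal sketch of the ruler section that wires this up, assuming local rule storage; the paths and the Alertmanager URL are placeholders. With multi-tenancy disabled, the rule files go under a tenant directory literally named fake:

ruler:
  storage:
    type: local
    local:
      directory: /etc/loki/rules        # rule files live in /etc/loki/rules/fake/*.yaml
  rule_path: /tmp/loki/rules-temp       # scratch space the ruler uses for temporary rule files
  alertmanager_url: http://alertmanager:9093
  enable_api: true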
Loki rules:
---
groups:
  - name: "all-instances"
    rules:
      - alert: "no_logs_from_service"
        expr: "absent_over_time({service_name=\"YOUR_SERVICE_NAME\"} [10m])"
        for: "1m"
        annotations:
          description: "No logs for more than 10 min"
          title: "No logs from {{ $labels.service_name }}"
        labels:
          severity: "critical"
          instance: "PROD"
          service: "{{ $labels.service_name }}"
As far as I understand, $labels.service_name comes from Promtail's config (the labels stage above).
Alertmanager config:

global:
  resolve_timeout: 1m

route:
  receiver: 'telegram-notifications'

receivers:
  - name: 'telegram-notifications'
    telegram_configs:
      - bot_token: '....'
        chat_id: -.......
        api_url: "https://api.telegram.org"
        parse_mode: HTML

templates:
  - '/etc/alertmanager/*.tmpl'
{{/* https://prometheus.io/docs/alerting/latest/notifications/ */}}
{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver | urlquery }}{{ end }}
{{ define "telegram.default.message" }}
{{- range .Alerts.Firing }}
{{ if eq .Labels.severity "warning" }}🚩{{ else if eq .Labels.severity "critical" }}🆘{{ else }}❔️{{ end }}
<b>{{ .Labels.service }}</b> at {{ .Labels.instance }}
{{ .Annotations.title }}{{ if .Annotations.description }}
<i>({{ .Annotations.description | safeHtml }})</i>{{ end }}
{{- end }}
{{- range .Alerts.Resolved }}
🆗 {{ .Annotations.summary }}{{ if .Annotations.description }} <i>({{ .Annotations.description | safeHtml }})</i>{{ end }}
{{- end }}
{{ end }}
An alert for error messages with a particular message text. It looks like I still need to debug deduplication in Alertmanager.
- alert: "log_warning"
  expr: "count_over_time({level=\"ERROR\"} | pattern `<message>` [1h])"
  for: "1m"
  annotations:
    description: "{{ $labels.message }}"
    title: "ERROR in logs for {{ $labels.service_name }}"
  labels:
    severity: "critical"
    instance: "PROD"
    service: "{{ $labels.service_name }}"
The annotation description: "{{ $labels.message }}" translates into:

Description: <no value>

on the server version of Grafana (v9.2.5).
Can we send error loglines as part of an alert? Any progress or workaround?
@himao, you can use the pattern parser expression to extract the error log line as a label, so we can get the log lines from the alert. @RainM has given us an example config: https://github.com/grafana/loki/issues/5844#issuecomment-1346436205
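A minimal sketch of that approach as a ruler rule; the stream selector, filter, window and rule name are placeholders:

groups:
  - name: example
    rules:
      - alert: error_in_logs
        # `pattern <message>` captures the whole log line into the `message` label,
        # and `sum by (message)` keeps that label on the resulting alert
        expr: |
          sum by (message) (count_over_time({app="example"} |= `ERROR` | pattern `<message>` [5m])) > 0
        for: 1m
        annotations:
          description: "{{ $labels.message }}"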
@rkonfj, the method described doesn't work. Here is my alert rule:
This is how it looks in the email:
What am I missing? Thanks.
@rayrapetyan, it looks like you're using a Grafana alert; configure "no data and error handling" to avoid a DatasourceNoData alert. I am using a Loki alert and it's working very well.
Hm, I see. A few times I was able to get valid text in the email, but then I deleted the alert and recreated it from scratch, and now I can't get any text anymore :(
Is it possible to provide an example of parsing a JSON log line message in the message body?
This is how I was finally able to configure alerts from the UI:
Sure, @sureshgoli25. For example, say the original log lines have the common label app=order:
{"level": "warn", "user": "xiaohua", "event": "access_secret", "time":"2023-03-07 08:30:00"}
{"level": "info", "user": "xiaohua", "event": "access_normal_page1", "time":"2023-03-07 08:31:00"}
{"level": "info", "user": "xiaohua", "event": "access_normal_page2", "time":"2023-03-07 08:32:00"}
{"level": "warn", "user": "xiaohua", "event": "access_secret", "time":"2023-03-07 08:33:00"}
We can use this metric query to generate an alert when the warn log lines appear:

count_over_time({app="order"} | json | level="warn" [5m])

In this example, this will generate 2 alerts, and each alert has the labels level, user, event and time, plus the static label app.
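For reference, a minimal ruler rule built around that query; the rule name, threshold, for duration and annotation wording are assumptions:

groups:
  - name: order-warnings
    rules:
      - alert: order_warn_logs
        expr: |
          count_over_time({app="order"} | json | level="warn" [5m]) > 0
        for: 1m
        annotations:
          # level, user, event and time are the labels extracted by `| json`
          title: "warn event from {{ $labels.app }}"
          description: "user={{ $labels.user }} event={{ $labels.event }} at {{ $labels.time }}"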
My solution above doesn't work anymore after upgrading Grafana to v10.0.0 :( Fortunately, it can easily be fixed by replacing the label reference with:
{{ $values.B0.Labels.message }}
Working on this query for an alert:
count_over_time({job="ad_basic"} |= `4688` | json host="hostname", severity="severity", body="body", uuid="uuid", timestamp="ISO_8601_ts" [1s])
The query returns multiple results, and I'm trying to pull label/value pairs from only the last of the evaluated results.
Example annotation (this format works for all the JSON expressions I've defined above, but it sends multiple values with each label):
{{ range $v := $values }}
{{ $v.Labels.host }}
{{ end }}
Is it possible to index results in the annotation template?
In case someone needs help with message previews using Promtail + Loki + Alertmanager (all deployed using Helm charts): I'm parsing log4j log output. Using | pattern `<message>` I extract the field message from the log line; this is the actual log line, which is then used in the sum by (message) aggregation, which adds the label message. I add a message annotation and assign the (sliced) label value to it. This annotation is later sent to Alertmanager and used for the Slack message.

Note that the label message is passed to Alertmanager by default. It can be (and usually is) quite a long line, and parsing this label can be cumbersome.
groups:
  - name: prod_error_logs
    rules:
      - alert: ProdErrorLogs
        expr: |
          sum(rate({namespace="prod"} |= `ERROR` | pattern `<message>` [5m])) by (message) > 0
        for: 0m
        labels:
          severity: critical
          namespace: prod
          category: logs
        annotations:
          summary: "There are Error logs on PROD"
          sourceLink: "<source_link>"
          message: {{`"{{ slice $labels.message 0 500 }}"`}}
slack_configs:
  - channel: "<CHANNEL_ID>"
    send_resolved: true
    title: |-
      [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]
      PROJECT_NAME: {{ range .Alerts }}*{{ .Annotations.summary }}*
      {{ end }}
    text: |-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      {{- if eq .Labels.category "logs" }}
      *Source:* <{{ .Annotations.sourceLink }}|View in Grafana>
      *Log:*
      {{ .Annotations.message }}
      ```
      {{- else }}
      *Description:* {{ .Annotations.description }}
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{- end }}
      {{ end }}
      {{ end }}
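For context, that slack_configs block sits under a receiver in the Alertmanager configuration. A minimal sketch, with the receiver name and webhook URL as placeholders:

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: "<CHANNEL_ID>"
        api_url: 'https://hooks.slack.com/services/...'   # Slack incoming-webhook URL (placeholder)
        send_resolved: true
        # title/text templates as shown above

route:
  receiver: 'slack-notifications'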
For Grafana 11, at least in the cloud version, there have been more changes. I recommend avoiding the Classic condition. If you do that, the default notification template spits out all labels, and if you include | pattern `<message>` (with backticks around <message>) at the end of your query, you will see the log line within the label message.
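For example, a query along these lines; the stream selector, filter and range are placeholders:

count_over_time({job="example"} |= `ERROR` | pattern `<message>` [5m])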
Can you tell me the minimum version that supports using labels in alert rules?
Is your feature request related to a problem? Please describe.
We need to retrieve the contents of the logs that match the alert rules. Currently, alert rules take in metric LogQL queries and report either true or false. If the query is true, it notifies Alertmanager and all is good.
However, if we wanted to send the contents of the logs that match the query, it would almost need to be a separate field to choose the exact logs that would be sent as part of the notification message.
Describe the solution you'd like
Solution 1: Infer the log query from the metric query
This is better because the user won't have to change any existing alert rules and can simply update the alert message render template. A new
{{ range $log := .Logs }}
would magically be available.

Solution 2: Allow the user to add their own custom log query in addition to the metric query
If it's hard to derive the log message from a metric query, the Loki ruler could take two queries:
{{ range $log := .LogMessages }}

Describe alternatives you've considered
The only alternative would be to build my own ruler, or to use some other solution like Graylog that already supports this feature.
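A rough illustration of how Solution 1's proposed field could be consumed in a notification template (purely hypothetical; neither .Logs nor this rendering exists in Loki today):

{{ define "alert.with.logs" }}
{{ .Annotations.title }}
{{/* hypothetical field: the log lines the ruler would attach to the firing alert */}}
{{ range $log := .Logs }}
- {{ $log }}
{{ end }}
{{ end }}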