saada opened this issue 2 years ago
any progress?
That feature would be great to have! Any ETA?
I have just configured Loki + Alertmanager for alerting. It took a lot of patience :) I think the Loki team should add more examples and documentation; it's really hard to understand how to configure all this stuff. At some point I even looked through the Loki sources without any result. @storm1kk Do you need example configs?
@RainM yes, please
<JSONLayout compact="true" eventEol="true" properties="true" stacktraceAsString="true" includeTimeMillis="true" />
...
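(That JSONLayout is log4j2's JSON appender layout. Judging by the Promtail pipeline below, Fluent Bit wraps each of these lines in its own record, adding @timestamp, path and service_name, with the raw log4j line kept in the log field. A line produced by the layout looks roughly like the following; the values are illustrative.)

{"timeMillis":1670000000000,"thread":"main","level":"INFO","loggerName":"com.example.App","message":"Application started","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":1,"threadPriority":5}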
Fluent Bit output (shipping the logs to Kafka):

[OUTPUT]
    Name                                  kafka
    Brokers                               ...
    Topics                                logs
    Match                                 logs
    Retry_Limit                           False
    rdkafka.request.required.acks         1
    rdkafka.delivery.timeout.ms           5000
    rdkafka.compression.codec             gzip
    rdkafka.queue.buffering.max.ms        1000
    rdkafka.queue.buffering.max.messages  10000
    storage.total_limit_size              100M
....
Promtail config:

scrape_configs:
  - job_name: kafka
    kafka:
      brokers:
        - .....
      topics:
        - logs
      labels:
        job: kafka_logs
    relabel_configs:
      - action: replace
        source_labels:
          - __meta_kafka_topic
        target_label: topic
    pipeline_stages:
      - json:
          expressions:
            fluentbit_timestamp: '"@timestamp"'
            log_file_name: path
            service_name: service_name
            log:
      - timestamp:
          source: fluentbit_timestamp
          format: Unix
      - json:
          expressions:
            app_timestamp: timeMillis
            thread: thread
            level: level
            logger_name: loggerName
            message: message
            logger_full_name: loggerFqcn
            thread_id: threadId
            thread_priority: threadPriority
          source: log
      - timestamp:
          source: app_timestamp
          format: UnixMs
      - template:
          source: message
          template: '{{ .app_timestamp }} | {{ ToUpper .level }} | {{ .thread }}@{{ .thread_id }} | {{ .message }}'
      - labels:
          log_file_name: log_file_name
          thread: thread
          level: level
          logger_name: logger_name
          logger_full_name: logger_full_name
          thread_priority: thread_priority
          thread_id: thread_id
          service_name: service_name
      - output:
          source: message
In the 'template' stage we build a particular message from the extracted tags (service name, timestamp, etc.). This is the message displayed in Grafana when querying the logs.
The Loki config itself is quite typical; the only hint here is to add a 'fake' folder if you use a single-tenant setup. Details here: https://stackoverflow.com/questions/66780585/grafana-loki-does-not-trigger-or-push-alert-on-alertmanager
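For reference, a minimal sketch of the ruler section that wires this up, assuming local rule storage; the paths and the Alertmanager URL are placeholders. With multi-tenancy disabled, the rule files go under a tenant directory literally named fake:

ruler:
  storage:
    type: local
    local:
      directory: /etc/loki/rules        # rule files live in /etc/loki/rules/fake/*.yaml
  rule_path: /tmp/loki/rules-temp       # scratch space the ruler uses for temporary rule files
  alertmanager_url: http://alertmanager:9093
  enable_api: true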
Loki rules:
---
groups:
  - name: "all-instances"
    rules:
      - alert: "no_logs_from_service"
        expr: "absent_over_time({service_name=\"YOUR_SERVICE_NAME\"} [10m])"
        for: "1m"
        annotations:
          description: "No logs for more than 10 min"
          title: "No logs from {{ $labels.service_name }}"
        labels:
          severity: "critical"
          instance: "PROD"
          service: "{{ $labels.service_name }}"
As far as I understand, $labels.service_name comes from Promtail's config (the labels stage above).
Alertmanager config:

global:
  resolve_timeout: 1m

route:
  receiver: 'telegram-notifications'

receivers:
  - name: 'telegram-notifications'
    telegram_configs:
      - bot_token: '....'
        chat_id: -.......
        api_url: "https://api.telegram.org"
        parse_mode: HTML

templates:
  - '/etc/alertmanager/*.tmpl'
{{/* https://prometheus.io/docs/alerting/latest/notifications/ */}}
{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver | urlquery }}{{ end }}
{{ define "telegram.default.message" }}
{{- range .Alerts.Firing }}
{{ if eq .Labels.severity "warning" }}🚩{{ else if eq .Labels.severity "critical" }}🆘{{ else }}❔️{{ end }}
<b>{{ .Labels.service }}</b> at {{ .Labels.instance }}
{{ .Annotations.title }}{{ if .Annotations.description }}
<i>({{ .Annotations.description | safeHtml }})</i>{{ end }}
{{- end }}
{{- range .Alerts.Resolved }}
🆗 {{ .Annotations.summary }}{{ if .Annotations.description }} <i>({{ .Annotations.description | safeHtml }})</i>{{ end }}
{{- end }}
{{ end }}
An alert for error messages with a particular message text. It looks like I still need to debug deduplication in Alertmanager.
- alert: "log_warning"
  expr: "count_over_time({level=\"ERROR\"} | pattern `<message>` [1h])"
  for: "1m"
  annotations:
    description: "{{ $labels.message }}"
    title: "ERROR in logs for {{ $labels.service_name }}"
  labels:
    severity: "critical"
    instance: "PROD"
    service: "{{ $labels.service_name }}"
The annotation description: "{{ $labels.message }}" translates into:

Description: <no value>

on the server version of Grafana (v9.2.5).
Can we send error loglines as part of an alert? Any progress or workaround?
@himao, you can use the pattern parser expression to extract the error log line as a label, so we can get the log lines from the alert. @RainM has given us an example config: https://github.com/grafana/loki/issues/5844#issuecomment-1346436205
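A minimal sketch of that approach as a ruler rule; the stream selector, filter, window and rule name are placeholders:

groups:
  - name: example
    rules:
      - alert: error_in_logs
        # `pattern <message>` captures the whole log line into the `message` label,
        # and `sum by (message)` keeps that label on the resulting alert
        expr: |
          sum by (message) (count_over_time({app="example"} |= `ERROR` | pattern `<message>` [5m])) > 0
        for: 1m
        annotations:
          description: "{{ $labels.message }}"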
@rkonfj, the method described doesn't work. Here is my alert rule:
This is how it looks in the email:
What am I missing? Thanks.
@rayrapetyan, it looks like you're using a Grafana alert; configure "no data and error handling" to avoid a DatasourceNoData alert. I am using a Loki alert and it's working very well.
Hm, I see. A few times I was able to get valid text in the email, but then I deleted the alert and recreated it from scratch, and now I can't get any text anymore :(
Is it possible to provide an example of parsing a JSON log line message in the message body?
This is how I was finally able to configure alerts from the UI:
Sure, @sureshgoli25. For example, say the original log lines have the common label app=order:
{"level": "warn", "user": "xiaohua", "event": "access_secret", "time":"2023-03-07 08:30:00"}
{"level": "info", "user": "xiaohua", "event": "access_normal_page1", "time":"2023-03-07 08:31:00"}
{"level": "info", "user": "xiaohua", "event": "access_normal_page2", "time":"2023-03-07 08:32:00"}
{"level": "warn", "user": "xiaohua", "event": "access_secret", "time":"2023-03-07 08:33:00"}
We can use this metric query to generate an alert when the warn log lines appear:

count_over_time({app="order"} | json | level="warn" [5m])

In this example, this will generate 2 alerts, and each alert has the labels level, user, event and time, plus the static label app.
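For reference, a minimal ruler rule built around that query; the rule name, threshold, for duration and annotation wording are assumptions:

groups:
  - name: order-warnings
    rules:
      - alert: order_warn_logs
        expr: |
          count_over_time({app="order"} | json | level="warn" [5m]) > 0
        for: 1m
        annotations:
          # level, user, event and time are the labels extracted by `| json`
          title: "warn event from {{ $labels.app }}"
          description: "user={{ $labels.user }} event={{ $labels.event }} at {{ $labels.time }}"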
My solution above doesn't work anymore after upgrading Grafana to v10.0.0 :( Fortunately, it can easily be fixed by replacing the label reference with:
{{ $values.B0.Labels.message }}
Working on this query for an alert:
count_over_time({job="ad_basic"} |= `4688` | json host="hostname", severity="severity", body="body", uuid="uuid", timestamp="ISO_8601_ts" [1s])
The query returns multiple results, and I'm trying to pull label/value pairs from only the last of the evaluated results.
Example annotation (this format works for all the JSON expressions I've defined above, but it sends multiple values with each label):
{{ range $v := $values }}
{{ $v.Labels.host }}
{{ end }}
Is it possible to index results in the annotation template?
In case someone needs help with message previews using Promtail + Loki + Alertmanager (all deployed using Helm charts): I'm parsing log4j log output. Using | pattern `<message>` I extract the field message from the log line; this is the actual log line, which is then used in the sum by (message) aggregation, which adds the label message. I add a message annotation and assign the (sliced) label value to it. This annotation is later sent to Alertmanager and used for the Slack message.

Note that the label message is passed to Alertmanager by default. It can be (and usually is) quite a long line, and parsing this label can be cumbersome.
groups:
  - name: prod_error_logs
    rules:
      - alert: ProdErrorLogs
        expr: |
          sum(rate({namespace="prod"} |= `ERROR` | pattern `<message>` [5m])) by (message) > 0
        for: 0m
        labels:
          severity: critical
          namespace: prod
          category: logs
        annotations:
          summary: "There are Error logs on PROD"
          sourceLink: "<source_link>"
          message: {{`"{{ slice $labels.message 0 500 }}"`}}
slack_configs:
  - channel: "<CHANNEL_ID>"
    send_resolved: true
    title: |-
      [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]
      PROJECT_NAME: {{ range .Alerts }}*{{ .Annotations.summary }}*
      {{ end }}
    text: |-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      {{- if eq .Labels.category "logs" }}
      *Source:* <{{ .Annotations.sourceLink }}|View in Grafana>
      *Log:*
      {{ .Annotations.message }}
      ```
      {{- else }}
      *Description:* {{ .Annotations.description }}
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{- end }}
      {{ end }}
      {{ end }}
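For context, that slack_configs block sits under a receiver in the Alertmanager configuration. A minimal sketch, with the receiver name and webhook URL as placeholders:

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: "<CHANNEL_ID>"
        api_url: 'https://hooks.slack.com/services/...'   # Slack incoming-webhook URL (placeholder)
        send_resolved: true
        # title/text templates as shown above

route:
  receiver: 'slack-notifications'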
For Grafana 11, at least in the cloud version, there have been more changes. I recommend avoiding the Classic condition. If you do that, the default notification template spits out all labels, and if you include | pattern `<message>` (with backticks around <message>) at the end of your query, you will see the log line within the label message.
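For example, a query along these lines; the stream selector, filter and range are placeholders:

count_over_time({job="example"} |= `ERROR` | pattern `<message>` [5m])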
Can you tell me the minimum version that supports using labels in alert rules?
Is your feature request related to a problem? Please describe.
We need to retrieve the contents of the logs that match the alert rules. Currently, alert rules take in metric LogQL queries and report either true or false. If the query is true, it notifies Alertmanager and all is good.
However, if we wanted to send the contents of the logs that match the query, it would almost need to be a separate field to choose the exact logs that would be sent as part of the notification message.
Describe the solution you'd like
Solution 1: Infer the log query from the metric query
This is better because the user won't have to change any existing alert rules and can simply update the alert message render template. A new
{{ range $log := .Logs }}
would magically be available.

Solution 2: Allow the user to add their own custom log query in addition to the metric query
If it's hard to derive the log message from a metric query, the Loki ruler could take two queries:
{{ range $log := .LogMessages }}

Describe alternatives you've considered
The only alternative would be to build my own ruler, or to use some other solution like Graylog that already supports this feature.
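A rough illustration of how Solution 1's proposed field could be consumed in a notification template (purely hypothetical; neither .Logs nor this rendering exists in Loki today):

{{ define "alert.with.logs" }}
{{ .Annotations.title }}
{{/* hypothetical field: the log lines the ruler would attach to the firing alert */}}
{{ range $log := .Logs }}
- {{ $log }}
{{ end }}
{{ end }}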