@unfeeling91 Could you provide more details? Not a single group was generated from these alerts?
Hi, the alerts are in multiple groups, deployed via the alertmanager section of kube-prometheus-stack in k8s:
additionalPrometheusRules:
  - name: rules
    groups:
      - name: meta
        rules:
          - alert: heartbeat
            expr: vector(1)
            labels:
              severity: none
            annotations:
              description: This is heartbeat alert
              summary: Alerting Amixr
      - name: kubernetes.rules
        rules:
          - alert: KubePodCrashLooping
            expr: |
              max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", job="kube-state-metrics"}[5m]) >= 1
            annotations:
              description: 'Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is in waiting state (reason: "CrashLoopBackOff").'
              runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping
              summary: Pod is crash looping.
            for: 15m
            labels:
              severity: warning
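The Alertmanager → OnCall wiring lives in the same chart's alertmanager.config section; roughly like this sketch (host and token are placeholders, the receiver name just matches what shows up in the Alertmanager logs):

alertmanager:
  config:
    route:
      receiver: grafana_oncall
    receivers:
      - name: grafana_oncall
        webhook_configs:
          - url: http://<oncall-host>/integrations/v1/alertmanager/<secret_integration_token>/
            send_resolved: true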
@unfeeling91 Thanks. What's going on on the OnCall side? Not a single alert group was created in OnCall? (I'm asking to make sure that alerts are not being grouped on the OnCall side.) Also, if you are using cloud, please send a link to your integration.
I am not using the cloud version. On the OnCall side I don't see anything strange in the logs:
2022-11-24 05:59:54 source=engine:app google_trace_id=none logger=root inbound latency=0.226657 status=200 method=POST path=/integrations/v1/alertmanager
but nothing shows up in alert groups in the UI.
Resolved - that entry was the test alert I sent via the UI by pressing the "Send test alert" button.
@unfeeling91 do you have your integration token in the path? I mean, in the logs it should look like path=/integrations/v1/alertmanager/<secret_integration_token>, right? If so, could you please send the celery container logs.
@Konstantinov-Innokentii correct, the token is in place. In celery I see the following, no errors at all:
2022-11-24 07:30:50,364 source=engine:celery task_id=a86c0f90-e88d-43b7-b656-79c6126440fa task_name=apps.schedules.tasks.refresh_ical_files.start_refresh_ical_files name=celery.app.trace level=INFO Task apps.schedules.tasks.refresh_ical_files.start_refresh_ical_files[a86c0f90-e88d-43b7-b656-79c6126440fa] succeeded in 0.01594165700225858s: None
2022-11-24 07:30:50,364 source=engine:celery task_id=??? task_name=??? name=celery.worker.strategy level=INFO Task apps.schedules.tasks.refresh_ical_files.refresh_ical_file[2f07c05c-058d-4e25-8b2d-c130714c00f8] received
2022-11-24 07:30:50,365 source=engine:celery task_id=??? task_name=??? name=celery.worker.strategy level=INFO Task apps.slack.tasks.start_update_slack_user_group_for_schedules[396bbc9f-302d-4311-8c9b-701d3e52c8a3] received
2022-11-24 07:30:50,377 source=engine:celery task_id=1f123c78-63ab-4581-8a71-64fbf5af9504 task_name=apps.heartbeat.tasks.restore_heartbeat_tasks name=celery.app.trace level=INFO Task apps.heartbeat.tasks.restore_heartbeat_tasks[1f123c78-63ab-4581-8a71-64fbf5af9504] succeeded in 0.009998007000831421s: None
2022-11-24 07:30:50,379 source=engine:celery task_id=2f07c05c-058d-4e25-8b2d-c130714c00f8 task_name=apps.schedules.tasks.refresh_ical_files.refresh_ical_file name=apps.schedules.tasks.refresh_ical_files level=INFO Refresh ical files for schedule 2
2022-11-24 07:30:50,434 source=engine:celery task_id=2f07c05c-058d-4e25-8b2d-c130714c00f8 task_name=apps.schedules.tasks.refresh_ical_files.refresh_ical_file name=apps.schedules.tasks.refresh_ical_files level=INFO run_task_primary 2 False icals not equal
2022-11-24 07:30:50,435 source=engine:celery task_id=2f07c05c-058d-4e25-8b2d-c130714c00f8 task_name=apps.schedules.tasks.refresh_ical_files.refresh_ical_file name=celery.app.trace level=INFO Task apps.schedules.tasks.refresh_ical_files.refresh_ical_file[2f07c05c-058d-4e25-8b2d-c130714c00f8] succeeded in 0.05589042399878963s: None
2022-11-24 07:30:54,550 source=engine:celery task_id=??? task_name=??? name=celery.worker.strategy level=INFO Task apps.heartbeat.tasks.process_heartbeat_task[1b877165-b60d-4919-a38d-e35fdb87525e] received
2022-11-24 07:30:54,563 source=engine:celery task_id=1b877165-b60d-4919-a38d-e35fdb87525e task_name=apps.heartbeat.tasks.process_heartbeat_task name=apps.heartbeat.tasks level=INFO IntegrationHeartBeat selected for alert_receive_channel 8 in 0.01161727399812662
2022-11-24 07:30:54,564 source=engine:celery task_id=1b877165-b60d-4919-a38d-e35fdb87525e task_name=apps.heartbeat.tasks.process_heartbeat_task name=apps.heartbeat.tasks level=INFO heartbeat_checkup task started for alert_receive_channel 8 in 0.013139029997546459
2022-11-24 07:30:54,564 source=engine:celery task_id=1b877165-b60d-4919-a38d-e35fdb87525e task_name=apps.heartbeat.tasks.process_heartbeat_task name=apps.heartbeat.tasks level=INFO state checked for alert_receive_channel 8 in 0.013271511998027563
2022-11-24 07:30:54,566 source=engine:celery task_id=??? task_name=??? name=celery.worker.strategy level=INFO Task apps.heartbeat.tasks.integration_heartbeat_checkup[83ffd34b-4c2b-4aa3-bcff-68ccefde757e] received
2022-11-24 07:30:54,569 source=engine:celery task_id=1b877165-b60d-4919-a38d-e35fdb87525e task_name=apps.heartbeat.tasks.process_heartbeat_task name=celery.app.trace level=INFO Task apps.heartbeat.tasks.process_heartbeat_task[1b877165-b60d-4919-a38d-e35fdb87525e] succeeded in 0.01825634499982698s: None
2022-11-24 07:30:57,236 source=engine:celery task_id=6163de36-e9e0-4eea-b170-cd425b453f7e task_name=apps.heartbeat.tasks.integration_heartbeat_checkup name=apps.heartbeat.models level=INFO Heartbeat 7 is not actual 6163de36-e9e0-4eea-b170-cd425b453f7e
2022-11-24 07:30:57,239 source=engine:celery task_id=6163de36-e9e0-4eea-b170-cd425b453f7e task_name=apps.heartbeat.tasks.integration_heartbeat_checkup name=celery.app.trace level=INFO Task apps.heartbeat.tasks.integration_heartbeat_checkup[6163de36-e9e0-4eea-b170-cd425b453f7e] succeeded in 0.014365891001943965s: None
@unfeeling91 do you see create_alertmanager_alerts tasks in the logs when you are receiving alerts?
No create_alertmanager_alerts tasks:
kubectl logs oncall-celery-6d678c8bf7-jhn6w | grep create_alertmanager_alerts
produces no output.
@unfeeling91 here is what you can do to further debug the problem:
@unfeeling91 And could you please share an example of the payload which AM sends to OnCall?
How can I see this payload in Alertmanager? For now, I only see entries like:
ts=2022-11-22T06:12:11.264Z caller=notify.go:743 level=debug component=dispatcher receiver=grafana_oncall integration=webhook[0] msg="Notify success" attempts=1
@unfeeling91 You can use https://webhook.site/: send the alert to this site and check the payload there.
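For example, a rough sketch (the webhook.site URL below is a placeholder you get from the site): temporarily point the receiver at it to capture the exact JSON Alertmanager posts:

receivers:
  - name: grafana_oncall
    webhook_configs:
      - url: https://webhook.site/<your-unique-id>
        send_resolved: true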
@Konstantinov-Innokentii curl to the webhook works; trying to debug the Alertmanager payload right now.
The issue is fixed. The problem was with the ingress object: I exposed OnCall via a separate ingress and it started working like a charm. Really appreciate @Konstantinov-Innokentii for your effort and support. The issue can be closed, thanks!
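For reference, the dedicated ingress is roughly the following sketch (service name, port, host and ingress class are assumptions - adjust to your deployment):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: oncall-engine
spec:
  ingressClassName: nginx
  rules:
    - host: oncall.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: oncall-engine
                port:
                  number: 8080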
@unfeeling91 Thanks! Eager to hear more feedback/issues from you!
There is a problem:
I configured the Alertmanager integration, and in the logs I see:
ts=2022-11-22T06:12:07.052Z caller=notify.go:743 level=debug component=dispatcher receiver=grafana_oncall integration=webhook[0] msg="Notify success" attempts=1
ts=2022-11-22T06:12:11.264Z caller=notify.go:743 level=debug component=dispatcher receiver=grafana_oncall integration=webhook[0] msg="Notify success" attempts=1
ts=2022-11-22T06:12:11.264Z caller=notify.go:743 level=debug component=dispatcher receiver=grafana_oncall integration=webhook[0] msg="Notify success" attempts=1
ts=2022-11-22T06:12:11.330Z caller=notify.go:743 level=debug component=dispatcher receiver=grafana_oncall integration=webhook[0] msg="Notify success" attempts=1
ts=2022-11-22T06:12:23.351Z caller=notify.go:743 level=debug component=dispatcher receiver=grafana_oncall integration=webhook[0] msg="Notify success" attempts=1
Some alerts are in firing state, but they are not showing up as alert groups on the OnCall plugin page. What could be the reason?