Closed r3nor closed 1 year ago
Try using the Prometheus function "absent()", for example:
ALERT nginx_absent
  IF absent(container_cpu_usage_seconds_total{com_docker_compose_service="nginx"})
  FOR 5s
  LABELS {
    severity="critical"
  }
  ANNOTATIONS {
    SUMMARY = "Instance {{$labels.instance}} down",
    DESCRIPTION = "Instance {{$labels.instance}}, Service/Job ={{$labels.job}} is down for more than 5 sec."
  }
Maybe this will work for you.
Or another option:
- alert: ContainerKilled
  expr: time() - container_last_seen > 60
  for: 0m
  labels:
    severity: critical
  annotations:
    SUMMARY: "Container killed (instance {{ $labels.instance }})"
    DESCRIPTION: "A container has disappeared\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
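To cover every compose-managed container on every server with one rule, the same expression can probably be narrowed with a label matcher instead of naming a single service. This is only a rough sketch in the YAML rule-file format used by Prometheus 2.x. It assumes the com_docker_compose_service label from the first example is also present on container_last_seen, and that the cAdvisor "name" label carries the container name. The group name "containers" and the alert name "ComposeContainerGone" are made up for illustration.

```yaml
groups:
  - name: containers                 # hypothetical group name
    rules:
      - alert: ComposeContainerGone  # hypothetical alert name
        # No instance or service is named here, so every matching series is
        # evaluated and each missing container raises its own alert.
        # Assumes the compose label is exported under this exact name.
        expr: time() - container_last_seen{com_docker_compose_service!=""} > 60
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} is gone on {{ $labels.instance }}"
          description: "Service {{ $labels.com_docker_compose_service }} was last seen {{ $value }}s ago."
```

Since each firing alert keeps the labels of the series that triggered it, the notification should show which machine (instance) and which container is affected, without a separate rule per server.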
Some examples have been provided, so I'm going to close this issue. If you're still having issues, please explain what you have tried and we may be able to assist.
I have several servers running similar stacks (e.g. nginx, wordpress...). On every machine, the stacks have the same names. I am trying to set up an alert that will fire if any container on any server is down at any moment:
Imagine machines A and B, both running a compose stack with nginx and wordpress. If nginx on machine A has a problem, I want to be notified. I don't want to create an alert for each machine and each container, as I have many more than two machines. Ideally, I could also extract the last() data so I know which instance is down.
Is there any way to achieve this?