Open Dean-Christian-Armada opened 6 years ago
Node exporter and cadvisor are running on each Swarm node, so you can configure an alert for up{job="node-exporter"}
I don't think that is effective enough, because the value 0 for that node-exporter will not be present for long. Also, it shows only the instance IP and not the node_name. I tried grouping it with node_name, but then nothing shows up at all. Please see the screenshots below.
[Screenshot: up with a down node-exporter]
[Screenshot: up grouped with node_meta]
You can use IF absent(node_meta) FOR 5m
Hi @stefanprodan, what is the expected value of the absent(node_meta) query? My case is when even a single node goes down. Specifically, my "swarm-node-2" went down.
The screenshot below shows what was returned when I intentionally took down my swarm-node-2.
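A note on why absent(node_meta) may return nothing in this situation: in PromQL, absent() produces a result only when its argument matches no series at all, so it stays empty as long as any other node still reports node_meta. Below is a minimal sketch of a per-node variant, written in the same Prometheus 1.x rule syntax used elsewhere in this thread. It assumes node_meta carries a node_name label (as the grouping attempt above suggests), and the alert name is hypothetical.

```
# Sketch only. Assumes node_meta has a node_name label; the alert name is made up.
ALERT swarm_node_2_down
  IF absent(node_meta{node_name="swarm-node-2"})
  FOR 5m
  ANNOTATIONS {
    summary = "swarm-node-2 is down",
    description = "node_meta for swarm-node-2 has been missing for 5 minutes",
  }
```

The trade-off with this approach is that it needs one rule per node name, since absent() cannot tell which of several expected series is the missing one.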
@Dean-Christian-Armada, I am facing the same problem. I want to create a rule that fires whenever a node is down. I should also get an alert if a container is down.
@abhisheks-cuelogic, by "container down", do you mean that if, say, a Python container goes down, it should alert? I don't think that is possible for containers. Prometheus needs node-exporter or another scraping tool to collect metrics, unless there is an agent that can be installed inside the container to detect that it went down.
The container itself does not need to send the alert. Can we use something like:
ALERT piwik_nginx
  IF count(time() - container_last_seen{name=~"^piwik_nginx.*"} < 60)
  ANNOTATIONS {
    summary = "piwik_nginx container is down",
    description = "piwik_nginx is down for more than 1 minute",
  }
I tried this rule, but the alert is always active, even when the container is up.
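One likely explanation for the always-active alert: the < 60 comparison keeps exactly those containers that were seen within the last 60 seconds, so count(...) returns a value (and the alert fires) while the container is up, and returns nothing once it is gone. A minimal sketch of the inverted condition follows, assuming the goal is to fire when no matching container has been seen recently. The FOR duration is an illustrative choice, not something taken from this thread.

```
# Sketch only: fires when no piwik_nginx container was seen in the last 60 seconds.
ALERT piwik_nginx
  IF absent(time() - container_last_seen{name=~"^piwik_nginx.*"} < 60)
  FOR 1m
  ANNOTATIONS {
    summary = "piwik_nginx container is down",
    description = "piwik_nginx is down for more than 1 minute",
  }
```

Wrapping the expression in absent() turns "no recently seen container" into a series with value 1, which is what an alert condition needs in order to fire.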
@stefanprodan, we need your advice.
Have you ever tried creating a rule that fires an alert when a node goes down?