louislam / uptime-kuma

A fancy self-hosted monitoring tool
https://uptime.kuma.pet
MIT License

ping reliable service to prevent false DOWN on internet connection loss #774

Open jco-c opened 2 years ago

jco-c commented 2 years ago

Is it a duplicated question? Please search in Issues without filters: https://github.com/louislam/uptime-kuma/issues?q=

Haven't found any duplicate. (I'm not certain of the terminology for this problem, though, so please correct me if I'm wrong.)

Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

My internet connection sometimes goes down. In that case, Uptime Kuma reports all of my services as down, even though only my internet connection is down. This leads to a lot of messages and incorrect uptime stats in the dashboard.

Describe the solution you'd like A clear and concise description of what you want to happen.

I'd love some kind of 'health check' (not sure if that's the right term) in the settings, in which I ping a reliable service (e.g. Google or Cloudflare DNS). If that reliable service is down, Uptime Kuma would assume that my internet connection broke instead of all my services. It could then send only one message instead of tens/hundreds. It also could omit the data of the services from the logs or mark them differently, to prevent incorrect uptime numbers.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Alternatively, or additionally, you could make an exception if more than x services go down at the same time, or if all of the services go down at once. This could create problems, though, when shutting down a server with multiple services, or when monitoring only a few services (it would potentially lead to false positives).

srgvg commented 2 years ago

Basically a dependency between services. If you were to ping a web server, and also do a TCP check on port 80 and an HTTP check on the same port, the latter should not fail explicitly if the ping already fails (assuming ICMP is not blocked, of course; perhaps a bad example, but TCP/80 and HTTP by themselves are a better one).

deefdragon commented 2 years ago

This sounds similar to #216, but in this case, instead of a "core load balancer", you watch some public address.

louislam commented 2 years ago

> This sounds similar to #216, but in this case, instead of a "core load balancer", you watch some public address.

Not quite.

@jco-c

I had thought about this before, using https://www.npmjs.com/package/is-online.

However, it may be a breaking change, since Uptime Kuma would no longer work in an internal environment.

jco-c commented 2 years ago

@louislam yeah, that looks like exactly what I meant. Some of the services is-online checks might log IP addresses, though? I know ipify doesn't, but icanhazip.com only commits to not sharing logs, and some people might have privacy concerns?

SpcCw commented 2 years ago

@louislam I think some kind of user-defined dependency approach is better. It would still work in an internal environment and be more flexible.

Something like this: each monitor can be set to depend on another monitor. If that other monitor is down (or in warning), the current monitor stops its checks and sets a warning status (or maybe some other non-green status) until the dependency comes back up.

Then people would make dependencies however they want - for external connectivity (by polling something like 1.1.1.1), core services like gateways/balancers (when many services depend on one), service chains (ping host -> poll backend -> poll frontend), etc...

It is nice that "is-online" determines the status by polling more than one source, though. So the approach described above would go great with #324 :)
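As a rough illustration of the dependency idea above, here is a minimal shell sketch (not Uptime Kuma code). The DEPENDENCIES and STATUSES tables, the monitor names, and the lookup and blocking_dependency helpers are all hypothetical, and the sketch assumes the dependency chain contains no cycles. It walks up a monitor's chain and prints the first dependency that is DOWN; in that case the monitor's own check would be skipped and a warning status set instead.

```shell
# Hypothetical dependency table, one "monitor parent" pair per line.
DEPENDENCIES="frontend backend
backend host-ping
host-ping internet"

# Hypothetical latest statuses, one "monitor status" pair per line.
STATUSES="internet DOWN
host-ping UP
backend UP"

# lookup TABLE KEY: print the second field of the first line whose first field is KEY.
lookup() {
    printf '%s\n' "$1" | awk -v key="$2" '$1 == key { print $2; exit }'
}

# blocking_dependency MONITOR: print the nearest ancestor that is DOWN, or nothing.
# The body runs in a subshell, so its variables stay local and "exit" only
# leaves the function. Assumes the dependency chain has no cycles.
blocking_dependency() (
    parent=$(lookup "$DEPENDENCIES" "$1")
    while [ -n "$parent" ]; do
        if [ "$(lookup "$STATUSES" "$parent")" = "DOWN" ]; then
            echo "$parent"
            exit 0
        fi
        parent=$(lookup "$DEPENDENCIES" "$parent")
    done
)
```

With the sample tables, `blocking_dependency frontend` prints `internet`, so the frontend check would be paused rather than recorded as DOWN.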

wassereimer86 commented 2 years ago

Would love to see a feature like this. My stats are garbage if I get downs for my services just because the internet connection was lost. 🙂

mpql commented 2 years ago

> I had thought about this before, using https://www.npmjs.com/package/is-online.

I think something like that would be ideal; I would rather not have it ping all my alerts when the monitor's connection goes down. ETA: Pinging another service feels more like a work-around by itself, though personally, if I can show we're online and show that at least one of two other reliable services on different networks was online, that would validate the results of my own checks.

> However, it may be a breaking change, since Uptime Kuma would no longer work in an internal environment.

Unless I'm misunderstanding, would it be possible to make it optional, off by default to avoid this being a breaking change?

olinorwell commented 1 year ago

I came here to post the same request. I believe a simple solution might be to have an option to ping a user-definable host before classifying anything as down. Here is how I would build this into the current system.

Each monitor has a boolean value to determine if it should check the global 'connectivity check' before recording a 'DOWN' value. This would be false by default and that value means the system works as it does currently.

If this boolean value is set to true for a particular monitor - then when a 'DOWN' is encountered, it should also look at the global 'connectivity host' value in the main settings, if this is set to anything other than empty, then the system pings that host (e.g. google.com would be an easy one). If that host is reachable then the system has connectivity and the original monitor does indeed get a 'DOWN' value.

If, however, 'google.com' or whatever host is entered as the global 'connectivity host' is not reachable, then the original monitor gets a 'NO DATA' status, which would count as an 'UP' in the stats but be shown in grey.

This solution requires an extra boolean / checkbox for each monitor, and one global host value in the main settings.

It requires at most one extra ping whenever a DOWN value is encountered. As DOWNs should be rare, this is not going to add much extra network traffic.
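The decision flow described above could be sketched roughly like this in shell. This is only an illustration, not Uptime Kuma code: the classify_result and probe_host names, the NO_DATA label, and the argument order are assumptions made up for the example. The probe command is passed in so the logic can be tried without touching the network.

```shell
# Hypothetical probe: succeeds if the host answers a single ping.
probe_host() {
    ping -c 1 "$1" > /dev/null 2>&1
}

# classify_result RAW_STATUS USE_CHECK CONNECTIVITY_HOST [PROBE_CMD]
# Prints the status that would be recorded: the raw status, or NO_DATA when a
# DOWN result cannot be trusted because the connectivity host is unreachable.
# The body runs in a subshell so its variables stay local.
classify_result() (
    raw_status=$1
    use_check=$2
    host=$3
    probe=${4:-probe_host}

    # Only a DOWN on an opted-in monitor with a configured host is re-examined.
    if [ "$raw_status" != "DOWN" ] || [ "$use_check" != "true" ] || [ -z "$host" ]; then
        echo "$raw_status"
    elif "$probe" "$host"; then
        echo "DOWN"      # we have connectivity, so the monitor really is down
    else
        echo "NO_DATA"   # no connectivity, so the result is inconclusive
    fi
)
```

For example, `classify_result DOWN true google.com` pings google.com at most once, and only because the raw result was DOWN, which matches the "at most one extra ping" point.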

Currently Uptime Kuma is perfect but for this one failing. I have monitors that are extremely unlikely to ever be genuinely DOWN, but when they are it's a big deal. I need to ensure that the occasional loss of connectivity for the Uptime Kuma host doesn't lead to 'DOWN' values being recorded (e.g. when I reset the router for the network it's on).

EthraZa commented 1 year ago

Came to post the same thing too. When I was younger I created PHPmon to do the same thing, and I remember implementing a check for internet connectivity before checking hosts. It's simply necessary; after all, no connectivity, no nothing.

CommanderStorm commented 9 months ago

@jco-c We are consolidating duplicate issues a bit to make issue management easier. I think we should track this issue in #1089, as this request can be added via the path laid out in said issue ⇒ I am going to close this as a duplicate.

mpql commented 9 months ago

> @jco-c We are consolidating duplicate issues a bit to make issue management easier. I think we should track this issue in #1089, as this request can be added via the path laid out in said issue ⇒ I am going to close this as a duplicate.

@CommanderStorm Issue grooming is a huge pain, and I feel for you, but I don't think this is a duplicate issue. There should be a reliable method for ensuring connectivity before declaring a service down. The process outlined in that thread would be valuable for grouping, as outlined, and may function as a work-around for connectivity checking, but robust connectivity checking is a more desirable -- and decidedly separate -- solution, e.g.:

Can the user contact any of:

This would tell a user if they have connectivity, and if not, where the breakdown is. The work-around as outlined in that thread would:

a) not prioritize connectivity-checking by default, meaning the application would be inaccurate by default

b) specify a single point of failure for connectivity-checking -- if something important goes down (e.g. some widely-used CDN), but the user's services don't, they'll still get downtime notices, which would thus be inaccurate.

Recommend re-opening this issue and treating it as a separate feature.

CommanderStorm commented 9 months ago

@chakflying I need a second opinion. I think this can be implemented via #1089 by adding a monitor which is deemed "reliable". => No need to add a second mechanism achieving the same goal.

mpql commented 9 months ago

> I think this can be implemented via #1089 by adding a monitor which is deemed "reliable".

Even reliable monitors go down -- there have been both Cloudflare and Fastly outages in recent memory, the latter of which caused something like 40% of the internet to go down -- and my sites and services were still online.

If the #1089 proposal can be tweaked to allow a "parent service" of sorts with multiple monitors that itself goes on to then have multiple child monitors, then I think that'd be workable, e.g. for monitoring sites www1.example.com and www2.example.com:

...where if ANY service for the parent is up, we are considered to have connectivity and do not check the other parent services, and then check child services as normal.

If all services in the parent group are down, we do NOT have connectivity, and should not assume we have data about downtime for any child services, and should instead list the monitoring service as being down. We could show the monitor as its own service (e.g. status.example.com), showing downtime for that, and show blank / empty for existing child services.
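The "any parent up means we have connectivity" rule described above can be sketched in a few lines of shell. This is a hypothetical illustration, not part of Uptime Kuma or of #1089: the have_connectivity and probe_host names are made up, and the probe command is passed in as the first argument so the short-circuit logic can be exercised without real network access.

```shell
# Hypothetical probe: succeeds if the host answers a single ping.
probe_host() {
    ping -c 1 "$1" > /dev/null 2>&1
}

# have_connectivity PROBE_CMD HOST...
# Succeeds as soon as ANY parent host is reachable and does not probe the rest.
# Fails only if every parent host is unreachable (or none was given).
# The body runs in a subshell, so "exit" only leaves the function.
have_connectivity() (
    probe=$1
    shift
    for host in "$@"; do
        if "$probe" "$host"; then
            exit 0   # one reachable parent is enough
        fi
    done
    exit 1           # all parents down: child results would be inconclusive
)
```

A caller might run `have_connectivity probe_host 1.1.1.1 8.8.8.8` before the child checks and, on failure, record the monitoring host itself as down while leaving the child services blank.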

This allows for actual connectivity checking as a first-class citizen, with both self-monitoring and arbitrary service monitoring. As currently proposed, #1089 does not meet connectivity-checking needs, and indeed the most recent comment seems to declare that connectivity checking is NOT what is desired in that ticket, but something more along the lines of "if hostname.example.com is down, assume everything matching *.example.com is also down" -- which is insufficient for connectivity verification.

chakflying commented 9 months ago

I think considering this as a separate feature would be more convenient for users. But of course it would be pretty complicated since there are many different use cases.

Rod-Gomes commented 7 months ago

I've come to lend my support to this idea; it's an extremely necessary feature that prompted me to temporarily stop using Uptime Kuma. I use it in my local network, which experiences frequent fluctuations, and this completely disrupts the uptime report.

Karewan commented 7 months ago

I also stopped using Uptime Kuma for this reason. I used it on a cloud server whose uptime is not 100% (as with many cloud providers), so every x days all 50 of my monitors are reported as down and 100 notifications are sent.

Rod-Gomes commented 7 months ago

Folks, I've created the simple Bash script below that has addressed this issue for me. Whenever my internet goes offline, I shut down the Docker container running Uptime Kuma. This way, my statistics remain organized when the internet at home experiences disruptions.

#!/bin/bash

# Get the ID of the first container (running or stopped) whose name contains 'uptimekuma'
container_id=$(docker ps -aq --filter "name=uptimekuma" | head -n 1)

# Stop early if no matching container exists, so docker start/stop never runs without an ID
if [ -z "$container_id" ]; then
    echo "No container matching 'uptimekuma' found." >&2
    exit 1
fi

# "true" if the container is currently running
running=$(docker inspect -f '{{.State.Running}}' "$container_id" 2>/dev/null)

# Check if the internet is active by sending 3 ping packets to 8.8.8.8
if ping -c 3 8.8.8.8 > /dev/null 2>&1; then
    echo "Internet is active."

    if [ "$running" = "true" ]; then
        echo "Container is active, doing nothing."
    else
        echo "Container uptimekuma is not active, starting..."
        docker start "$container_id"
    fi
else
    echo "Internet is offline."

    if [ "$running" = "true" ]; then
        echo "Container is active, shutting down..."
        docker stop "$container_id"
    else
        echo "Container uptimekuma is not active, doing nothing."
    fi
fi

To use, create a file named check_internet.sh in your preferred location, copy and paste the contents of the Bash script into this file. Grant execution permissions to the file using chmod +x check_internet.sh.

Create a new crontab by using crontab -e and insert the following cron job (modify the directory where your script is located):

* * * * * sleep 20 && /bin/bash /<EDIT HERE>/check_internet.sh

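This way, the script is executed once per minute, 20 seconds after the start of each minute. Cron cannot schedule more often than once a minute, so add further entries with different sleep offsets (for example sleep 40) if you need a shorter interval. It worked well in my tests, but you may need to adjust it according to your specific requirements.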

apio-sys commented 6 months ago

> Came to post the same thing too. When I was younger I created PHPmon to do the same thing, and I remember implementing a check for internet connectivity before checking hosts. It's simply necessary; after all, no connectivity, no nothing.

I remember PHPMon very well (if we are talking about the same one as here: https://github.com/phpservermon/phpservermon). It worked pretty nicely at the time; I had it in production for a couple of sites/networks for many years and only recently discontinued the last one to switch to Uptime Kuma, which is far better. And sorry to say, but PHPmon didn't handle this very well (not to say at all). When I had Pushover for notifications, it used to go ballistic over its own issues. Which brings me to why I wanted to reply here: for the type of notification you all seem to be looking for, the one that says "I'm down, help me please!", how do you reckon it gets sent if the Uptime Kuma machine is somehow disconnected from the Internet?

apio-sys commented 6 months ago

> Folks, I've created the simple Bash script below that has addressed this issue for me. Whenever my internet goes offline, I shut down the Docker container running Uptime Kuma. This way, my statistics remain organized when the internet at home experiences disruptions.

I like this answer from @Rod-Gomes, since it addresses what isn't really an application issue in the first place with a system-oriented workaround. Of course you should solve things like this in that manner rather than asking apps to do things they are not supposed to do.

If I look back to the initial question "(...)It could then send only one message instead of tens/hundreds. It also could omit the data of the services from the logs or mark them differently, to prevent incorrect uptime numbers.(...)", my first thought is "how can it send a message, since it is disconnected itself?" The second point is preventing incorrect uptime stats, but that is addressed not only by the software itself but mainly by the environment it is running in.

mpql commented 6 months ago

It has incorrect uptime stats, and reports it once it comes back online, as well as the "recovery" even if the site(s) in question never actually went down.

In short, the monitor going down should show differently, and try to validate results -- the test isn't a failure, it's inconclusive. Connectivity checking is a pretty standard idea here.

UltimateByte commented 5 months ago

Glad to see others already had the idea for this feature. Here are my ideas on how to implement this easily.

Main principle

Add master checks with conditions, and do not run any other check if the conditions fail.

Uptime Kuma's core check functions can be tweaked so that, if a master check is enabled, its conditions have to be met before any other check is processed.

That solves the problem of this issue.

Details of master check

The feature allows selecting existing monitors (single or grouped), combined with OR and AND conditions and a desired status ("Up" or "Down").

For example:

Where to add the new option

Section can be added to "Settings" menu.

What the menu looks like

Master Monitors

Description: Configure monitor rules that must be true in order for any other check to run. Prevents false positive alert if your monitoring system has connectivity issues.
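To make the rule idea a bit more concrete, here is a small hypothetical shell sketch of how such rules might be evaluated. None of this is existing Uptime Kuma behaviour: the "monitor=status" rule format, the MONITOR_STATUS table, the monitor names, and the current_status and rules_satisfied helpers are all invented for the example.

```shell
# Hypothetical latest statuses of existing monitors, one "monitor status" pair per line.
MONITOR_STATUS="cloudflare-dns Up
google-dns Down"

# current_status MONITOR: print the latest status of a monitor from the table above.
current_status() {
    printf '%s\n' "$MONITOR_STATUS" | awk -v key="$1" '$1 == key { print $2; exit }'
}

# rules_satisfied MODE RULE...
# MODE is AND or OR; each RULE is "monitor=desired_status".
# Succeeds when the combined rules hold, meaning the other checks may run.
# The body runs in a subshell, so "exit" only leaves the function.
rules_satisfied() (
    mode=$1
    shift
    for rule in "$@"; do
        monitor=${rule%%=*}
        wanted=${rule#*=}
        if [ "$(current_status "$monitor")" = "$wanted" ]; then
            if [ "$mode" = "OR" ]; then exit 0; fi    # one match is enough for OR
        else
            if [ "$mode" = "AND" ]; then exit 1; fi   # one mismatch fails AND
        fi
    done
    # Loop finished: AND saw no mismatch (success), OR saw no match (failure).
    [ "$mode" = "AND" ]
)
```

With the sample table, `rules_satisfied OR cloudflare-dns=Up google-dns=Up` succeeds, so the regular checks would run, while the same rules combined with AND fail and the regular checks would be skipped.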

Pros of such implementation

This wouldn't require changing existing monitors, hence wouldn't cause any breaking change, and would be off if unset, so users would have to manually configure it and turn it on.

Cons of such implementation

I don't see any, let me know.

mjkent commented 3 months ago

Really would love to see this implemented! When my internet goes down I get so many alerts.

GitBaer commented 1 month ago

Is there any update on this? Uptime Kuma can no longer be used by us in its current state. As already described by other users: if there is an outage of our host's internet connection, notifications for all monitors are sent, and the statistics are distorted.

UltimateByte commented 1 month ago

@GitBaer For now I've abandoned the idea of monitoring from the office. I'm using a little VPS. The interface is far slower to load, but it still gets the job done. That is probably the way to go if you have many internet outages. The good part is that I'm now notified on time when the office loses internet, not afterwards :)