sourcegraph / checkup

Distributed, lock-free, self-hosted health checks and status pages
https://sourcegraph.github.io/checkup
MIT License
3.42k stars 248 forks

Allow optionally specifying a range of acceptable response codes #142

Open therealkevinard opened 4 years ago

therealkevinard commented 4 years ago

It's quite common for a 301 or 302 response code to be considered "up". Similarly, someone asserting that a resource is permanently away would "pass" for anything above 400. And a check that a resource is closed to the public should look for 401-403.

To allow these cases, we should be able to specify a range of acceptable status codes in our config.

Something along the lines of:

{
    "type": "http",
    "endpoint_name": "Site Dot Com",
    "endpoint_url": "https://site.com",
    "up_status_range": {
        "min": 200,
        "max": 399
    }
}
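
The proposed config could be evaluated roughly as follows. This is only a sketch under stated assumptions: `StatusRange`, `checkConfig`, and `parseCheck` are hypothetical names invented for illustration, `up_status_range` is the field suggested above rather than an existing checkup option, and the range is treated as inclusive at both ends.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StatusRange is a hypothetical type mirroring the proposed
// "up_status_range" object; it is not part of checkup.
type StatusRange struct {
	Min int `json:"min"`
	Max int `json:"max"`
}

// Contains reports whether code lies within the inclusive range [Min, Max].
func (r StatusRange) Contains(code int) bool {
	return code >= r.Min && code <= r.Max
}

// checkConfig is a hypothetical subset of an HTTP check definition.
type checkConfig struct {
	Type          string      `json:"type"`
	Name          string      `json:"endpoint_name"`
	URL           string      `json:"endpoint_url"`
	UpStatusRange StatusRange `json:"up_status_range"`
}

// parseCheck decodes a single check definition from JSON.
func parseCheck(data []byte) (checkConfig, error) {
	var c checkConfig
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	raw := []byte(`{
		"type": "http",
		"endpoint_name": "Site Dot Com",
		"endpoint_url": "https://site.com",
		"up_status_range": {"min": 200, "max": 399}
	}`)

	c, err := parseCheck(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}

	// Classify a few sample codes against the configured range.
	for _, code := range []int{200, 301, 304, 404} {
		fmt.Printf("%d up: %t\n", code, c.UpStatusRange.Contains(code))
	}
}
```

With a 200-399 range, redirects and 304 responses count as "up" while 4xx and 5xx responses do not.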

titpetric commented 4 years ago

I’m not convinced of the use case. At some point if you’re largely ignoring the HTTP response, you might as well just use a TCP checker. If you don’t have explicit control to check against a single response code status, the test is too broad. I even added support to accept 200-204 reluctantly. If anything, an array of status codes is preferable to a min/max range.
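
For comparison, the array alternative mentioned here might look like the sketch below. The helper name `statusAllowed` and the sample list of codes are assumptions made for illustration, not existing checkup code or config.

```go
package main

import "fmt"

// statusAllowed is a hypothetical helper: it reports whether code appears
// in the explicit list of accepted status codes.
func statusAllowed(accepted []int, code int) bool {
	for _, a := range accepted {
		if a == code {
			return true
		}
	}
	return false
}

func main() {
	// Only the codes listed here count as "up"; nothing in between does.
	accepted := []int{200, 301, 302, 304}

	for _, code := range []int{200, 304, 399, 500} {
		fmt.Printf("%d up: %t\n", code, statusAllowed(accepted, code))
	}
}
```

Unlike a 200-399 range, the list rejects a code such as 399 unless it is named explicitly, which keeps the test narrow.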

therealkevinard commented 4 years ago

The PR allowed a good bit of flexibility in the config, but I think my use case (the target) would be rather common:

Right now, I'm monitoring ~300 services. My dashboard shows critical, mostly because of nginx and varnish proxies returning 304 cache headers. Also, 301 and 302 are certainly up and need no action.

I'm evaluating checkup as a tool for a) crisis response and b) SLA reporting/compliance. For the most part, it's AMAZING for that - except the rigidity in the acceptable codes.

I'm with you, really. Rather than hard-coding the map of acceptable codes, though, allowing ranges passes the decision on to the end user.

golddranks commented 3 years ago

Seconded. I'm monitoring a service that requires authentication. I'm expecting to see the 401 status and nothing else – if I'm not seeing that, something's wrong. Some services such as UptimeRobot allow specifying the expected status codes in the monitoring settings, which further proves the point that this is a feature that people use.

golddranks commented 3 years ago

Oops, never mind, I apparently missed up_status even though I skimmed the settings. So for my use case, the current functionality is enough.
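
The 401 case above amounts to an exact match against a single expected code. A minimal sketch of that idea follows; `probe` and `newAuthStub` are hypothetical helpers, the local test server stands in for the authenticated service, and none of this reproduces checkup's internals.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// probe is a hypothetical helper: it issues a GET request and reports
// whether the response status equals the single expected code.
func probe(url string, expected int) (bool, error) {
	resp, err := http.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == expected, nil
}

// newAuthStub starts a local server that always answers 401, standing in
// for a service that requires authentication.
func newAuthStub() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusUnauthorized)
	}))
}

func main() {
	srv := newAuthStub()
	defer srv.Close()

	// Expecting 401: the service counts as healthy only if it still demands auth.
	up, err := probe(srv.URL, http.StatusUnauthorized)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("up:", up)
}
```

If the service ever starts answering 200 without credentials, the same probe reports it as down, which is the behaviour wanted here.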