Our healthchecks ping goes down from time to time, giving the false impression that the server itself has gone down. Part of the cause is a poor implementation of the ping: any single failure immediately kills the healthcheck, and it never tries again. This PR retries on failure up to a limit (10, in this case) and adds a little logging.
I think the problem is just an occasional dropped request because we also have Uptime Robot pinging the server at the /jobs endpoint regularly, and that one never seems to fail. If our server were truly dropping out, I'd expect both healthchecks to go down.
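
For reviewers who want the shape of the change at a glance, here is a rough sketch of the retry-with-logging behaviour described above. It is illustrative rather than the code in this PR: the names `ping_with_retry`, `send_ping`, `MAX_ATTEMPTS` and `delay_seconds` are hypothetical, the pause between attempts is an assumption, and it treats the limit of 10 as the total number of attempts.

```python
import logging
import time
import urllib.request
from typing import Callable

logger = logging.getLogger("healthcheck")

# Assumption: the limit of 10 is read here as total attempts, not extra retries.
MAX_ATTEMPTS = 10


def send_ping(url: str, timeout: float = 10.0) -> None:
    """Send a single ping (hypothetical helper).

    urlopen raises URLError/HTTPError on failure; both are OSError subclasses.
    """
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def ping_with_retry(
    send: Callable[[], None],
    max_attempts: int = MAX_ATTEMPTS,
    delay_seconds: float = 1.0,  # assumption: a short fixed pause between attempts
) -> bool:
    """Call `send` until it succeeds or `max_attempts` is reached.

    Returns True on success and False once every attempt has failed,
    so one dropped request no longer kills the healthcheck.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            logger.info("healthcheck ping succeeded on attempt %d", attempt)
            return True
        except OSError as exc:
            logger.warning(
                "healthcheck ping failed (attempt %d/%d): %s",
                attempt, max_attempts, exc,
            )
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    logger.error("healthcheck ping gave up after %d attempts", max_attempts)
    return False
```

It would be called with something like `ping_with_retry(lambda: send_ping(ping_url))`, where `ping_url` stands in for the real healthcheck URL. Passing the sender in as a callable keeps the retry loop testable without touching the network.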