purescript / registry-dev

Development work related to the PureScript Registry
https://github.com/purescript/registry

Use withRetry and explicit failure limit in healthchecks #679

Closed: thomashoneyman closed this 9 months ago

thomashoneyman commented 9 months ago

Our healthchecks ping occasionally goes down, giving the false impression that the server itself is down. This is partly due to a poor implementation of the ping: any failure immediately kills the healthcheck, and it never tries again. In this PR the healthcheck now retries on failure, up to a limit (10 in this case). It also adds a little logging.
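A minimal sketch of the retry-with-limit idea, written in Python purely for illustration. It is not the registry's PureScript code or its `withRetry` helper. The names `check_with_retry`, `ping`, `delay_seconds`, and the fixed delay between attempts are hypothetical assumptions. A failed or raising ping is counted and retried instead of ending the check, and the healthcheck only reports "down" once the failure limit (10) is reached.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("healthcheck")

# Hypothetical constant mirroring the "up to a limit (10)" described in the PR.
MAX_FAILURES = 10


def check_with_retry(
    ping: Callable[[], bool],
    max_failures: int = MAX_FAILURES,
    delay_seconds: float = 1.0,  # assumed fixed delay; the PR does not specify one
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `ping` until it succeeds or `max_failures` attempts have failed.

    Returns True on the first success and False once the limit is hit.
    An exception from `ping` counts as one failure instead of killing the check.
    """
    failures = 0
    while failures < max_failures:
        try:
            if ping():
                if failures:
                    log.info("ping succeeded after %d failed attempt(s)", failures)
                return True
        except Exception as exc:  # a dropped request should not end the healthcheck
            log.warning("ping raised %r", exc)
        failures += 1
        log.warning("ping failed (%d/%d)", failures, max_failures)
        if failures < max_failures:
            sleep(delay_seconds)  # wait before the next attempt
    log.error("healthcheck giving up after %d consecutive failures", max_failures)
    return False
```

Passing `sleep` as a parameter keeps the sketch testable (e.g. `check_with_retry(my_ping, sleep=lambda s: None)`), and a single dropped request now costs one extra attempt rather than a false "server down" report.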

I think the problem is just an occasional dropped request because we also have Uptime Robot regularly pinging the server at the /jobs endpoint, and that check never seems to fail. If our server were truly dropping out, I'd expect both healthchecks to go down.