dvanderveer closed this 9 months ago
I would support a pull request that added curl to the docker image. It should be sufficient to rely on the status code of the response. If you're not seeing a 500 status with the "unable to retrieve site" error then that would be a bug I'd like to see fixed. If you have any evidence of that happening, please share.
As for the healthcheck, I would recommend using your docker-compose.yml to configure it. This way it's an opt-in extra call to the API, and it's also easier to configure the target URL for those running in multi-instance mode.
I've confirmed that the site does indeed return error 500 in the unreachable Lemmy site scenario. I must have mixed up mlmym test results with one of the other alt UI containers I was working on. Sorry for the bunk PR! I'll close this one and open a new PR that adds curl to the container for healthcheck purposes.
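For reference, the status-code approach discussed above could be sketched in Go roughly as follows. This is only an illustration of the idea, not code from mlmym or from this PR: the names statusHealthy, checkStatus, and exitCode, the 5-second timeout, and the choice to treat any non-2xx response as unhealthy are all assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// statusHealthy reports whether an HTTP status code counts as healthy.
// Assumption: only 2xx responses are healthy, so a 500 fails the check.
func statusHealthy(code int) bool {
	return code >= 200 && code <= 299
}

// checkStatus performs one GET against url and returns nil only when the
// request succeeds and the response status is healthy.
func checkStatus(client *http.Client, url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()
	if !statusHealthy(resp.StatusCode) {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

// exitCode maps the check result to a process exit code:
// 0 for healthy, 1 for unhealthy.
func exitCode(url string) int {
	client := &http.Client{Timeout: 5 * time.Second} // hypothetical timeout
	if err := checkStatus(client, url); err != nil {
		fmt.Println("unhealthy:", err)
		return 1
	}
	return 0
}
```

A main function would pass the result to os.Exit, for example os.Exit(exitCode(url)) with whatever URL the instance listens on. Docker treats exit code 0 from a health check command as healthy and exit code 1 as unhealthy.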
Problem
Lemmy.world runs multiple alternate UIs, including mlmym. We recently had an outage for alt UIs due to a misconfiguration on our server, which went uncaught longer than we'd like. While we do have monitoring and alerting for alt UIs, a Docker health check would provide more immediate feedback to admins during site maintenance.
Implementation Notes
A basic health check using curl or wget doesn't seem appropriate for a couple of reasons:
- [...] curl or wget [...]
Instead, this PR uses golang's html.Parse to parse the index page.
Solution
Add a basic health checker to the container image. The health checker returns exit code 1 in any of the following scenarios:
- The index page contains a div tag of class error enclosing text containing the string "unable to retrieve site"

go.mod and go.sum were updated by go mod tidy.
Testing Done
Successfully built the docker container image, then launched containers for a nonexistent Lemmy site and a known-good Lemmy site. Confirmed that the health checks reported correctly for each container:
Also confirmed that both containers loaded as expected in a browser, with the "good" container showing lemmy.world and the "bad" container showing the stub UI with "unable to retrieve site" in red text.