matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0

Make healthcheck check a little bit more #11473

Open ShadowJonathan opened 2 years ago

ShadowJonathan commented 2 years ago

Currently, /health looks like this:

    def render_GET(self, request: Request) -> bytes:
        request.setHeader(b"Content-Type", b"text/plain")
        return b"OK"

This is functionally equivalent to calling /versions on the endpoint.

I think this should do a little more than blindly respond with 'everything is fine', which gives me a similar feeling to the following meme:

[image]

Jokes aside, I think this endpoint should exercise or otherwise "check up" on some basic functionality, and return "not OK" (with a 5XX status) when some precondition isn't met (preconditions could be defined by other resources).

Maybe this could be linked to an "error counter" that counts the number of exceptions raised in the last minute; the health resource would then return "not OK" once that count passes a threshold.
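
To make the error-counter idea concrete, here is a minimal sketch of a sliding-window counter. It is not Synapse code: the `ErrorWindow` class, its method names, the default threshold of 10, and the 503 status are all assumptions chosen for illustration. The clock is injectable so the behaviour can be tested without waiting a minute.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class ErrorWindow:
    """Hypothetical helper: counts errors recorded in the last `window_seconds`."""

    def __init__(
        self,
        window_seconds: float = 60.0,
        threshold: int = 10,  # assumed value, not taken from Synapse
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._window = window_seconds
        self._threshold = threshold
        self._clock = clock
        self._timestamps: Deque[float] = deque()

    def record_error(self) -> None:
        """Call this wherever an exception is caught or logged."""
        self._timestamps.append(self._clock())

    def recent_errors(self) -> int:
        """Drop timestamps older than the window, then count what is left."""
        cutoff = self._clock() - self._window
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)

    def health(self) -> Tuple[int, bytes]:
        """Return (status code, body) for a health endpoint to send."""
        if self.recent_errors() > self._threshold:
            return 503, b"NOT OK"
        return 200, b"OK"
```

A health resource could call `health()` from its `render_GET`, set the response code accordingly, and return the body. Whether a raw exception count is a good health signal is a separate question, since a noisy but harmless error path could trip the threshold.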

Other than that, this is open for further ideas.

philipcristiano commented 10 months ago

+1 for this. I've noticed that my client reports it cannot connect to Synapse while the healthcheck continues to return 200 OK. In my case, this typically happens after a Postgres restart. Restarting Synapse fixes the problem, but it would be nice if the healthcheck accurately reported the health of the process so that infra automation can remediate.
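
One way to cover the failure described here is to have the health endpoint run a set of named precondition checks and report 503 if any of them raises. The sketch below is an assumption-laden illustration, not Synapse's implementation: `run_health_checks` and the `HealthCheck` type are hypothetical names, and the database probe is left as a caller-supplied callable (for Postgres it might run a trivial query such as `SELECT 1` through the server's own connection pool).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical convention: a check returns None when healthy and raises otherwise.
HealthCheck = Callable[[], None]


def run_health_checks(checks: Dict[str, HealthCheck]) -> Tuple[int, bytes]:
    """Run every check and return (status code, body) for the health response."""
    failures: List[str] = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # any failure marks the process unhealthy
            failures.append(f"{name}: {exc}")
    if failures:
        body = "NOT OK\n" + "\n".join(failures)
        return 503, body.encode("utf-8")
    return 200, b"OK"
```

With a check registered under a name like `"database"`, a lost connection after a Postgres restart would surface as a 503, which is the signal infra automation needs in order to restart the process.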