Open ShadowJonathan opened 2 years ago
+1 for this. I've noticed my client reports that it cannot connect to Synapse while the healthcheck continues to return 200 OK. In my case this typically happens after a Postgres restart. Restarting Synapse fixes the problem, but it would be nice if the healthcheck accurately reported the health of the process so that infra automation can remediate.
Currently, `/health` unconditionally responds with OK, which is functionally equivalent to calling `/versions` on the endpoint.

I think that this should do a little more than just blindly respond with 'everything is fine', which gives me a similar feeling to the following meme: [meme image]
Jokes aside, I think that this endpoint should actually check up on some basic functionality and return "not OK" (with a 5XX status) when some precondition isn't met (the preconditions could be defined by other resources).
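To make the precondition idea concrete, here is a minimal sketch in plain Python. It is not Synapse code: `evaluate_health`, the `HealthCheck` type, and the check names are all hypothetical, and how other resources would register their checks is left open.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type: a check is a callable that returns True when healthy.
HealthCheck = Callable[[], bool]


def evaluate_health(checks: Dict[str, HealthCheck]) -> Tuple[int, str]:
    """Run every registered check and return (status_code, body).

    Returns (200, "OK") when all checks pass, otherwise 503 with the
    names of the failing checks.
    """
    failed = []
    for name, check in checks.items():
        try:
            healthy = check()
        except Exception:
            # A check that raises counts as failing instead of
            # crashing the health endpoint itself.
            healthy = False
        if not healthy:
            failed.append(name)
    if failed:
        return 503, "not OK: " + ", ".join(failed)
    return 200, "OK"
```

A database precondition, for example, could be a check that runs a trivial query and returns `False` (or raises) when the connection is gone, which would cover the Postgres restart case mentioned above.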
Maybe this could be linked to an "error counter" that counts the number of exceptions raised in the last minute; the health resource would then return "not OK" if that count passes a threshold.
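The error counter could be sketched roughly like this, as a sliding window over exception timestamps. Everything here is an assumption for illustration: the `ErrorWindow` class, its method names, the 60-second default, and the choice of 503 as the "not OK" status are not taken from Synapse.

```python
import time
from collections import deque
from typing import Callable, Deque


class ErrorWindow:
    """Counts errors recorded within the last `window_seconds` (hypothetical helper)."""

    def __init__(
        self,
        threshold: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = threshold
        self._window = window_seconds
        self._clock = clock  # injectable so the window can be tested
        self._events: Deque[float] = deque()

    def record_error(self) -> None:
        """Call this wherever an exception is caught or logged."""
        self._events.append(self._clock())

    def recent_errors(self) -> int:
        """Drop timestamps older than the window and return the remaining count."""
        now = self._clock()
        while self._events and now - self._events[0] > self._window:
            self._events.popleft()
        return len(self._events)

    def status_code(self) -> int:
        """200 while the count is at or below the threshold, 503 once it passes it."""
        return 200 if self.recent_errors() <= self._threshold else 503
```

The health resource would then answer with `status_code()`. One caveat with this design: a burst of errors keeps the endpoint unhealthy for a full window even if the underlying problem clears immediately, so the threshold and window length would need tuning.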
Other than that, this is open for further ideas.