Our HealthController currently has a timeout for the Mongo pingCheck, but not the other checks, which makes the health check not very useful for debugging: if it times out at the front-end web proxy level, we don't get a traceback telling us which application service actually failed or timed out.
(This happened this morning.)
All the checks should have timeouts, and this fix should be verified with a test for the timeout case: this is critical for the health check to actually be useful for debugging infrastructure failures.
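
One possible shape for the fix, as a minimal framework-agnostic sketch rather than our actual code: wrap every check in a helper that rejects with a named error when the check does not settle in time, so the failure names the service instead of surfacing as a proxy timeout. The names `withTimeout`, `HealthCheckTimeoutError`, and the "redis" check are hypothetical, and the 50 ms limit is only for illustration.

```typescript
// Hypothetical error type: records which check timed out and after how long.
class HealthCheckTimeoutError extends Error {
  constructor(
    public readonly checkName: string,
    public readonly timeoutMs: number,
  ) {
    super(`health check "${checkName}" timed out after ${timeoutMs} ms`);
    this.name = "HealthCheckTimeoutError";
  }
}

// Hypothetical helper: races a check against a timer and always clears the timer.
async function withTimeout<T>(
  checkName: string,
  timeoutMs: number,
  check: () => Promise<T>,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new HealthCheckTimeoutError(checkName, timeoutMs)),
      timeoutMs,
    );
  });
  try {
    // Whichever settles first wins: the check itself or the timeout rejection.
    return await Promise.race([check(), timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Sketch of the timeout test case: a check that never settles stands in
// for a hung dependency, and the resulting error names the failing check.
async function demo(): Promise<string> {
  const hung = () => new Promise<void>(() => {});
  try {
    await withTimeout("redis", 50, hung);
    return "no timeout";
  } catch (err) {
    return (err as Error).message;
  }
}
```

The per-check limit needs to be shorter than the front-end proxy's own timeout, otherwise the proxy still gives up first and the named error never reaches the logs.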