wellcomecollection / catalogue-api

:crystal_ball: The API for searching the Wellcome Collection catalogue.
https://developers.wellcomecollection.org
MIT License

Add option for HTTP healthcheck, enable for works #736

Closed kenoir closed 7 months ago

kenoir commented 7 months ago

What does this change?

This change adds the option to select an HTTP healthcheck for a service's target group. The aim is to avoid downtime during deploys: at present the TCP healthcheck only waits for nginx to be able to open a connection, even though the underlying Scala service may not yet have started.

By default the HTTP healthcheck uses the /management/healthcheck endpoint (already present in the works service), and will register as healthy if that endpoint returns a 200 (see https://api.wellcomecollection.org/catalogue/v2/management/healthcheck).
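To illustrate the difference between the two kinds of check, here is a minimal Python sketch. It is not the code in this PR (the real check is performed by the load balancer against the target group); the function names `tcp_check` and `http_check` and the timeout values are assumptions made for illustration.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Healthy if a TCP connection can be opened; says nothing about the app behind it."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url: str, timeout: float = 2.0) -> bool:
    """Healthy only if the endpoint answers the request with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts, and 4xx/5xx responses all land here
        # (urlopen raises HTTPError, a URLError subclass, for 4xx/5xx).
        return False
```

With a proxy that accepts connections while the application behind it is still starting, `tcp_check` would already pass, whereas `http_check` against `/management/healthcheck` would keep failing until the application itself answers with a 200.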

[!NOTE] The intention is to roll this out for the catalogue-api works service to start with, then add the necessary endpoints to the items and concepts services, and finally remove the option for a TCP healthcheck. Further work is also required to properly report the health of the service, as being able to respond from the app container isn't definitive proof the app is "healthy".

How can we test?

How can we measure success?

No downtime during deployments, resulting in a better experience for visitors to the site and fewer errors in the alerts channel that we cannot effectively respond to.

Have we considered potential risks?

Changing the healthchecks changes the failure modes for the API. We should test thoroughly in stage before deploying to prod, and consider and document the impact of extending the healthcheck to fail in other situations (e.g. Elasticsearch is unavailable).
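As a way to reason about that last risk, the following Python sketch shows one possible shape for a healthcheck that also reports on dependencies. It is purely hypothetical: the function `healthcheck_response`, the check names, and the choice of 503 for failure are assumptions, not something the works service implements today.

```python
from typing import Callable, Dict, Tuple


def healthcheck_response(
    checks: Dict[str, Callable[[], bool]]
) -> Tuple[int, Dict[str, str]]:
    """Run each named check and return (HTTP status, per-check result)."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            # A check that raises is treated the same as one that fails.
            ok = False
        results[name] = "ok" if ok else "failed"
    # Any failing check turns the whole response into a 503.
    status = 200 if all(value == "ok" for value in results.values()) else 503
    return status, results
```

Under this design a single unavailable dependency makes every instance report unhealthy at the same time, which is the kind of changed failure mode that would need documenting before the check is extended.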