Function readiness check

Note: "fn" is used here to refer to an OpenFaaS (OF) "function", to distinguish it from the general usage of the term.

As discussed over a number of community calls, and especially now that watchdog readiness checks have been implemented, fn readiness is also something we would like to be able to know about, preferably without needing to re-implement the watchdog's checks for each fn.

Expected Behaviour

The goal is for the watchdog to report its own readiness, covering concurrency limits, disk usage, and any other readiness checks that should be common to all fns. A fn should also be able to report its own application-specific readiness, such as whether a DB connection has been established, and that readiness should work in conjunction with the watchdog's readiness.

Current Behaviour

Right now, only the watchdog has a readiness check. If you want to provide fn readiness, you must override it with a custom endpoint, which means re-implementing the watchdog's existing readiness checks in your programming language of choice.

Possible Solution

A fn would have its own readiness check, likely a custom endpoint exposed by the fn via a small HTTP server. When the watchdog receives an HTTP request on its readiness endpoint, it would first check whether it is itself ready. If so, it calls the fn's readiness endpoint and simply returns the result of that call (possibly with an additional header to make clear which component is responding).

This would allow the watchdog to gracefully handle both its own readiness and the fn's readiness without requiring the fn's readiness check to be implemented in a specific language, as would be the case if, say, you had to provide a fn readiness function for the watchdog to import.

For prior art, this pattern of using internal HTTP to do a hierarchical check is very similar to how Istio's proxy sidecar works; Istio's sidecar replaces the pod's health check endpoint with its own, and then uses the pod's health check inside of the proxy agent itself.

Implementation detail: timeouts for this readiness endpoint would need to be configurable by the user. This could likely be the same timeout as the watchdog's, which already follows an established pattern but is not yet exposed in fn config.

Steps to Reproduce (for bugs)

Context

Readiness is important for load balancing and for sustaining high throughput with a low number of retries. The more readiness can be used to guard against cases that would otherwise require a retry, the faster a workload can be processed and the less often sync clients will need to retry.

Your Environment

Docker version (docker version, e.g. Docker 17.0.05): latest
Are you using Docker Swarm or Kubernetes (FaaS-netes)? faas-netes
Operating System and version (e.g. Linux, Windows, MacOS): Linux and macOS
Link to your project or a code example to reproduce issue:

openfaas / of-watchdog