openfaas / of-watchdog

Reverse proxy for STDIO and HTTP microservices
MIT License
262 stars 115 forks source link

feat: allow proxying readiness checks to the function #145

Closed LucasRoesler closed 1 year ago

LucasRoesler commented 1 year ago

Allow setting an endpoint path for the function readiness check via an ENV variable function_ready_endpoint

When this value is set, the requests to /_/ready will execute an empty GET request with/to the configured endpoint. This allows the function authors to implement custom readiness check logic.

This custom request is checked after the the standard liveliness checks and the ConcurrencyLimiter check. For a completely custom readiness check, the function should be deployed with max_inflight == 0 and function_ready_endpoint to the custom path.

Motivation and Context

Resolved #144

How Has This Been Tested?

Additional unit tests have been added Manually tested with a custom python function image

Types of changes

Checklist:

kevin-lindsay-1 commented 1 year ago

I have tested this, and the functionality appears to work as expected.

I built a golang fn that had a readiness endpoint of /_/function/ready and simply serve it, with an init() function defined in my fn. Once the init fn completed, the readiness endpoint was made available, and began outputting 200s, and the function entered the ready state in k8s.

kevin-lindsay-1 commented 1 year ago

One note: the readiness endpoint seems to emit logs, which should probably be silent by default.

kevin-lindsay-1 commented 1 year ago

This is a priority to get merged for me, because if the gateway forwards traffic to a fn before it is initially ready, while the handler itself is not yet available in the fn to process the invocation, this would likely result in errors. This error case is similar to the Istio issue we previously solved for, where the gateway was sending traffic to the pod just a few milliseconds early.

LucasRoesler commented 1 year ago

@kevin-lindsay-1 do you mean the error log or the proxy log line? I would have to add a bit more code to make that log line disappear because I am reusing the function proxy implementation, it just looks like any other request at the moment?

The error log line should never happen, if you are seeing that, then i am very curious for more details.

LucasRoesler commented 1 year ago

On more thing, we already exclude the kube-probe from the logs

    // Exclude logging for health check probes from the kubelet which can spam
    // log collection systems.
    if !strings.HasPrefix(r.UserAgent(), "kube-probe") {
        log.Printf("%s %s - %s - ContentLength: %d", r.Method, r.RequestURI, res.Status, res.ContentLength)
    }

so ... it should be omitted if/when i pass the headers along

kevin-lindsay-1 commented 1 year ago

I'm pretty sure I was seeing the proxy log line. Looked like the one you in the code example above.

I'd be fine with a follow-up patch, if I can confirm the behavior. It didn't seem to do it after the initial start, so it might have been an "error" due to not passing initial readiness.

LucasRoesler commented 1 year ago

I'm pretty sure I was seeing the proxy log line. Looked like the one you in the code example above.

I'd be fine with a follow-up patch, if I can confirm the behavior. It didn't seem to do it after the initial start, so it might have been an "error" due to not passing initial readiness.

Pretty easy to pass the headers along, i just pushed a followup commit to copy them from the original request, this should now include the User-Agent and allow the proxy log to be suppressed (if the user agent contains "kube-probe")

alexellis commented 1 year ago

Hi @LucasRoesler I did some more testing on this and wanted to share the results:

go build && mode=serializing ready_path=/ready  upstream_url="http://127.0.0.1:8888" fprocess="env" ./of-watchdog 

curl localhost:8080/_/ready

Screenshot from 2022-10-14 10-28-03

alexellis commented 1 year ago

I have a fix for this, it wasn't a new issue in a new PR along with your commits (for you to get credit), but was possibly triggered by the changes.

Since you may want this PR to count towards hacktoberfest, feel free to open a PR to the README file to update how this feature works and I'll merge that after.