Add support for a ready endpoint, based upon max_inflight and change timers to use a Context with a deadline.

alexellis commented 1 year ago

Description

Motivation and Context

1) Ready endpoint introduced 2) Documentation updated for usage 3) Logic for timing and stopping requests that run for too long has been switched over to use a context.

1) Currently, when a function is overloaded with too many requests vs max_inflight i.e. 10 inflight with a limit of 2, then a 429 / retry message is returned to the caller.

Kubernetes keeps all the Pods that are overloaded in its ready pool, so a caller like the queue-worker may have to call many times to finally hit a pod with capacity.

By implementing a ready endpoint, the readiness check for an individual function can be implemented, so that Kubernetes removes it from the pool of endpoints when it exceeds its max_inflight. It will return again when it's once again deemed to be ready.

Tested with a large amount of requests through the queue- worker, pods were removed from the pool of endpoints as expected.

2) The documentation was reorganise and updated for new options and other were removed which were not relevant.

3) As per https://github.com/openfaas/of-watchdog/commit/47d5c31a5a182999ea3f904d187355023ad75955, there were other modes which had an issue that could be resolved by using a Context with a deadline instead of a timer.

How Has This Been Tested?

Tested with unit tests on the new change to the concurrency limiter to expose a Met() method used in the new Ready endpoint.

No harm to existing functions which won't use the /_/ready path.

Then tested with some load with the queue-worker. I observed Pods being removed from the readiness count and coming back online again as expected.

This testing revealed a bug in the scale from zero logic of the gateway which was fixed here: https://github.com/openfaas/faas/pull/1753

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

[x] My code follows the code style of this project.
[x] My change requires a change to the documentation.
[x] I have updated the documentation accordingly.
[x] I've read the CONTRIBUTION guide
[x] I have signed-off my commits with git commit -s
[ ] I have added tests to cover my changes.
[x] All new and existing tests passed.

alexellis commented 1 year ago

Thanks for the review @LucasRoesler

kevin-lindsay-1 commented 1 year ago

LGTM

openfaas / of-watchdog