Gets rid of segfaults and leaked child processes when running in streaming mode.
Previously, the OpenFaaS watchdog used custom timer-based logic to kill function processes that exceeded the configured exec_timeout.
This solution was susceptible to panics and resource leaks:
SIGSEGV when attempting to invoke Kill() on an invalid *Cmd pointer
OOMKills in Kubernetes, apparently because the watchdog failed to reap child processes before starting new function invocations.
With this change, the entire timer-based solution is discarded in favour of the standard CommandContext API from the exec package.
Fixes: #138
Motivation and Context
See: #138
[x] I have raised an issue to propose this change (required)
How Has This Been Tested?
Built a container image for the customised OpenFaaS watchdog. This required reverting some of the GitHub Actions work to recover an older version of the Dockerfile with convenient build commands.
Updated my own function container image to use the customised OpenFaaS watchdog image as its base image.
Deployed a Function resource using the customised function image with Helm.
Repeated the load tests (see #138) while monitoring:
Pod restarts: watch kubectl get pods -l faas_function=...
Resource consumption: watch kubectl top pod -l faas_function=...
Observed no pod restarts (no more crashes) and steady memory consumption (no more 'step' increases due to leaked memory or child processes).
Types of changes
[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
[x] My code follows the code style of this project.
[ ] My change requires a change to the documentation.