grafana / flagger-k6-webhook

Using k6 to do load testing of the canary before rolling out traffic
Apache License 2.0

Handle spawned child processes #152

Closed Croxed closed 2 months ago

Croxed commented 3 months ago

Hey

We've encountered some issues running the flagger-k6-webhook deployment in a Kubernetes cluster. After some time, the deployment accumulates a lot of zombie processes, causing new forks/test executions to fail. We've mitigated the issue for now by restarting the deployment, but that's not a long-term solution for us. We are also using the latest version of this project.

One suggestion, after looking at other projects, is to use tini or dumb-init as the entrypoint in order to reap the zombie processes. Here's the ps output from the container running flagger-k6-webhook after it has been running for 6 hours: (image)
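For anyone who wants to check for the same symptom without reading ps output by hand, here is a minimal Go sketch that counts zombie processes by reading the state field from `/proc/<pid>/stat` on Linux. It is not part of flagger-k6-webhook; the names `stateFromStat` and `countZombies` are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// stateFromStat extracts the one-letter process state from the contents of
// /proc/<pid>/stat. The command name is wrapped in parentheses and may itself
// contain spaces or ')', so the state is the first field after the last ')'.
func stateFromStat(stat string) (byte, bool) {
	i := strings.LastIndexByte(stat, ')')
	if i < 0 {
		return 0, false
	}
	fields := strings.Fields(stat[i+1:])
	if len(fields) == 0 || len(fields[0]) != 1 {
		return 0, false
	}
	return fields[0][0], true
}

// countZombies counts processes in state 'Z' under procRoot (normally "/proc").
func countZombies(procRoot string) int {
	// Directory names that start with a digit are process IDs.
	paths, _ := filepath.Glob(filepath.Join(procRoot, "[0-9]*", "stat"))
	n := 0
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // the process may have gone away between listing and reading
		}
		if s, ok := stateFromStat(string(data)); ok && s == 'Z' {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println("zombie processes:", countZombies("/proc"))
}
```

Run inside the container, a count that keeps growing over time would match the behaviour described above.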

zerok commented 3 months ago

Hi :) My guess is that we are launching k6 subprocesses but in at least some cases not waiting for them. I'll try to come up with a test-case.

Croxed commented 3 months ago

Hey! Thanks for responding 😄

Forgot to add some more context. We almost exclusively use "wait_for_results=false", since it suits our needs best.

zerok commented 3 months ago

Ah, that would explain it, since this is the one non-error case where we don't wait for the process. I guess we need to wait for them on a global level. I probably can no longer work on this today, but perhaps I'll find some time next week :)
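To illustrate the general idea of waiting on a global level, here is a minimal Go sketch, not the actual change in this project. It starts a child process, returns immediately, and calls `Wait` in a background goroutine so the child is reaped once it exits. The `reaper` type and its methods are hypothetical names, and `sleep` only stands in for a k6 run.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// reaper (hypothetical) tracks fire-and-forget child processes so that
// every one of them is eventually waited on.
type reaper struct {
	wg sync.WaitGroup
}

// startDetached starts the command and returns without blocking on it.
// A goroutine calls cmd.Wait, which collects the exit status and thereby
// prevents the exited child from lingering as a zombie.
func (r *reaper) startDetached(onExit func(error), name string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		err := cmd.Wait()
		if onExit != nil {
			onExit(err)
		}
	}()
	return cmd, nil
}

// drain blocks until all started children have been waited on,
// for example during shutdown.
func (r *reaper) drain() { r.wg.Wait() }

func main() {
	var r reaper
	// "sleep" is a stand-in for a k6 run started with wait_for_results=false.
	_, err := r.startDetached(func(err error) {
		fmt.Println("child exited, err:", err)
	}, "sleep", "0.1")
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	r.drain()
}
```

The caller still gets an immediate response, which is the point of `wait_for_results=false`, while the goroutine makes sure the process table entry is cleaned up.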

zerok commented 2 months ago

@Croxed I have a PR ready that should solve this issue. Do you perhaps have some time to give it a try? 🙂

https://github.com/grafana/flagger-k6-webhook/pull/153

Croxed commented 2 months ago

Hey @zerok! Sorry for the late reply. We've tried it out and it seems like it works! No zombie processes anywhere.

Thanks for solving our issue 🙂

zerok commented 2 months ago

No worries and thanks for testing 😄