litmuschaos / litmus-go

Apache License 2.0

Fix the cgroup 2 process attaching problem #677

Open kbfu opened 9 months ago

kbfu commented 9 months ago

What this PR does / why we need it: Fixes the problem that occurs when attaching a process to another cgroup under cgroup v2.

Which issue this PR fixes: fixes https://github.com/litmuschaos/litmus/issues/3902

uditgaurav commented 8 months ago

Hello @kbfu, thank you for your contribution through the pull request.

I would like to inquire about the specific cluster environment and container runtime where you have conducted your tests. We encountered an issue when running it on a GKE cluster with containerd and cgroupv2. Here's the error we observed:

could not get cgroup manager
--- at /litmus-go/chaoslib/litmus/stress-chaos/helper/stress-helper.go:134 (prepareStressChaos) ---
Caused by: Error in getting groupPath,nsenter: unrecognized option: C
BusyBox v1.35.0 (2022-08-01 15:14:44 UTC) multi-call binary.

Usage: nsenter [OPTIONS] [PROG ARGS]

    -t PID                    Target process to get namespaces from
    -m[FILE]                  Enter mount namespace
    -u[FILE]                  Enter UTS namespace (hostname etc)
    -i[FILE]                  Enter System V IPC namespace
    -n[FILE]                  Enter network namespace
    -p[FILE]                  Enter pid namespace
    -U[FILE]                  Enter user namespace
    -S UID                    Set uid in entered namespace
    -G GID                    Set gid in entered namespace
    --preserve-credentials    Don't touch uids or gids
    -r[DIR]                   Set root directory
    -w[DIR]                   Set working directory
    -F                        Don't fork before exec'ing PROG

kbfu commented 8 months ago

Hi @uditgaurav, I rebuilt the image and changed the base image from Alpine to Debian. I believe the nsenter command shipped with BusyBox is outdated: as the error above shows, it does not recognize the -C option. This is the version I am using now: nsenter from util-linux 2.38.1

uditgaurav commented 8 months ago

@kbfu, Thanks for your response. I'm wondering if we can integrate this capability within the Alpine-based image itself, as this would help in maintaining a smaller image size.

The corresponding version of the util-linux package in Alpine is 2.38-r1.

For your reference, the experimental Dockerfile is located here - litmus-go Dockerfile. It uses the base image litmuschaos/experiment-alpine, sourced from this Dockerfile. Perhaps we can consider adding the required functionality in this Dockerfile.

uditgaurav commented 8 months ago

Hi @kbfu, I've created a test experiment image using the same Alpine-based image, which now includes the util-linux 2.38-r1 package. The image can be found at docker.io/uditgaurav/go-runner:stress.

Current Output:

/ # nsenter --version
nsenter from util-linux 2.38

Previous Output:

~ $ nsenter --version
nsenter: unrecognized option: version
BusyBox v1.35.0 (2022-08-01 15:14:44 UTC) multi-call binary.
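The difference between the two outputs above could also be checked programmatically before an experiment relies on nsenter options that BusyBox lacks. The following is only a sketch: the function name `isUtilLinuxNsenter` is hypothetical, and matching on the string "util-linux" is an assumption based on the version line shown above, not something litmus-go is known to do.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// isUtilLinuxNsenter is a hypothetical helper. It reports whether the
// text printed by `nsenter --version` identifies the util-linux
// implementation, based on the version line format shown in this thread.
func isUtilLinuxNsenter(versionOutput string) bool {
	return strings.Contains(versionOutput, "util-linux")
}

func main() {
	// CombinedOutput captures stdout and stderr together, so the
	// BusyBox usage message is captured as well when --version is rejected.
	out, err := exec.Command("nsenter", "--version").CombinedOutput()
	if err != nil {
		fmt.Printf("nsenter --version failed: %v\n", err)
	}

	if isUtilLinuxNsenter(string(out)) {
		fmt.Println("util-linux nsenter detected")
	} else {
		fmt.Println("nsenter does not look like the util-linux build")
	}
}
```

Keeping the string check in its own function lets it be exercised with captured output, such as the two samples above, without needing either nsenter build installed.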

It works well with containerd + cgroupv2 🙌. Moving forward, I plan to conduct further tests under these scenarios:

Tagging @ispeakc0de for any suggestions on additional tests or use cases for nsenter.