redpanda-data / helm-charts

Redpanda Helm Chart
http://redpanda.com
Apache License 2.0

Tuners may not be fully getting enabled on k8s clusters deployed via helm #1017

Open hcoyote opened 7 months ago

hcoyote commented 7 months ago

What happened?

We're investigating a CPU imbalance on a large EKS-based cluster deployed via helm. The Perf team indicates that the imbalance is likely related to tuners not getting enabled on this cluster. The cluster uses our defaults to enable tuners. We dug through the tuning container logs and see the following, which suggests that only the aio_events tuner is being applied.

$ kubectl logs redpanda-0 -c tuning
TUNER                  APPLIED  ENABLED  SUPPORTED  ERROR
aio_events             true     true     true
ballast_file           false    false    true
clocksource            false    false    false      Clocksource setting not available for this architecture
coredump               false    false    true
cpu                    false    false    true
disk_irq               false    false    true
disk_nomerges          false    false    false      Directory '' does not exists
disk_scheduler         false    false    false      Directory '' does not exists
disk_write_cache       false    false    false      Directory '' does not exists
fstrim                 false    false    false      dial unix /run/systemd/private: connect: no such file or directory
net                    false    false    true
swappiness             false    false    true
transparent_hugepages  false    false    true
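To narrow this table down to the tuners the host supports but that were never applied, one option is to filter on the APPLIED and SUPPORTED columns. This is a minimal sketch that assumes the column layout shown above; the function name unapplied_tuners is made up for illustration.

```shell
# Print the names of tuners with APPLIED=false and SUPPORTED=true.
# Reads the table printed by the tuning container on stdin.
# The header row is skipped because its second column is "APPLIED", not "false".
unapplied_tuners() {
  awk '$2 == "false" && $4 == "true" { print $1 }'
}

# Usage (pod and container names as in the report above):
#   kubectl logs redpanda-0 -c tuning | unapplied_tuners
```

On the output above this would list ballast_file, coredump, cpu, disk_irq, net, swappiness, and transparent_hugepages, which are the tuners that could run but were not enabled.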

Looking through the statefulset it appears that we fire off rpk redpanda tune all in a privileged container if tune_aio_events is true (which it is by default).

https://github.com/redpanda-data/helm-charts/blob/9261d130d1a486526f5c2c0437c11d03b91ab43d/charts/redpanda/templates/statefulset.yaml#L69-L91

But ... rpk redpanda tune all requires some configuration hints in redpanda.yaml to know which tuners to actually apply. It doesn't seem like we place those hints into the config, so tune all has almost nothing to act on.

Confusingly, the following values.yaml suggests that most of these tunings are not valid in containerized environments, but I can find no history or indication of why this is true (other than that it was asserted sometime in 2022). The only thing I can assume at this point is that the tuners that manipulate sysctl were shown to work at some point in the past, but that the ones manipulating files in /sys (which is most of the interrupt/cpu stuff) may NOT have worked. Discussion from the Perf team suggests that we need these in any environment (k8s or deployed to OS) no matter what, and discussion with @c4milo suggests that we have this working in the operator used for cloud/byoc.

https://github.com/redpanda-data/helm-charts/blob/b5469209ffa05ab8050d260fc685365b899bc4f4/charts/redpanda/values.yaml#L794-L834

The hints that would be needed in the redpanda.yaml:

rpk:
    tune_network: true
    tune_disk_scheduler: true
    tune_disk_nomerges: true
    tune_disk_write_cache: true
    tune_disk_irq: true
    tune_cpu: true
    tune_aio_events: true
    tune_clocksource: true
    tune_swappiness: true
    coredump_dir: /var/lib/redpanda/coredump
    tune_ballast_file: true

What did you expect to happen?

I expect the tuners to be configurable in the values.yaml and have those tunables get applied when the tuning container runs.

How can we reproduce it (as minimally and precisely as possible)? Please include values file.

Can provide values example separately, but generally should occur with any default one we have at this point.

Anything else we need to know?

See also https://redpandadata.slack.com/archives/C01H6JRQX1S/p1706816485871719

See also interrupt channel 464 and more specifically this thread https://redpandadata.slack.com/archives/C06E573MBGE/p1706804403275249

Which are the affected charts?

Redpanda

Chart Version(s)

Problem occurs in 5.6.60 and whatever the latest version was as of Feb 1, 2024.

Cloud provider

Self-hosted on AWS EKS, but likely affects any helm-managed k8s install.

JIRA Link: K8S-101

hcoyote commented 7 months ago

Note: if the expectation is that we should instead be applying tuner configs on the base-os of the k8s node, then we need to figure out how to get rpk redpanda tune all --output-script to work correctly given we still end up not having enough hints (afaict) in redpanda.yaml to make each tuner run to generate the script snippet.

We also need to figure out a way to get that deployed to the underlying nodes and run on every system startup (so also generating some sort of systemd unit files or something else on startup).

The way base-os installs get away with this is that the redpanda rpm/deb packages deploy a systemd unit file for redpanda-tuner.service that runs rpk redpanda tune all, and by that point enough is populated in redpanda.yaml for the tuner configs to take effect.

We probably cannot tell users to also install redpanda rpm/deb on base system as that defeats the purpose of having separate OS and K8s installs.

chrisseto commented 6 months ago

Relevant docs when work begins: https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/

chrisseto commented 6 months ago

Following up from our slack thread.

Tuners can be enabled/configured via the tuning stanza, provided that tune_aio_events is true.

Here we see tune_cpu results in cpu's APPLIED value being set to true

❯ helm install redpanda redpanda/redpanda --create-namespace --version 5.6.60 --set 'tuning.tune_cpu=true'
❯ kubectl --namespace default logs redpanda-0 -c tuning
TUNER                  APPLIED  ENABLED  SUPPORTED  ERROR
aio_events             true     true     true
ballast_file           false    false    true
clocksource            false    false    false      Clocksource setting not available for this architecture
coredump               false    false    true
cpu                    true     true     true
disk_irq               false    false    true
disk_nomerges          false    false    false      Directory '' does not exists
disk_scheduler         false    false    false      Directory '' does not exists
disk_write_cache       false    false    false      Directory '' does not exists
fstrim                 false    false    false      dial unix /run/systemd/private: connect: no such file or directory
net                    false    false    true
swappiness             false    false    true
transparent_hugepages  false    false    true

Whether or not enabling these tuners actually does anything for Redpanda remains an open question.
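The same setting can also be kept in a values file instead of a --set flag. The sketch below only uses the two keys mentioned in this thread (tune_aio_events and tune_cpu); the file name tuning-values.yaml is arbitrary, and whether other tune_* keys behave the same way under the tuning stanza is an assumption that has not been checked here.

```shell
# Write a small values file that enables the cpu tuner via the tuning stanza.
# tune_aio_events stays true because the tuning container only runs when it is set.
cat > tuning-values.yaml <<'EOF'
tuning:
  tune_aio_events: true
  tune_cpu: true
EOF

# Apply it to the release (needs a cluster, so left commented out here):
#   helm upgrade --install redpanda redpanda/redpanda --version 5.6.60 -f tuning-values.yaml
#   kubectl logs redpanda-0 -c tuning
```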

chrisseto commented 5 months ago

@hcoyote What do you think the resolution of this ticket should be? Seems like the best option for now might be updating our documentation to further explain why the tuner doesn't work within Kubernetes and instead suggesting that users utilize cloud-init or similar? I wouldn't be opposed to removing the tuner entirely FWIW.

hcoyote commented 5 months ago

I don't know what the viable solution is right now.

I think we need input from @c4milo and probably @StephanDollberg at minimum. I think the assertion is that, for performance and supportability, we need to get tuners reliably and consistently applied no matter what the deployment methodology is (e.g., bare-os/self-hosted k8s, cloud, etc).

The workaround we have today for EKS is to do this via cloud-init. Camilo was working on some daemonset stuff to make this work in AKS, so maybe that's something we can pull back to helm?

Whatever we do for cloud is probably similar to what we should do for self-hosted k8s (on cloud at least). We still need to determine a suitable answer for self-hosted k8s on multi-tenant shared k8s infra.