kubernetes-sigs / kubebuilder

Kubebuilder - SDK for building Kubernetes APIs using CRDs
http://book.kubebuilder.io
Apache License 2.0
7.76k stars 1.43k forks source link

Document how to use pprof which was introduced in controller-runtime v0.15.x #3338

Open zqzten opened 1 year ago

zqzten commented 1 year ago

What do you want to happen?

As pprof support been introduced to controller-runtime, we can follow up to make it available (and visible) to end users of kubebuilder. This issue tracks the tasks needed TBD, please comment if I miss or get wrong on anything.

For the third task, I'm not sure what best practice we shall provide to end users so this is RFC.

For now, I can think up two possible ways:

Please comment your preference or other opinions if applicable, thanks!

Extra Labels

/kind documentation

camilamacedo86 commented 1 year ago

Hi @zqzten,

Thank you for tracking this one and for making this contribution. I understand that the Go runtime/pprof and net/http/pprof packages are used for profiling and performance analysis of Go applications. Therefore, it is not recommended to be applied to solutions in production such as the metrics are.

However, it seems a lovely feature to let others be aware of. What do you think about addressing this one as documentation? What about add a new topic in the docs, such as benchmark and adding a guide with examples of how to use this feature to benchmark the solution and check to analyze the performance?

WDYT? Would that make sense? Would you like to push a PR within the doc since you are the featured author?

zqzten commented 1 year ago

Hi @camilamacedo86, thanks for the reply.

Therefore, it is not recommended to be applied to solutions in production such as the metrics are.

This might not be the case I think, since enabling pprof itself would not introduce performance degression to the application (only an explicit profling request would do so). On the contrary, pprof is a must have when we want to inspect performance problems such as memory leak in production, as alvaroaleman said here. Also, as an evidence, k8s components such as kube-controller-manager also enables pprof by default.

So I still think that we'd better enable pprof by default, and the only concern is the security issues mentioned above.

camilamacedo86 commented 1 year ago

HI @zqzten,

It is great that you want to help users and Operator authors with this feature.

The primary reason for not enabling pprof in production (or in default scaffolds) revolves around two key concerns: Security and Performance.

These concerns are less pronounced in a development environment, and the benefits of having pprof enabled (e.g., easier debugging, performance tuning) may outweigh the potential drawbacks. However, in a production setting, the risks usually outweigh the benefits.

However, we cannot add it to the default scaffolds done by Kubebuilder because it is not recommended for production and would not be a good practice. If we provided it in the default scaffolds, authors would need to remove or disable the feature prior to adding it on production, which in reality would not happen and would be a big ask to do so.

Following the comments inline.

On the contrary, pprof is a must have when we want to inspect performance problems such as memory leak in production, as https://github.com/kubernetes-sigs/controller-runtime/pull/1943#issuecomment-1176343059

In the controller-runtime library, which underpins Kubebuilder, pprof debugging endpoints are not enabled by default. This is an intentional design decision, made to prioritize security and performance for users, particularly those who are new to Kubebuilder. Experienced developers and operators who understand the implications and need to use pprof for performance tuning or debugging can choose to enable it explicitly. This is a stark contrast to including it in the default Kubebuilder scaffolds, which primarily cater to those who are just starting out and may not fully understand the implications of enabling pprof by default.

Also, as an evidence, k8s components such as kube-controller-manager also enables pprof by default.

You're correct. kube-controller-manager, which is a part of the Kubernetes control plane, does enable pprof endpoints by default. It's important to understand why and under what circumstances it might be considered acceptable.

As part of the Kubernetes control plane, the kube-controller-manager runs on the master nodes of a Kubernetes cluster. The master nodes are not typically exposed to the public internet, and they should be protected by various security measures. This means that, in a correctly configured cluster, the pprof endpoints of the kube-controller-manager should not be accessible to untrusted entities.

Enabling pprof by default on the kube-controller-manager allows Kubernetes developers and administrators to debug performance issues directly on running clusters. This can be very helpful for diagnosing problems in a live environment. However, it is still a potential security risk if an attacker were able to gain access to the master nodes, and I believe that in many environments, system admins will disable that if/when the env is not in a protected or disconnect network (airgap envs)

Conclusion: IMO:

That being said, it's not that you can't enable pprof in a Kubebuilder scaffold; it's just not recommended for production use, hence not included by default. In controlled circumstances, such as a secure and isolated environment, with appropriate monitoring and precautions, you could consider enabling pprof for specific profiling tasks. But such use should be temporary and closely supervised.

Asking DevOps, who uses kubebuilder, to be aware that the default scaffolds expose their security this way is inappropriate and would be a big ask.

Options to add the feature on Kubebuilder:

Allowing the scaffolds via an optional plugin

That could fit in an optional plugin. But, the code implementation to achieve this goal shows to be very small, and the effort to maintain the plugin would not justify what it could bring alone. However, if you have a proposed plugin solution that involves the pproff and other helpers' features that justify that it also might be a great fit Please, see the plugins section and check that we have optional plugins which can be used to add specific scaffolds to do the project.

Provide guidance and tutorials:

It would be great if we have a doc with the whole guide on how to implement its usage, what is possible to achieve with it and how.

Alternative Option to provide the scaffold/code:

It might be acceptable if you have any proposed achievable idea that is not enabling it in the default scaffolds. Anyway, the code also should be shipped with documentation explaining how and when to use it.

I hope that makes sense. Also, please feel free to contribute within PRs and propose solutions that fit and are aligned with the above points.

zqzten commented 1 year ago

Wow thanks for the detailed explanation! @camilamacedo86

Profiling can consume considerable CPU and memory resources, especially when continuously enabled in a production environment.

CPU profiling is not activated at initialization time, it is only activated for a period of time (30 seconds by default) when the /debug/pprof/profile http service is called. So enabling pprof server alone should not impact much the performance of the application (except that extra goroutines for the http server are needed).

However, it is still a potential security risk if an attacker were able to gain access to the master nodes, and I believe that in many environments, system admins will disable that if/when the env is not in a protected or disconnect network (airgap envs).

The pprof endpoint of kube-controller-manager is not only protected by network restrictions, but also protected by mTLS with RBAC (like that in kube-apiserver). So it's rather safe to keep it on by default in production.

For other points, I understand and agree with your opinion, since additional security efforts shall be applied to pprof endpoint, it might not be suitable to make it enabled by default to everyone, especially for those who have no need or knowledge of pprof.

For now, I think a brief doc would be enough to let ppl know this featue and related concerns on this, the plugin way is a little overkill. Glad to work on this :)

camilamacedo86 commented 1 year ago

Hi @zqzten,

I am so happy that we could convey that doc this one for now seems the best way to move forward. Please, feel free to work in this doc if you wish.

I am adding Help wanted on this one as we. Have its doc would be very helpful for sure.

camilamacedo86 commented 1 year ago

The comments above were hidden for simplicity. While we've decided to document its usage rather than include it in the default scaffold, you can refer to the hidden comments for more context on this decision. Nevertheless, introducing it as an optional plugin for the scaffold could be a viable alternative. This option would certainly be a beneficial addition.

To address this issue we must:

TAM360 commented 3 weeks ago

/assign

TAM360 commented 3 weeks ago

@camilamacedo86 Won't our documentation vary based on if and how we incorporate the support for pprof in Kubebuilder?

camilamacedo86 commented 3 weeks ago

It is supported since it is a feature in the controller-runtime. The idea here is just add a doc under references to let people know how to use and when to use as its implications. We cannot add it in the default scaffold because it is not recommended for prod envs, see;

https://github.com/kubernetes-sigs/kubebuilder/issues/3338#issuecomment-1541908179 (I re-open it just to share why we cannot add in the default scaffold)

So, in the doc we need to also share this info to allow people aware of.