Open zqzten opened 1 year ago
Hi @zqzten,
Thank you for tracking this one and for making this contribution. I understand that the Go runtime/pprof and net/http/pprof packages are used for profiling and performance analysis of Go applications. Therefore, it is not recommended to be applied to solutions in production such as the metrics are.
However, it seems a lovely feature to let others be aware of. What do you think about addressing this one as documentation? What about add a new topic in the docs, such as benchmark
and adding a guide with examples of how to use this feature to benchmark the solution and check to analyze the performance?
WDYT? Would that make sense? Would you like to push a PR within the doc since you are the featured author?
Hi @camilamacedo86, thanks for the reply.
Therefore, it is not recommended to be applied to solutions in production such as the metrics are.
This might not be the case I think, since enabling pprof itself would not introduce performance degression to the application (only an explicit profling request would do so). On the contrary, pprof is a must have when we want to inspect performance problems such as memory leak in production, as alvaroaleman said here. Also, as an evidence, k8s components such as kube-controller-manager also enables pprof by default.
So I still think that we'd better enable pprof by default, and the only concern is the security issues mentioned above.
HI @zqzten,
It is great that you want to help users and Operator authors with this feature.
Security: The pprof endpoints can expose potentially sensitive runtime details, such as the number of goroutines, the memory usage, CPU usage, and more. This could provide a vector for malicious parties to gather information about the application and potentially exploit it. By enabling pprof by default, you could inadvertently expose these endpoints to the public, posing a serious security risk.
Performance: While pprof is an excellent tool for diagnosing performance issues, it is not without cost. Profiling can consume considerable CPU and memory resources, especially when continuously enabled in a production environment. This could lead to performance degradation and impact the scalability of your application.
These concerns are less pronounced in a development environment, and the benefits of having pprof enabled (e.g., easier debugging, performance tuning) may outweigh the potential drawbacks. However, in a production setting, the risks usually outweigh the benefits.
However, we cannot add it to the default scaffolds done by Kubebuilder because it is not recommended for production and would not be a good practice. If we provided it in the default scaffolds, authors would need to remove or disable the feature prior to adding it on production, which in reality would not happen and would be a big ask to do so.
On the contrary, pprof is a must have when we want to inspect performance problems such as memory leak in production, as https://github.com/kubernetes-sigs/controller-runtime/pull/1943#issuecomment-1176343059
In the controller-runtime library, which underpins Kubebuilder, pprof debugging endpoints are not enabled by default. This is an intentional design decision, made to prioritize security and performance for users, particularly those who are new to Kubebuilder. Experienced developers and operators who understand the implications and need to use pprof for performance tuning or debugging can choose to enable it explicitly. This is a stark contrast to including it in the default Kubebuilder scaffolds, which primarily cater to those who are just starting out and may not fully understand the implications of enabling pprof by default.
Also, as an evidence, k8s components such as kube-controller-manager also enables pprof by default.
You're correct. kube-controller-manager, which is a part of the Kubernetes control plane, does enable pprof endpoints by default. It's important to understand why and under what circumstances it might be considered acceptable.
As part of the Kubernetes control plane, the kube-controller-manager runs on the master nodes of a Kubernetes cluster. The master nodes are not typically exposed to the public internet, and they should be protected by various security measures. This means that, in a correctly configured cluster, the pprof endpoints of the kube-controller-manager should not be accessible to untrusted entities.
Enabling pprof by default on the kube-controller-manager allows Kubernetes developers and administrators to debug performance issues directly on running clusters. This can be very helpful for diagnosing problems in a live environment. However, it is still a potential security risk if an attacker were able to gain access to the master nodes, and I believe that in many environments, system admins will disable that if/when the env is not in a protected or disconnect network (airgap envs)
That being said, it's not that you can't enable pprof in a Kubebuilder scaffold; it's just not recommended for production use, hence not included by default. In controlled circumstances, such as a secure and isolated environment, with appropriate monitoring and precautions, you could consider enabling pprof for specific profiling tasks. But such use should be temporary and closely supervised.
Asking DevOps, who uses kubebuilder, to be aware that the default scaffolds expose their security this way is inappropriate and would be a big ask.
That could fit in an optional plugin. But, the code implementation to achieve this goal shows to be very small, and the effort to maintain the plugin would not justify what it could bring alone. However, if you have a proposed plugin solution that involves the pproff and other helpers' features that justify that it also might be a great fit Please, see the plugins section and check that we have optional plugins which can be used to add specific scaffolds to do the project.
It would be great if we have a doc with the whole guide on how to implement its usage, what is possible to achieve with it and how.
It might be acceptable if you have any proposed achievable idea that is not enabling it in the default scaffolds. Anyway, the code also should be shipped with documentation explaining how and when to use it.
I hope that makes sense. Also, please feel free to contribute within PRs and propose solutions that fit and are aligned with the above points.
Wow thanks for the detailed explanation! @camilamacedo86
Profiling can consume considerable CPU and memory resources, especially when continuously enabled in a production environment.
CPU profiling is not activated at initialization time, it is only activated for a period of time (30 seconds by default) when the /debug/pprof/profile
http service is called. So enabling pprof server alone should not impact much the performance of the application (except that extra goroutines for the http server are needed).
However, it is still a potential security risk if an attacker were able to gain access to the master nodes, and I believe that in many environments, system admins will disable that if/when the env is not in a protected or disconnect network (airgap envs).
The pprof endpoint of kube-controller-manager is not only protected by network restrictions, but also protected by mTLS with RBAC (like that in kube-apiserver). So it's rather safe to keep it on by default in production.
For other points, I understand and agree with your opinion, since additional security efforts shall be applied to pprof endpoint, it might not be suitable to make it enabled by default to everyone, especially for those who have no need or knowledge of pprof.
For now, I think a brief doc would be enough to let ppl know this featue and related concerns on this, the plugin way is a little overkill. Glad to work on this :)
Hi @zqzten,
I am so happy that we could convey that doc this one for now seems the best way to move forward. Please, feel free to work in this doc if you wish.
I am adding Help wanted on this one as we. Have its doc would be very helpful for sure.
The comments above were hidden for simplicity. While we've decided to document its usage rather than include it in the default scaffold, you can refer to the hidden comments for more context on this decision. Nevertheless, introducing it as an optional plugin for the scaffold could be a viable alternative. This option would certainly be a beneficial addition.
To address this issue we must:
/assign
@camilamacedo86 Won't our documentation vary based on if and how we incorporate the support for pprof in Kubebuilder?
It is supported since it is a feature in the controller-runtime. The idea here is just add a doc under references to let people know how to use and when to use as its implications. We cannot add it in the default scaffold because it is not recommended for prod envs, see;
https://github.com/kubernetes-sigs/kubebuilder/issues/3338#issuecomment-1541908179 (I re-open it just to share why we cannot add in the default scaffold)
So, in the doc we need to also share this info to allow people aware of.
What do you want to happen?
As pprof support been introduced to controller-runtime, we can follow up to make it available (and visible) to end users of kubebuilder. This issue tracks the tasks needed TBD, please comment if I miss or get wrong on anything.
main.go
of kubebuilder scaffoldFor the third task, I'm not sure what best practice we shall provide to end users so this is RFC.
For now, I can think up two possible ways:
127.0.0.1:xxxx
Please comment your preference or other opinions if applicable, thanks!
Extra Labels
/kind documentation