Open jaronoff97 opened 1 year ago
I've seen O(tens) of requests for this on the OpenTelemetry slack channels. Having it in the community would be great, as we could promote its adoption more widely.
I am certainly interested in this if users are interested in this. A couple questions:
Thanks for your questions :)
I really like this idea, but I have a question - is there a plan to move away from kube-state-metrics, node-exporter etc in favour of otel collector native receivers (k8sclusterreceiver and hostmetrics) ?
I think in general we should strive to collect all the prometheus metrics from k8s components, but not use any of the Prometheus ecosystem components and use Collector's native features :)
@jaronoff97 I'm also curious if your chart handles the installation of the operator and the OpentelemetryCollector object like discussed here: https://github.com/open-telemetry/opentelemetry-helm-charts/issues/69
I have been using this chart for 3 weeks. It works out of the box, but it will need to be improved (of course). It brings almost the same functionality as "Prometheus Operator with kube-prometheus-stack chart". It is much more lightweight, as you only deploy "agents" to scrape your logs/metrics/traces. I am using it to send metrics to AWS AMP (managed Prometheus).
Here are the main issues I encountered so far:
Thanks for the good work.
updates/context setting: @TylerHelmuth I still want to donate this if that's still okay. I've validated with a few other people that this would be a great thing for the community to have. The only blocker for this work is to figure out if we can install the operator in the same chart which would make for a better experience. My team is going to be investigating this.
@jaronoff97 sounds good. @open-telemetry/helm-approvers please add your thoughts.
I approve. Thanks @jaronoff97
I don't think I agree that we need another chart for this. I'd rather go with adding the TA option to the collector chart.
Also, why do we promote using Prometheus for scraping kubernetes/kubelet metrics instead of using specialized collector receivers that collect metrics compliant with OTel semantic conventions without additional transformations?
I think this would provide a bridge for existing kps users that otherwise would not care to switch (afaik Prometheus is still used in ~ 99.x% of Kubernetes deployments for cluster monitoring). Reusing the existing Prometheus-Operator objects would smooth out that migration.
I also see value in a "transition" chart. Long term (like long long term), I think a need for a chart like this diminishes, but for users today who have extensive Prometheus setups but want to try out OTel or start transitioning to OTel I think this chart fits their needs.
Ok, I'm not blocking it. If most @open-telemetry/helm-approvers think it's a good addition, let's add it
The name should somehow reflect the Prometheus bridge/transition. kube-otel-stack doesn't seem right to me.
Could also be cool to include somewhere how to grab the same telemetry using the collector and its components.
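For illustration, here is a minimal sketch of what collecting similar telemetry with native collector receivers might look like. It is not taken from the chart under discussion. It assumes the contrib distribution of the collector, a service account with the RBAC these receivers need, and a K8S_NODE_NAME environment variable injected via the downward API. The intervals and the debug exporter are placeholders.

```yaml
# Sketch only: native receivers covering roughly what kube-state-metrics,
# the kubelet endpoints and node-exporter provide in a Prometheus setup.
receivers:
  k8s_cluster:                 # cluster-level state; normally run as a single replica
    auth_type: serviceAccount
    collection_interval: 30s
  kubeletstats:                # node/pod/container stats from the kubelet; run per node
    auth_type: serviceAccount
    collection_interval: 30s
    endpoint: ${env:K8S_NODE_NAME}:10250   # assumes K8S_NODE_NAME is set on the pod
    insecure_skip_verify: true             # placeholder; verify the kubelet certificate in production
  hostmetrics:                 # host-level metrics; run per node
    collection_interval: 30s
    scrapers:
      cpu:
      load:
      memory:
      filesystem:
      network:

exporters:
  debug: {}                    # placeholder exporter that prints telemetry to the collector log

service:
  pipelines:
    metrics:
      receivers: [k8s_cluster, kubeletstats, hostmetrics]
      exporters: [debug]
```

In practice the cluster-level receiver and the per-node receivers would usually be split across a Deployment and a DaemonSet. They are shown together here only to keep the sketch short.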
I'm not sure how this transitioning chart would work? Should we assume that user installed kube-prometheus-stack and we try to somehow migrate it from that to this chart?
I was thinking of having kube-otel-stack, which initially works like kube-prometheus-stack and collects metrics using Prometheus, but slowly we could refactor it to use native OpenTelemetry Collector receivers and functionality.
I'm not sure how this transitioning chart would work? Should we assume that user installed kube-prometheus-stack and we try to somehow migrate it from that to this chart?
We should probably assume that the majority of admins scrape their k8s api endpoints with Prometheus via prometheus-operator objects like Service/PodMonitor that we can reuse with this stack.
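As a reminder of what such an object looks like, here is a minimal ServiceMonitor sketch. The name, namespace, labels and port name are hypothetical; only the resource kind and field names come from the prometheus-operator API.

```yaml
# Sketch only: a ServiceMonitor of the kind existing Prometheus users already have.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app            # hypothetical name
  namespace: default           # hypothetical namespace
  labels:
    app: example-app
spec:
  selector:
    matchLabels:
      app: example-app         # selects Services carrying this label
  endpoints:
    - port: metrics            # named port on the selected Service
      path: /metrics
      interval: 30s
```

The idea in this thread is that objects like this stay unchanged while the component that discovers and scrapes them is swapped out.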
As such a user, initially I would have both Prometheus and otel collector scraping this data and comparing the results/setup complexity before making any decision.
I would also see this as a 'transition' chart, but the migration path to me is something like...
kube-prometheus-stack -> kube-otel-stack -> opentelemetry-operator
In the (admittedly, kinda far?) future, I can see the operator using native OpenTelemetry components and monitoring CRDs to perform the same basic functions as this stack, but in the short-to-medium term, having this in the org will give us a pat answer for "how should I monitor k8s with OpenTelemetry?"
Hi, quick bump on this issue - one pretty common piece of feedback we got at KubeCon EU was the number of people who didn't know the operator existed. I believe getting this chart brought in would help a lot with that, as we could then signpost this from the docs as a "how to get started with Kubernetes".
@dmitryax is there anything else we're waiting on before accepting PRs adding this chart?
@TylerHelmuth I think this issue is still a blocker. I'm going to run some tests right now to track this down and solve it.
Okay, after a little mish-moshing of things... I was able to get a single chart that installs cert-manager (a requirement of the operator), the operator, and a collector together. The problem is that it doesn't all install at once, for a few reasons.
Given most clusters will already have cert-manager installed, here's what the installation process would look like...
Setting the failurePolicy on the MutatingWebhookConfiguration object to Ignore could also solve this on first install:
opentelemetry-operator:
  admissionWebhooks:
    failurePolicy: 'Ignore'
The operator and collector installed together successfully! An end user using this chart could just as easily enable the mutating webhook post-install as well, but that's not an ideal experience IMO.
I would love to hear thoughts on this, and see if there's anything I missed in my findings here. cc @open-telemetry/helm-maintainers
For the cert manager my preference would be to copy whatever pattern kube-prometheus-stack is using. If we can't install the cert manager as part of the chart install that will at least follow our existing pattern for the operator, although there is an issue opened about that friction: https://github.com/open-telemetry/opentelemetry-helm-charts/issues/550
Setting the failurePolicy on the MutatingWebhookConfiguration object to Ignore
When I investigated this a while ago this is the solution I stumbled upon and I believe it is the solution that kube-prometheus-stack uses.
Looking at what kube-prometheus-stack does right now:
It looks like it's configurable (obv). Its default behavior is empty and enabled, which means the policy is going to be set to Ignore, so I think that seems reasonable for us to do.
They also recommend pre-installing cert-manager on a cluster to use these webhooks.
Seeing as the chart is trying to follow the same pattern for value I think it makes sense to follow the same technical patterns as well.
Agreed. I can work on it this week and next week to match those expectations. I'll include some docs about these decisions as well.
I believe it is the solution that kube-prometheus-stack uses.
Yes, Indeed
Is this something someone is still working on? Given how complex the whole ecosystem was for me to grasp when starting out, what would make the most sense from my perspective is to have some way to add presets to the OpenTelemetry Operator.
IMO if someone wants to plug in Otel to their cluster most likely they'll want to have the ability to get:
It would be ideal if the default setup of the operator easily allowed you to get a setup like the one Honeycomb suggests in their getting started
@ferrucc-io yes I'm still working on this, I've had a whole slew of other priorities that keep taking precedence.
Many Prometheus and Kubernetes users are familiar with the kube-prometheus-stack chart, which aims to quickly set up and manage a Prometheus and Grafana installation that collects nearly all of the Kubernetes metrics available. It achieves this using the Prometheus operator and ServiceMonitor and PodMonitor custom resources that configure a user's Prometheus scrape config. We have the ability to do the same using the OpenTelemetry Operator and the Target Allocator. In order to provide an easy and familiar migration path to existing (or new) Prometheus and Kubernetes users, I created the kube-otel-stack chart, which installs a pre-configured collector and target allocator to dynamically use ServiceMonitor and PodMonitor custom resources to scrape various Kubernetes metrics. You can see below some of the metrics this collector is scraping.
This has since become a requested feature across the OTel Slack from what I can tell, as I've DM'ed this chart to at least 3 different people at this point. I was wondering if it would be welcome for me to clean up this slightly opinionated helm chart, make it more generic, and donate it to the repository.
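To make the mechanism concrete, here is a minimal sketch of an OpenTelemetryCollector resource with the Target Allocator watching Prometheus custom resources. It is not the chart's actual template. It assumes the v1alpha1 operator API, a service account with RBAC to read ServiceMonitor and PodMonitor objects, and the Prometheus operator CRDs already installed. The resource name and the debug exporter are hypothetical placeholders.

```yaml
# Sketch only: collector managed by the OpenTelemetry Operator, with a Target Allocator
# that discovers ServiceMonitor/PodMonitor objects and hands out scrape targets.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: example-stack          # hypothetical name
spec:
  mode: statefulset            # the target allocator distributes targets across these replicas
  targetAllocator:
    enabled: true
    prometheusCR:
      enabled: true            # watch ServiceMonitor and PodMonitor custom resources
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
            # Static self-scrape job; discovered targets come from the target allocator.
            - job_name: otel-collector
              scrape_interval: 30s
              static_configs:
                - targets: ['0.0.0.0:8888']
    exporters:
      debug: {}                # placeholder; replace with a real backend exporter
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [debug]
```

With a setup along these lines, existing ServiceMonitor and PodMonitor objects can keep driving scrape configuration while the collector takes over the scraping.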
Other options considered
TODO