Closed onedr0p closed 1 year ago
That was necessary because we don't have a facility to pass CLI flags through to the embedded etcd. For Kubernetes components, you can already just do something like:
--kube-controller-manager-arg=address=10.0.1.2 --kube-controller-manager-arg=bind-address=10.0.1.2
@brandond I am probably misreading the code here, but it looks like it is hardcoded to 127.0.0.1.
Will setting the options you described override this?
Yes, if you look a few lines down you can see where the user-provided args are used to update the args map before flattening the map into the args slice. Since the user args come last, they are preferred over the defaults we provide.
Thanks @brandond 🙏🏼
kube-prometheus-stack values:

kubeApiServer:
  enabled: true
kubeControllerManager:
  enabled: true
  endpoints:
    - 192.168.42.10
    - 192.168.42.11
    - 192.168.42.12
kubeScheduler:
  enabled: true
  endpoints:
    - 192.168.42.10
    - 192.168.42.11
    - 192.168.42.12
kubeProxy:
  enabled: true
  endpoints:
    - 192.168.42.10
    - 192.168.42.11
    - 192.168.42.12
kubeEtcd:
  enabled: true
  endpoints:
    - 192.168.42.10
    - 192.168.42.11
    - 192.168.42.12
  service:
    enabled: true
    port: 2381
    targetPort: 2381

k3s config:

kube-controller-manager-arg:
  - "address=0.0.0.0"
  - "bind-address=0.0.0.0"
kube-proxy-arg:
  - "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
  - "address=0.0.0.0"
  - "bind-address=0.0.0.0"
etcd-expose-metrics: true
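A quick way to check whether the change took effect is to probe each metrics port on a node IP. This is only a sketch: the IP is the example node from the config above, and the ports are the pre-v1.22 plaintext ones, so adjust both for your cluster.

```shell
# Probe the component metrics ports on a node. IP and ports are assumptions
# taken from the example config above; on v1.22+ the scheduler and
# controller-manager moved to 10259/10257 and serve HTTPS instead.
NODE="${NODE:-192.168.42.10}"
checked=0
for port in 10249 10251 10252 2381; do
  # -f makes curl fail on HTTP errors; head grabs just the first metric line.
  out=$(curl -fsS --max-time 2 "http://$NODE:$port/metrics" 2>/dev/null | head -n 1)
  if [ -n "$out" ]; then
    echo "port $port: reachable ($out)"
  else
    echo "port $port: not reachable from here"
  fi
  checked=$((checked + 1))
done
```

Run it from a machine with network access to the node; if the ports still answer only on 127.0.0.1, every probe reports not reachable.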
I can also verify the Grafana dashboards are populated :D
For anyone stumbling upon the same issue (because it pops up on the first page of Google results): in case of v1.22 you will also need to add
serviceMonitor:
  enabled: true
  https: true
  insecureSkipVerify: true
to both kubeControllerManager and kubeScheduler, because it now forces HTTPS. Also, the ports have changed, so my config looks like:
kubeControllerManager:
  enabled: true
  endpoints:
    - 172.25.25.61
    - 172.25.25.62
    - 172.25.25.63
  service:
    enabled: true
    port: 10257
    targetPort: 10257
  serviceMonitor:
    enabled: true
    https: true
    insecureSkipVerify: true
kubeScheduler:
  enabled: true
  endpoints:
    - 172.25.25.61
    - 172.25.25.62
    - 172.25.25.63
  service:
    enabled: true
    port: 10259
    targetPort: 10259
  serviceMonitor:
    enabled: true
    https: true
    insecureSkipVerify: true
Additionally, "address=0.0.0.0" can be dropped because the address flag is deprecated now; see https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/ and https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/
Verified on kube-prometheus-stack 20.0.1 and k3s 1.22.3
@onedr0p: how exactly do I set the k3s controller settings on my master nodes? Not during installation, but in a running environment. With the k3s config.yaml?
From your and @rlex's comments I understand that the configuration needs to look like this, correct?
kube-controller-manager-arg:
  - "bind-address=0.0.0.0"
kube-proxy-arg:
  - "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
  - "bind-address=0.0.0.0"
etcd-expose-metrics: true
Yes, but on k3s 1.22 some defaults in kube-prometheus-stack need to be changed:
The changes in the kube-prometheus-stack seem clear to me. I am/was struggling with the k3s config. Is it enough to create the config.yaml with just the above configuration, or do I need to restart k3s on each master, or the node itself?
Depends on how you installed k3s; you may need to tell k3s to look for the config.yaml.
You don't need to tell k3s to look for config.yaml if you place it at /etc/rancher/k3s/config.yaml. A restart is required to pick up any changes, regardless of whether you use CLI flags, a config file, or both.
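Putting that together, creating the file might look like the sketch below. It writes to a local demo directory by default so it is safe to try anywhere; on a real server node you would use /etc/rancher/k3s (as root) and then restart k3s, e.g. `sudo systemctl restart k3s` on a systemd install.

```shell
# Sketch: generate a k3s config.yaml with the metrics-related settings from
# this thread. CONFIG_DIR defaulting to ./k3s-demo is a demo assumption; the
# path k3s actually reads is /etc/rancher/k3s/config.yaml.
CONFIG_DIR="${CONFIG_DIR:-./k3s-demo}"
mkdir -p "$CONFIG_DIR"
cat > "$CONFIG_DIR/config.yaml" <<'EOF'
etcd-expose-metrics: true
kube-controller-manager-arg:
  - bind-address=0.0.0.0
kube-proxy-arg:
  - metrics-bind-address=0.0.0.0
kube-scheduler-arg:
  - bind-address=0.0.0.0
EOF
echo "wrote $CONFIG_DIR/config.yaml"
```

After copying the file into place on each server node, restart k3s there so the embedded components pick up the new bind addresses.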
Worked like a charm. Thank you, guys!
@onedr0p does this solution hit the problem explained in the issue below?
Thank you
Even after setting up kubeEtcd with the server config and helm chart as described a few messages earlier, it seems to not work correctly.
@Jojoooo1 did you set up a single-node or a multi-node cluster? In a single-node setup there's no etcd.
Thanks! I actually had a single node!
Strictly speaking, k3s single-node can have etcd, but only if you added the cluster-init parameter to the k3s args / config / env.
Where can I edit this file for my k3s controller manager?

k3s controller settings:

kube-controller-manager-arg:
  - "address=0.0.0.0"
  - "bind-address=0.0.0.0"
kube-proxy-arg:
  - "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
  - "address=0.0.0.0"
  - "bind-address=0.0.0.0"
etcd-expose-metrics: true
https://rancher.com/docs/k3s/latest/en/installation/install-options/#configuration-file
etcd-expose-metrics: true
kube-controller-manager-arg:
  - bind-address=0.0.0.0
kube-proxy-arg:
  - metrics-bind-address=0.0.0.0
kube-scheduler-arg:
  - bind-address=0.0.0.0
Thanks! Anyway, I don't have that config file; I only have /etc/rancher/k3s/k3s.yaml and /etc/systemd/system/k3s.service and some files in /var/lib/rancher/k3s.
Is it possible to create it manually, or do I need to upgrade my k3s? NB: my current version is v1.21.7.
@rthamrin the config file is not installed by default; you need to create it manually.
I had the same issue with a K3s cluster and tried following the above solution, but after adding endpoints to the prometheus operator values file, helm would fail to deploy with the following error:
Error: UPGRADE FAILED: rendered manifests contain a resource that already exists. Unable to continue with update: Endpoints "prometheus-stack-kube-prom-kube-controller-manager" in namespace "kube-system" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "prometheus-stack"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "monitoring"
I found a working solution at this page: https://picluster.ricsanfre.com/docs/prometheus/#k3s-components-monitoring. Leaving it here in case anyone runs into the same issue with the helm chart.
@macrokernel this error means helm does not have ownership of those resources; try re-installing KPS or add the helm annotations to the existing resources it is complaining about.
@onedr0p I tried completely uninstalling KPS by removing the monitoring namespace where the helm chart was installed and installing it from scratch, but this did not help. I wish I knew which annotations it is missing and how to add them ;)
The error specifically says endpoints in the kube-system namespace.
@onedr0p thanks for guiding me through it :) Adding the following annotation, and similar ones for kubeScheduler and kubeProxy, did the trick:

kubeControllerManager:
  annotations:
    meta.helm.sh/release-name: monitoring

Helm chart update was successful after the modifications. Yet to check if Prometheus is getting the data.
UPDATE: No, it did not help. I executed the helm update with a wrong values file, which does not have the endpoints definitions; that's why it did not fail. I guess I have to define the annotations at a different location in the values file?
UPDATE 2: Just realised that the annotations I added are wrong; there should be 2 of them:
annotations:
  meta.helm.sh/release-name: prometheus-stack
  meta.helm.sh/release-namespace: monitoring
Still figuring out where they must be placed inside the values file.
@onedr0p, could you please give another hint? I tried adding the annotations all over the values file to no avail.
Don't add helm annotations in those values; delete the specific endpoint(s) in the kube-system namespace and redeploy the chart.
@onedr0p, I removed the operator and everything with prometheus in the kube-system namespace, then reinstalled the operator helm chart. Helm install went without errors, however, prometheus-stack-kube-prom-operator pod is in CrashLoopBackOff state due to the following error:
level=warn ts=2022-08-13T08:13:19.199185707Z caller=operator.go:329 component=prometheusoperator msg="failed to check if the API supports the endpointslice resources" err="converting (v1.APIGroup) to (v1.APIResourceList): unknown conversion"
level=info ts=2022-08-13T08:13:19.199261268Z caller=operator.go:331 component=prometheusoperator msg="Kubernetes API capabilities" endpointslices=false
listening failed: listen tcp :10250: bind: address already in use
The port is actually in use on the node:
ss -lntp |grep 10250
LISTEN 0 4096 *:10250 *:* users:(("k3s-server",pid=17540,fd=300))
I've also tried the removal procedure from https://github.com/prometheus-operator/prometheus-operator#removal to make sure that there is nothing left over from the previous attempts, but it did not help.
UPDATE: Sorted out by removing Prometheus with these commands:
kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd probes.monitoring.coreos.com
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
kubectl delete MutatingWebhookConfiguration prometheus-stack-kube-prom-admission
kubectl delete ValidatingWebhookConfiguration prometheus-stack-kube-prom-admission
kubectl delete namespace monitoring
for n in $(kubectl get namespaces -o jsonpath={..metadata.name}); do \
kubectl delete --ignore-not-found --namespace=$n service -l app.kubernetes.io/part-of=kube-prometheus-stack; \
done
And reinstalling it with the following command:
helm -n monitoring upgrade --install prometheus-stack -f values.prometheus.production.yaml \
--set prometheusOperator.admissionWebhooks.enabled=false \
--set prometheusOperator.admissionWebhooks.patch.enabled=false \
--set prometheusOperator.tlsProxy.enabled=false \
--set prometheusOperator.tls.enabled=false \
./kube-prometheus-stack
Seems like the issue was with the webhooks - I had to disable them.
I spent a couple of days figuring out how to make the default kube-prometheus-stack metrics work with k3s and found a couple of important things that are not mentioned here.
Firstly, k3s exposes all metrics combined (apiserver, kubelet, kube-proxy, kube-scheduler, kube-controller) on each metrics endpoint. The only separate metrics source is the embedded etcd database on port 2381, if you are using it. So if you follow the advice given in this issue and set up scrape jobs for each component separately, you are collecting all metrics duplicated 5 times and wasting prometheus resources.
Now, fixing this properly is a bit difficult. All default grafana dashboards filter data by job name, i.e. the kube-proxy dashboard has job="kube-proxy" in all queries. So my first attempt was to rename jobs based on the metric name. I added this config to the helm values:
kubelet:
  serviceMonitor:
    metricRelabelings:
      # k3s exposes all metrics on all endpoints, relabel jobs that belong to other components
      - sourceLabels: [__name__]
        regex: "scheduler_(.+)"
        targetLabel: "job"
        replacement: "kube-scheduler"
      - sourceLabels: [__name__]
        regex: "kubeproxy_(.+)"
        targetLabel: "job"
        replacement: "kube-proxy"
This simply sets the job label to kube-scheduler for all metrics that start with scheduler_, and to kube-proxy for all metrics that start with kubeproxy_.
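Illustration only (this is shell mimicking the rules above, not Prometheus itself): the two relabelings amount to routing metric-name prefixes to job labels, with kubelet standing in for the scrape job's original label.

```shell
# Mimic the two metricRelabelings: names matching scheduler_(.+) get
# job=kube-scheduler, names matching kubeproxy_(.+) get job=kube-proxy,
# and everything else keeps the original job label (kubelet here).
relabel_job() {
  case "$1" in
    scheduler_*) echo kube-scheduler ;;
    kubeproxy_*) echo kube-proxy ;;
    *)           echo kubelet ;;
  esac
}

relabel_job scheduler_pending_pods                          # prints kube-scheduler
relabel_job kubeproxy_sync_proxy_rules_duration_seconds_sum # prints kube-proxy
relabel_job kubelet_running_pods                            # prints kubelet
```

Prometheus applies the real rules per sample at scrape time; the sketch only shows how the name prefixes map onto job labels.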
But there is another problem. The instance variable in the grafana dashboards uses the up metric to find all instances of the component: label_values(up{job="kube-scheduler", cluster="$cluster"}, instance). You can't rename the job of the up metric, or other dashboards (in this case kubelet) will stop working. I couldn't find a way to multiply metrics with prometheus rules to create an up metric for each job. So the only way is to edit the grafana dashboards and change the variable queries to label_values(up{job="kubelet", cluster="$cluster"}, instance).
There are also other metrics which are shared between components, such as rest_client_requests_total; these are global to the entire k3s-server process (all components) and do not make sense in single-component dashboards.
Also keep in mind that the default kube-prometheus-stack configuration already duplicates data 2 times by collecting both kubeApiServer and kubelet metrics, which are the same. It is best to disable only kubeApiServer, which collects data only from master nodes, while kubelet collects from both master and agent nodes. Disabling kubeApiServer automatically removes the apiserver alerts and grafana dashboard, so you have to re-import these manually.
It is really unfortunate that k3s makes it so complicated to use kube-prometheus-stack. If someone has a better solution to make this work without duplicating data, please share.
@chemicstry Thanks for the details. I suppose we could re-open this issue, but I am not sure it is something the k3s maintainers are willing to "fix". Ideally this should all work out of the box with the kube-prometheus-stack helm chart.
@brandond any comment on this?
There isn't really anything we can fix on our side. The prometheus go libraries use a global metrics configuration, so any metrics registered by any component in a process are exposed by all metrics listeners. There's no way to bind specific metrics to a specific metrics endpoint, when they're all running in the same process. A core efficiency of K3s is that we run all the components in the same process, and we're not planning on changing that.
Has anyone here found a good solution?
What do you think about this? https://github.com/portefaix/portefaix-kubernetes/issues/4682 https://github.com/portefaix/portefaix-kubernetes/commit/dc767bdd3f8e6d0ffe3fe53e36e116ea7f5e6533#diff-725c569b96f4a66ed07e1a4d1a5d8d24b3a500f1a1dae5b80444a2109ce94c17
If others come here by googling: I have found what seems to be a good solution. It also addresses two other issues with dashboards.
prometheus:
  serviceMonitor:
    # fix for https://github.com/prometheus-community/helm-charts/issues/4221
    relabelings:
      - action: replace
        targetLabel: cluster
        replacement: yourClusterNameHere
# fix for https://github.com/prometheus-community/helm-charts/issues/3800
grafana:
  serviceMonitor:
    labels:
      release: kube-prometheus-stack
kubeApiServer:
  serviceMonitor:
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]
kubelet:
  serviceMonitor:
    cAdvisorRelabelings:
      - action: replace
        sourceLabels: [__metrics_path__]
        targetLabel: metrics_path
      - action: replace
        targetLabel: instance
        sourceLabels:
          - "node"
    relabelings:
      - action: replace
        sourceLabels: [__metrics_path__]
        targetLabel: metrics_path
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]
kubeControllerManager:
  # Add all Control Plane IPs
  endpoints:
    - 10.255.0.101
    - 10.255.0.102
    - 10.255.0.103
  service:
    enabled: true
    port: 10257
    targetPort: 10257
  serviceMonitor:
    https: true
    insecureSkipVerify: true
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]
kubeEtcd:
  # Add all Control Plane IPs
  endpoints:
    - 10.255.0.101
    - 10.255.0.102
    - 10.255.0.103
  service:
    enabled: true
    port: 2381
    targetPort: 2381
  serviceMonitor:
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]
kubeScheduler:
  # Add all Control Plane IPs
  endpoints:
    - 10.255.0.101
    - 10.255.0.102
    - 10.255.0.103
  service:
    enabled: true
    port: 10259
    targetPort: 10259
  serviceMonitor:
    https: true
    insecureSkipVerify: true
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]
kubeProxy:
  # Add all Control Plane IPs
  endpoints:
    - 10.255.0.101
    - 10.255.0.102
    - 10.255.0.103
  service:
    enabled: true
    port: 10249
    targetPort: 10249
    selector:
      k8s-app: kube-proxy
  serviceMonitor:
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]
coreDns:
  serviceMonitor:
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]
kubeDns:
  serviceMonitor:
    metricRelabelings:
      - action: drop
        regex: (apiserver_request_duration_seconds_bucket|apiserver_request_body_size_bytes_bucket|apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket|apiserver_request_sli_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: "etcd_request_duration_seconds_bucket"
        sourceLabels: [__name__]
      - action: drop
        regex: (scheduler_plugin_execution_duration_seconds_bucket)
        sourceLabels: [__name__]
      - action: drop
        regex: (workqueue_work_duration_seconds_bucket)
        sourceLabels: [__name__]
kube-state-metrics:
  prometheus:
    monitor:
      relabelings:
        - action: replace
          targetLabel: "instance"
          sourceLabels:
            - "__meta_kubernetes_pod_node_name"
prometheus-node-exporter:
  prometheus:
    monitor:
      relabelings:
        - action: replace
          targetLabel: "instance"
          sourceLabels:
            - "__meta_kubernetes_pod_node_name"
@mrclrchtr if I understand this correctly, you are still going to have duplicate metrics, which can lead to absolutely insane memory usage with Prometheus. I recently switched my cluster from k3s to Talos and saw 2-3GB less memory usage per Prometheus instance, since Talos exports these metrics the "standard" way.
The best method I found was to analyze what needs to be kept for each component and write relabelings based upon that research, for example https://github.com/onedr0p/home-ops/blob/e6716b476ff1432ddbbb7d4efa0e10d0ac4e9a66/kubernetes/storage/apps/observability/kube-prometheus-stack/app/helmrelease.yaml. However, this isn't perfect either: it will not dedupe all metric labels across the components, it's prone to error, and it won't capture any new metrics emitted as Kubernetes updates happen. FWIW, even with these relabelings applied I was still seeing 3-4GB RAM usage per prometheus instance.
I would love for k3s to support a native way to handle this with kube-prometheus-stack, as it's my major pain point with k3s (not obvious until one discovers this issue) and one of the major reasons I am exploring other options like Talos. 😢
Damn... I was hoping this would be the solution. I'll probably have to look for alternatives too; I've invested way too much time in this already.
But thanks for the info! I'll have a look at Talos too.
Is your feature request related to a problem? Please describe.
Unable to monitor the following components using kube-prometheus-stack:

Describe the solution you'd like
Add configuration options like in PR https://github.com/k3s-io/k3s/pull/2750 for each component so they are not bound only to 127.0.0.1. E.g.
In the kube-prometheus-stack configuration all you have to do is configure the following:

Describe alternatives you've considered
Deploying rancher-pushprox to get these metrics exposed, but it's not very easy to do or user-friendly.

Additional context
I am willing to give a shot at opening a PR as it should be pretty close to https://github.com/k3s-io/k3s/pull/2750
Related to https://github.com/k3s-io/k3s/issues/425