olxbr / metrics-server-exporter

Metrics Server Exporter
MIT License

Pod time-series missing on /metrics page #33

Closed brodul closed 4 years ago

brodul commented 4 years ago

:wave: Thanks for the awesome project. I have an issue similar to #26.

I have deployed the project with:

$ kubectl apply -f deploy/

When I check /metrics by port-forwarding the exporter service, I get:

# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 27615232.0
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 22327296.0
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1582117194.62
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 31.54
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 7.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="6",patchlevel="9",version="3.6.9"} 1.0
# HELP kube_metrics_server_response_time Metrics Server API Response Time
# TYPE kube_metrics_server_response_time gauge
kube_metrics_server_response_time{api_url="https://kubernetes.default.svc/metrics.k8s.io"} 0.19948
# HELP kube_metrics_server_nodes_mem Metrics Server Nodes Memory
# TYPE kube_metrics_server_nodes_mem gauge
# HELP kube_metrics_server_nodes_cpu Metrics Server Nodes CPU
# TYPE kube_metrics_server_nodes_cpu gauge
# HELP kube_metrics_server_pods_mem Metrics Server Pods Memory
# TYPE kube_metrics_server_pods_mem gauge
# HELP kube_metrics_server_pods_cpu Metrics Server Pods CPU
# TYPE kube_metrics_server_pods_cpu gauge

I have some pods:

$ kubectl get pods
NAME                                             READY   STATUS    RESTARTS   AGE
metrics-server-exporter-6d97fb5cf7-fnqbb         1/1     Running   0          65m
prometheus-alertmanager-64497676f8-ttfpf         2/2     Running   0          72m
prometheus-kube-state-metrics-5d49966699-jpn9w   1/1     Running   0          72m
prometheus-node-exporter-8dkmm                   1/1     Running   0          72m
prometheus-pushgateway-5bb46ff89f-zjgdb          1/1     Running   0          72m
prometheus-server-7fbcdd878-whvrj                2/2     Running   0          72m
wordpress-8bcc6cf8c-rss26                        1/1     Running   0          78m
wordpress-mariadb-0                              1/1     Running   0          78m
$ kubectl top pods
NAME                                             CPU(cores)   MEMORY(bytes)   
metrics-server-exporter-6d97fb5cf7-fnqbb         6m           15Mi            
prometheus-alertmanager-64497676f8-ttfpf         1m           9Mi             
prometheus-kube-state-metrics-5d49966699-jpn9w   0m           6Mi             
prometheus-node-exporter-8dkmm                   0m           7Mi             
prometheus-pushgateway-5bb46ff89f-zjgdb          0m           4Mi             
prometheus-server-7fbcdd878-whvrj                6m           115Mi           
wordpress-8bcc6cf8c-rss26                        9m           169Mi           
wordpress-mariadb-0                              5m           76Mi            
$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods"
{"kind":"PodMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/metrics.k8s.io/v1beta1/pods"},"items":[{"metadata":{"name":"prometheus-pushgateway-5bb46ff89f-zjgdb","namespace":"default","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/prometheus-pushgateway-5bb46ff89f-zjgdb","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"prometheus-pushgateway","usage":{"cpu":"0","memory":"4548Ki"}}]},{"metadata":{"name":"kube-addon-manager-minikube","namespace":"kube-system","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube-addon-manager-minikube","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"kube-addon-manager","usage":{"cpu":"13m","memory":"3660Ki"}}]},{"metadata":{"name":"wordpress-mariadb-0","namespace":"default","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/wordpress-mariadb-0","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"mariadb","usage":{"cpu":"5m","memory":"78392Ki"}}]},{"metadata":{"name":"prometheus-node-exporter-8dkmm","namespace":"default","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/prometheus-node-exporter-8dkmm","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"prometheus-node-exporter","usage":{"cpu":"1m","memory":"8416Ki"}}]},{"metadata":{"name":"wordpress-8bcc6cf8c-rss26","namespace":"default","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/wordpress-8bcc6cf8c-rss26","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"wordpress","usage":{"cpu":"14m","memory":"173780Ki"}}]},{"metadata":{"name":"prometheus-kube-state-metrics-5d49966699-jpn9w","namespace":"default","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/prometheus-kube-state-metrics-5d49966699-jpn9w","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"prometheus-kube-state-metrics","usage":{"cpu":"1m","memory":"6900Ki"}}]},{"metadata":{"name":"prometheus-alertmanager-64497676f8-ttfpf","namespace":"default","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/prometheus-alertmanager-64497676f8-ttfpf","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"prometheus-alertmanager","usage":{"cpu":"2m","memory":"8384Ki"}},{"name":"prometheus-alertmanager-configmap-reload","usage":{"cpu":"0","memory":"1412Ki"}}]},{"metadata":{"name":"storage-provisioner","namespace":"kube-system","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/storage-provisioner","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"storage-provisioner","usage":{"cpu":"0","memory":"15712Ki"}}]},{"metadata":{"name":"kube-scheduler-minikube","namespace":"kube-system","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube-scheduler-minikube","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"kube-scheduler","usage":{"cpu":"3m","memory":"10512Ki"}}]},{"metadata":{"name":"kube-proxy-qfzzv","namespace":"kube-system","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube-proxy-qfzzv","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"kube-proxy","usage":{"cpu":"3m","memory":"9512Ki"}}]},{"metadata":{"name":"kube-controller-manager-minikube","namespace":"kube-system","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube-controller-manager-minikube","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"kube-controller-manager","usage":{"cpu":"28m","memory":"36260Ki"}}]},{"metadata":{"name":"etcd-minikube","namespace":"kube-system","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/etcd-minikube","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"etcd","usage":{"cpu":"39m","memory":"42812Ki"}}]},{"metadata":{"name":"kube-apiserver-minikube","namespace":"kube-system","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/kube-apiserver-minikube","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"kube-apiserver","usage":{"cpu":"74m","memory":"188612Ki"}}]},{"metadata":{"name":"prometheus-server-7fbcdd878-whvrj","namespace":"default","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/prometheus-server-7fbcdd878-whvrj","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"prometheus-server-configmap-reload","usage":{"cpu":"0","memory":"1480Ki"}},{"name":"prometheus-server","usage":{"cpu":"12m","memory":"117292Ki"}}]},{"metadata":{"name":"metrics-server-exporter-6d97fb5cf7-fnqbb","namespace":"default","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/metrics-server-exporter-6d97fb5cf7-fnqbb","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"metrics-server-exporter","usage":{"cpu":"9m","memory":"16364Ki"}}]},{"metadata":{"name":"coredns-5c98db65d4-4n5m2","namespace":"kube-system","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/coredns-5c98db65d4-4n5m2","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"coredns","usage":{"cpu":"7m","memory":"9756Ki"}}]},{"metadata":{"name":"coredns-5c98db65d4-9txf7","namespace":"kube-system","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/coredns-5c98db65d4-9txf7","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"coredns","usage":{"cpu":"6m","memory":"9720Ki"}}]},{"metadata":{"name":"metrics-server-84bb785897-rp54x","namespace":"kube-system","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/metrics-server-84bb785897-rp54x","creationTimestamp":"2020-02-19T14:05:39Z"},"timestamp":"2020-02-19T14:05:00Z","window":"1m0s","containers":[{"name":"metrics-server","usage":{"cpu":"1m","memory":"10648Ki"}}]}]}
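For anyone who wants to inspect that payload outside of kubectl, here is a minimal Python sketch that sums per-pod usage from a PodMetricsList document. It is not the exporter's code: the helper names (`memory_to_bytes`, `cpu_to_millicores`, `pod_usage`) are made up for illustration, and the quantity parsing only covers the suffixes that appear in this thread plus a few close relatives, not the full Kubernetes quantity grammar.

```python
import json

# Binary suffixes seen in metrics-server memory values (assumption: no decimal suffixes).
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def memory_to_bytes(quantity):
    """Convert a memory quantity such as '8384Ki' to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as '13m', '250000n' or '0' to millicores."""
    if quantity.endswith("n"):  # nanocores
        return int(quantity[:-1]) / 1_000_000
    if quantity.endswith("m"):  # millicores
        return float(quantity[:-1])
    return float(quantity) * 1000  # whole cores


def pod_usage(pod_metrics_list):
    """Map (namespace, pod) to (millicores, bytes), summed over the pod's containers."""
    usage = {}
    for item in pod_metrics_list["items"]:
        key = (item["metadata"]["namespace"], item["metadata"]["name"])
        cpu = sum(cpu_to_millicores(c["usage"]["cpu"]) for c in item["containers"])
        mem = sum(memory_to_bytes(c["usage"]["memory"]) for c in item["containers"])
        usage[key] = (cpu, mem)
    return usage


# One item excerpted from the response above, with unrelated fields dropped.
sample = json.loads("""
{"kind": "PodMetricsList", "items": [
  {"metadata": {"name": "prometheus-alertmanager-64497676f8-ttfpf", "namespace": "default"},
   "containers": [
     {"name": "prometheus-alertmanager", "usage": {"cpu": "2m", "memory": "8384Ki"}},
     {"name": "prometheus-alertmanager-configmap-reload", "usage": {"cpu": "0", "memory": "1412Ki"}}
   ]}
]}
""")

usage = pod_usage(sample)
for (namespace, pod), (millicores, mem_bytes) in usage.items():
    print(namespace, pod, millicores, mem_bytes)
```

The point is only that the raw API does return per-container usage for every pod here, so the empty kube_metrics_server_pods_* series are not caused by missing data on the metrics-server side.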

The service account, role, and role binding are present as described in deploy/.

There are no logs. Any ideas?

brodul commented 4 years ago

I did some digging into the issue and added some logging to app.py:

WARNING:root:{'nodes': {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'nodes.metrics.k8s.io is forbidden: User "system:serviceaccount:default:metrics-server-exporter" cannot list resource "nodes" in API group "metrics.k8s.io" at the cluster scope', 'reason': 'Forbidden', 'details': {'group': 'metrics.k8s.io', 'kind': 'nodes'}, 'code': 403}, 'pods': {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'pods.metrics.k8s.io is forbidden: User "system:serviceaccount:default:metrics-server-exporter" cannot list resource "pods" in API group "metrics.k8s.io" at the cluster scope', 'reason': 'Forbidden', 'details': {'group': 'metrics.k8s.io', 'kind': 'pods'}, 'code': 403}}
WARNING:root:{'nodes': {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'nodes.metrics.k8s.io is forbidden: User "system:serviceaccount:default:metrics-server-exporter" cannot list resource "nodes" in API group "metrics.k8s.io" at the cluster scope', 'reason': 'Forbidden', 'details': {'group': 'metrics.k8s.io', 'kind': 'nodes'}, 'code': 403}, 'pods': {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'pods.metrics.k8s.io is forbidden: User "system:serviceaccount:default:metrics-server-exporter" cannot list resource "pods" in API group "metrics.k8s.io" at the cluster scope', 'reason': 'Forbidden', 'details': {'group': 'metrics.k8s.io', 'kind': 'pods'}, 'code': 403}}

The problem is that the permissions in the deploy/ manifests and in the Helm chart are not set correctly. The Helm chart works if you deploy it in kube-system, but not in any other namespace.
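The reason there were no logs is that a 403 from the API still comes back as a JSON body (a `Status` object), so nothing fails loudly and the gauges simply end up with no samples. A rough sketch of the kind of check that would surface this, assuming the responses are collected in a dict keyed by resource as in the log output above; `find_api_failures` is a hypothetical helper, not something that exists in app.py:

```python
import logging


def find_api_failures(responses):
    """Return (resource, code, message) for each response that is a Kubernetes Status failure."""
    failures = []
    for resource, body in responses.items():
        is_failure = (
            isinstance(body, dict)
            and body.get("kind") == "Status"
            and body.get("status") == "Failure"
        )
        if is_failure:
            failures.append((resource, body.get("code"), body.get("message")))
    return failures


# Trimmed, hypothetical input modelled on the log output above:
# one forbidden response and one successful (empty) list.
responses = {
    "nodes": {
        "kind": "Status",
        "status": "Failure",
        "reason": "Forbidden",
        "code": 403,
        "message": "nodes.metrics.k8s.io is forbidden",
    },
    "pods": {"kind": "PodMetricsList", "items": []},
}

for resource, code, message in find_api_failures(responses):
    logging.warning("metrics API call for %s failed (%s): %s", resource, code, message)
```

With a check like this, a misconfigured role binding would show up as a warning per resource instead of as silently empty metrics.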

brodul commented 4 years ago

Oh, I see. The Helm chart handles the namespace parameter poorly. It would make more sense to use the built-in Release.Namespace object (https://helm.sh/docs/chart_template_guide/builtin_objects/). Would you accept a PR that reworks the values and the chart slightly to do things in a more Helm-like way?

deadc commented 4 years ago

Sorry for the delay in replying; we had holidays around here. And of course, any contribution is welcome! :smiley:

brodul commented 4 years ago

I hope you had some rest. :rocket: I opened another PR which makes the project useful for my case.