kubernetes-retired / heapster

[EOL] Compute Resource Usage Analysis and Monitoring of Container Clusters

heapster not gathering Namespace and pods metrics #1120

Closed: prune998 closed this issue 6 years ago

prune998 commented 8 years ago

I have Heapster installed following the https://docs.openshift.org/latest/install_config/cluster_metrics.html instructions. While this works fine on my test platform, here Heapster does not gather metrics for namespaces or pods: the nodes endpoint returns data, but the namespaces endpoint comes back empty.

curl -ks -H "Authorization: Bearer xxxx" 'https://oc-master-eu-spare:8443/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/nodes/'

[
  "redbud.xx",
  "purpleheart.xx",
  ...
 ]

curl -ks -H "Authorization: Bearer xxxx" 'https://oc-master-eu-spare:8443/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/namespaces/'
[]
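For reference, the Heapster model API also exposes pods per namespace at /api/v1/model/namespaces/<namespace>/pods/, so the gap can be narrowed to a single namespace (a sketch; "default" is a placeholder namespace name, and the token is the same one used above):

curl -ks -H "Authorization: Bearer xxxx" 'https://oc-master-eu-spare:8443/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/namespaces/default/pods/'

On a healthy install this returns the list of pod names in that namespace.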

I turned verbose logging on, and I can see Heapster making many GET and POST requests. I tested them all myself and they all work:

GET https://oc-master-eu-spare:8443/api/v1/namespaces?resourceVersion=0
GET https://oc-master-eu-spare:8443/api/v1/pods?resourceVersion=0
GET https://oc-master-eu-spare:8443/api/v1/watch/namespaces?timeoutSeconds=5
POST https://10.235.15.164:10250/stats/container/
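For anyone who wants to repeat that test, the kubelet stats request can be replayed by hand (a sketch, assuming the node IP and port from the list above and the same bearer token; the JSON body values are hypothetical):

curl -ks -X POST 'https://10.235.15.164:10250/stats/container/' \
  -H "Authorization: Bearer xxxx" \
  -H "Content-Type: application/json" \
  -d '{"num_stats": 1, "subcontainers": true}'

A non-empty JSON response here indicates cAdvisor on the node is still producing container stats.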

Since everything works, I really don't think the problem is with the SSL certs or authorization. Here are the logs at startup (without verbose logging):

I0408 11:47:17.723646   30341 heapster.go:60] heapster --source=kubernetes:https://oc-master-eu-spare:8443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250 --tls_cert=/secrets/heapster.cert --tls_key=/secrets/heapster.key --tls_client_ca=/secrets/heapster.client-ca --allowed_users=system:master-proxy -port 8083 --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&labelNodeId=nodename&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=xxx&filter=label(container_name:^system.slice.*|^user.slice) -logtostderr -alsologtostderr
I0408 11:47:17.723736   30341 heapster.go:61] Heapster version 1.1.0-beta1
I0408 11:47:17.724117   30341 configs.go:60] Using Kubernetes client with master "https://oc-master-eu-spare:8443" and version "v1"
I0408 11:47:17.724132   30341 configs.go:61] Using kubelet port 10250
I0408 11:47:17.766037   30341 driver.go:322] Initialised Hawkular Sink with parameters {_system https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&labelNodeId=nodename&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=xxx&filter=label(container_name:^system.slice.*|^user.slice) 0xc820196640  5}
I0408 11:47:17.778519   30341 heapster.go:87] Starting with Hawkular-Metrics Sink
I0408 11:47:17.778542   30341 heapster.go:87] Starting with Metric Sink
I0408 11:47:17.787643   30341 heapster.go:166] Starting heapster on port 8083
I0408 11:47:35.000193   30341 manager.go:79] Scraping metrics start: 2016-04-08 11:47:00 +0000 UTC, end: 2016-04-08 11:47:30 +0000 UTC
I0408 11:47:36.822748   30341 manager.go:152] ScrapeMetrics: time: 1.822072675s size: 567
I0408 11:48:05.000242   30341 manager.go:79] Scraping metrics start: 2016-04-08 11:47:30 +0000 UTC, end: 2016-04-08 11:48:00 +0000 UTC
I0408 11:48:06.827318   30341 manager.go:152] ScrapeMetrics: time: 1.826906486s size: 568
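The command line echoed in the first log entry is hard to read once rejoined, so here it is again with one flag per line (values copied from the log; the source and sink URLs are quoted so a shell would not interpret the & characters):

heapster \
  --source='kubernetes:https://oc-master-eu-spare:8443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250' \
  --tls_cert=/secrets/heapster.cert \
  --tls_key=/secrets/heapster.key \
  --tls_client_ca=/secrets/heapster.client-ca \
  --allowed_users=system:master-proxy \
  -port 8083 \
  --sink='hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&labelNodeId=nodename&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=xxx&filter=label(container_name:^system.slice.*|^user.slice)' \
  -logtostderr \
  -alsologtostderr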

Here are the versions:

OpenShift Master: v1.1.2-dirty
Kubernetes Master: v1.2.0-alpha.4-851-g4a65fa1
Heapster: 1.1.0-beta1

For information, everything works fine on my test install (same Docker image), which runs:

OpenShift Master: v1.1.3
Kubernetes Master: v1.2.0-origin
Heapster: 1.1.0-beta1
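(When comparing two installs like this, the master-side versions can be collected in one step; a sketch, assuming the oc client is installed and logged in to the cluster:

oc version

which prints the oc client version and, when connected, the OpenShift and Kubernetes server versions.)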

codeb2cc commented 8 years ago

Are you using Docker 1.11.0? It's probably related to google/cadvisor#1206. Try building Kubernetes from head, or just wait for the next release (kubernetes/kubernetes/pull/24113).
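A quick check for this suspected cause (a sketch; run on each node, since the cAdvisor issue is tied to the Docker daemon version there):

docker version

If the server reports 1.11.0, the container naming change in that release reportedly breaks cAdvisor's container bookkeeping, which is what Heapster's pod and namespace metrics are built from.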

teddymaef commented 8 years ago

I have the same problem: the /namespaces/ endpoint always returns [].

fejta-bot commented 6 years ago

Issues go stale after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta. /lifecycle stale

DirectXMan12 commented 6 years ago

This is quite old. Please re-open if you are still experiencing the issue.