kubernetes-retired / heapster

[EOL] Compute Resource Usage Analysis and Monitoring of Container Clusters
Apache License 2.0
2.63k stars 1.25k forks source link

Heapster fails to get stats from kubelet #127

Closed a-robinson closed 9 years ago

a-robinson commented 9 years ago

Potentially related to the push of heapster 0.7. I've been seeing all heapster requests to my kubelets' /stats/ endpoints failing. I initially reported this as a Kubelet issue, but it looks possible that the new heapster added some sort of incompatibility with the kubelet.

The errors all look like this, and I expect it's very reproducible, as both clusters I've created since Friday have had these errors:

E0223 18:32:13.471559       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.173.100:10250/stats/default/kube-dns-0133o/70de926d-bb1a-11e4-b34e-42010af06d77/etcd - Got 'Internal Error: failed to retrieve cadvisor stats
': invalid character 'I' looking for beginning of value
E0223 18:32:33.378167       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.147.240:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\""
': invalid character 'I' looking for beginning of value
vishh commented 9 years ago

Are you able to manually get stats using the URL mentioned in the logs?

wget http://10.240.173.100:10250/stats/default/kube-dns-0133o/70de926d-bb1a-11e4-b34e-42010af06d77/etcd .

I have seen such errors whenever cadvisor is not available.

On Mon, Feb 23, 2015 at 11:23 AM, Alex Robinson notifications@github.com wrote:

Potentially related to the push of heapster 0.7. I've been seeing all heapster requests to my kubelets' /stats/ endpoints failing. I initially reported this as a Kubelet issue https://github.com/GoogleCloudPlatform/kubernetes/issues/4717, but it looks possible that heapster is the one doing something wrong.

The errors all look like this, and I expect it's very reproducible, as both clusters I've created since Friday have had these errors:

E0223 18:32:13.471559 9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.173.100:10250/stats/default/kube-dns-0133o/70de926d-bb1a-11e4-b34e-42010af06d77/etcd - Got 'Internal Error: failed to retrieve cadvisor stats ': invalid character 'I' looking for beginning of value E0223 18:32:33.378167 9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.147.240:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\"" ': invalid character 'I' looking for beginning of value

— Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/heapster/issues/127.

a-robinson commented 9 years ago

Yeah, could definitely be due to cadvisor being missing.

root@kubernetes-minion-8hyo:/var/log# wget http://10.240.173.100:10250/stats/default/kube-dns-0133o/70de926d-bb1a-11e4-b34e-42010af06d77/etcd
--2015-02-23 19:33:23--  http://10.240.173.100:10250/stats/default/kube-dns-0133o/70de926d-bb1a-11e4-b34e-42010af06d77/etcd
Connecting to 10.240.173.100:10250... connected.
HTTP request sent, awaiting response... 500 Internal Server Error
2015-02-23 19:33:23 ERROR 500: Internal Server Error.
vishh commented 9 years ago

The kubelet also restarts often, so you might want to check kubelet health as well.

On Mon, Feb 23, 2015 at 11:34 AM, Alex Robinson notifications@github.com wrote:

Yeah, could definitely be due to cadvisor being missing.

root@kubernetes-minion-8hyo:/var/log# wget http://10.240.173.100:10250/stats/default/kube-dns-0133o/70de926d-bb1a-11e4-b34e-42010af06d77/etcd --2015-02-23 http://10.240.173.100:10250/stats/default/kube-dns-0133o/70de926d-bb1a-11e4-b34e-42010af06d77/etcd--2015-02-23 19:33:23-- http://10.240.173.100:10250/stats/default/kube-dns-0133o/70de926d-bb1a-11e4-b34e-42010af06d77/etcd Connecting to 10.240.173.100:10250... connected. HTTP request sent, awaiting response... 500 Internal Server Error 2015-02-23 19:33:23 ERROR 500: Internal Server Error.

— Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/heapster/issues/127#issuecomment-75614639 .

a-robinson commented 9 years ago

It's happening every 10 seconds on every node, with no signs of kubelet restarts.

vishh commented 9 years ago

The '10 seconds' matches heapster's default polling duration. Can you look for cadvisor container restarts? Also can you post the entire heapster logs?

On Mon, Feb 23, 2015 at 11:36 AM, Alex Robinson notifications@github.com wrote:

It's happening every 10 seconds on every node, with no signs of kubelet restarts.

— Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/heapster/issues/127#issuecomment-75615058 .

a-robinson commented 9 years ago

The cadvisor container has been up for 14 hours, since the cluster was created. The cadvisor logs (from docker logs) are empty. Docker's health checks for it are succeeding: time="2015-02-23T19:43:17Z" level="info" msg="-job container_inspect(a3fd307f497b2418822dd83bfbffd979cfea187ad8ede24c7734dd6af1f3def2) = OK (0)"

a-robinson commented 9 years ago

Heapster logs:

[...]
E0223 18:32:23.447607       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.156.10:10250/stats/default/wordpress/9c8d9846-bb1a-11e4-b34e-42010af06d77/wordpress - Got 'Pod does not exist
': invalid character 'P' looking for beginning of value
E0223 18:32:23.447615       9 kube.go:158] failed to get stats for container "wordpress"/"default" in pod "wordpress"
E0223 18:32:23.447653       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.156.10:10250/stats/default/wordpress/9c8d9846-bb1a-11e4-b34e-42010af06d77/wordpress - Got 'Pod does not exist
': invalid character 'P' looking for beginning of value
E0223 18:32:23.447675       9 kube.go:158] failed to get stats for container "wordpress"/"default" in pod "wordpress"
E0223 18:32:23.447721       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.156.10:10250/stats/default/wordpress/9c8d9846-bb1a-11e4-b34e-42010af06d77/wordpress - Got 'Pod does not exist
': invalid character 'P' looking for beginning of value
E0223 18:32:23.447728       9 kube.go:158] failed to get stats for container "wordpress"/"default" in pod "wordpress"
E0223 18:32:23.451652       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.173.100:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\""
': invalid character 'I' looking for beginning of value
E0223 18:32:23.457987       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.137.215:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\""
': invalid character 'I' looking for beginning of value
E0223 18:32:23.459743       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.147.240:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\""
': invalid character 'I' looking for beginning of value
E0223 18:32:32.777992       9 reflector.go:85] Failed to list *api.Pod: Get http://10.0.0.1:80/api/v1beta1/pods?fields=Status.Host%21%3D&namespace=: read tcp 10.0.0.1:80: connection reset by peer
E0223 18:32:33.367214       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.156.10:10250/stats/default/wordpress/9c8d9846-bb1a-11e4-b34e-42010af06d77/wordpress - Got 'Pod does not exist
': invalid character 'P' looking for beginning of value
E0223 18:32:33.367853       9 kube.go:158] failed to get stats for container "wordpress"/"default" in pod "wordpress"
E0223 18:32:33.368369       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.156.10:10250/stats/default/wordpress/9c8d9846-bb1a-11e4-b34e-42010af06d77/wordpress - Got 'Pod does not exist
': invalid character 'P' looking for beginning of value
E0223 18:32:33.368777       9 kube.go:158] failed to get stats for container "wordpress"/"default" in pod "wordpress"
E0223 18:32:33.371077       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.156.10:10250/stats/default/wordpress/9c8d9846-bb1a-11e4-b34e-42010af06d77/wordpress - Got 'Pod does not exist
': invalid character 'P' looking for beginning of value
E0223 18:32:33.371386       9 kube.go:158] failed to get stats for container "wordpress"/"default" in pod "wordpress"
E0223 18:32:33.371813       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.156.10:10250/stats/default/wordpress/9c8d9846-bb1a-11e4-b34e-42010af06d77/wordpress - Got 'Pod does not exist
': invalid character 'P' looking for beginning of value
E0223 18:32:33.372137       9 kube.go:158] failed to get stats for container "wordpress"/"default" in pod "wordpress"
E0223 18:32:33.372444       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.156.10:10250/stats/default/wordpress/9c8d9846-bb1a-11e4-b34e-42010af06d77/wordpress - Got 'Pod does not exist
': invalid character 'P' looking for beginning of value
E0223 18:32:33.372640       9 kube.go:158] failed to get stats for container "wordpress"/"default" in pod "wordpress"
E0223 18:32:33.372765       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.156.10:10250/stats/default/wordpress/9c8d9846-bb1a-11e4-b34e-42010af06d77/wordpress - Got 'Pod does not exist
': invalid character 'P' looking for beginning of value
E0223 18:32:33.372921       9 kube.go:158] failed to get stats for container "wordpress"/"default" in pod "wordpress"
E0223 18:32:33.373051       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.156.10:10250/stats/default/wordpress/9c8d9846-bb1a-11e4-b34e-42010af06d77/wordpress - Got 'Pod does not exist
': invalid character 'P' looking for beginning of value
E0223 18:32:33.373191       9 kube.go:158] failed to get stats for container "wordpress"/"default" in pod "wordpress"
E0223 18:32:33.374618       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.173.100:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\""
': invalid character 'I' looking for beginning of value
E0223 18:32:33.377418       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.137.215:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\""
': invalid character 'I' looking for beginning of value
E0223 18:32:33.378167       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.147.240:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\""
': invalid character 'I' looking for beginning of value
a-robinson commented 9 years ago

kubelet logs:

[...]
I0223 19:44:33.373258    4813 server.go:594] POST /stats/: (597.668µs) 500
goroutine 174444 [running]:
github.com/GoogleCloudPlatform/kubernetes/pkg/httplog.(*respLogger).WriteHeader(0xc2081beba0, 0x1f4)
        /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/httplog/log.go:188 +0x9a
net/http.Error(0x7f8fca61f748, 0xc2081beba0, 0xc2081f6b40, 0x47, 0x1f4)
        /usr/src/go/src/net/http/server.go:1274 +0x9a
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).error(0xc2080e9100, 0x7f8fca61f748, 0xc2081beba0, 0x7f8fca60fa80, 0xc20827a090)/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:642 +0x6ff
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).handleStats(0xc2080e9100, 0x7f8fca61f748, 0xc2081beba0, 0xc208295520)
        /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:288 +0x46
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.*Server.(github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.handleStats)·fm(0x7f8fca61f748, 0xc2081beba0, 0xc208295520)
        /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:101 +0x45
net/http.HandlerFunc.ServeHTTP(0xc20807a760, 0x7f8fca61f748, 0xc2081beba0, 0xc208295520)
        /usr/src/go/src/net/http/server.go:1265 +0x41
net/http.(*ServeMux).ServeHTTP(0xc2080aedb0, 0x7f8fca61f748, 0xc2081beba0, 0xc208295520)
        /usr/src/go/src/net/http/server.go:1541 +0x17d
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).ServeHTTP(0xc2080e90e0, 0x7f8fca61f710, 0xc208116c80, 0xc208295520)
        /go/src/github.com/GoogleCloudPlatform/kubernetes/_
I0223 19:44:33.631877    4813 server.go:594] GET /api/v1beta1/podInfo?podID=elasticsearch-logging-controller-oh43e&podNamespace=default: (7.373463ms) 0
I0223 19:44:33.649061    4813 server.go:594] GET /api/v1beta1/podInfo?podID=kibana-logging-controller-gziey&podNamespace=default: (13.679069ms) 0
I0223 19:44:33.654014    4813 server.go:594] GET /api/v1beta1/podInfo?podID=monitoring-influx-grafana-controller-ls6k1&podNamespace=default: (12.459173ms) 0
I0223 19:44:33.670365    4813 server.go:594] GET /api/v1beta1/podInfo?podID=monitoring-heapster-controller-fplln&podNamespace=default: (17.910493ms) 0
I0223 19:44:33.677777    4813 server.go:594] GET /api/v1beta1/podInfo?podID=kube-dns-0133o&podNamespace=default: (25.190853ms) 0
I0223 19:44:34.779222    4813 server.go:594] GET /healthz: (661.37µs) 0
I0223 19:44:37.815668    4813 file.go:209] Got pod from file "/etc/kubernetes/manifests/cadvisor.manifest": file-6bb810db-kubernetes-minion-8hyo.cadvisor-agent (57d25f8c3da447436ec35b3ce7097bb5)
I0223 19:44:37.816141    4813 file.go:209] Got pod from file "/etc/kubernetes/manifests/fluentd-es.manifest": file-8cd71177-kubernetes-minion-8hyo.fluentd-to-elasticsearch (1ea443fc8fed5e1d27d6916f2da1d673)
I0223 19:44:38.725653    4813 server.go:594] GET /api/v1beta1/podInfo?podID=elasticsearch-logging-controller-oh43e&podNamespace=default: (32.014506ms) 0
I0223 19:44:38.741443    4813 server.go:594] GET /api/v1beta1/podInfo?podID=kibana-logging-controller-gziey&podNamespace=default: (29.460761ms) 0
I0223 19:44:38.744238    4813 server.go:594] GET /api/v1beta1/podInfo?podID=kube-dns-0133o&podNamespace=default: (50.848013ms) 0
I0223 19:44:38.746700    4813 server.go:594] GET /api/v1beta1/podInfo?podID=monitoring-heapster-controller-fplln&podNamespace=default: (26.532633ms) 0
I0223 19:44:38.748346    4813 server.go:594] GET /api/v1beta1/podInfo?podID=monitoring-influx-grafana-controller-ls6k1&podNamespace=default: (33.794581ms) 0
I0223 19:44:43.492990    4813 server.go:594] POST /stats/: (2.939874ms) 500
goroutine 174444 [running]:
github.com/GoogleCloudPlatform/kubernetes/pkg/httplog.(*respLogger).WriteHeader(0xc208169680, 0x1f4)
        /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/httplog/log.go:188 +0x9a
net/http.Error(0x7f8fca61f748, 0xc208169680, 0xc208228050, 0x47, 0x1f4)
        /usr/src/go/src/net/http/server.go:1274 +0x9a
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).error(0xc2080e9100, 0x7f8fca61f748, 0xc208169680, 0x7f8fca60fa80, 0xc20827a100)
        /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:117 +0x124
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).serveStats(0xc2080e9100, 0x7f8fca61f748, 0xc208169680, 0xc208295a00)
        /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:642 +0x6ff
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).handleStats(0xc2080e9100, 0x7f8fca61f748, 0xc208169680, 0xc208295a00)
        /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:288 +0x46
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.*Server.(github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.handleStats)·fm(0x7f8fca61f748, 0xc208169680, 0xc208295a00)
        /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:101 +0x45
net/http.HandlerFunc.ServeHTTP(0xc20807a760, 0x7f8fca61f748, 0xc208169680, 0xc208295a00)
        /usr/src/go/src/net/http/server.go:1265 +0x41
net/http.(*ServeMux).ServeHTTP(0xc2080aedb0, 0x7f8fca61f748, 0xc208169680, 0xc208295a00)
        /usr/src/go/src/net/http/server.go:1541 +0x17d
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).ServeHTTP(0xc2080e90e0, 0x7f8fca61f710, 0xc208117ae0, 0xc208295a00)
        /go/src/github.com/GoogleCloudPlatform/kubernetes/_
I0223 19:44:43.782783    4813 server.go:594] GET /api/v1beta1/podInfo?podID=elasticsearch-logging-controller-oh43e&podNamespace=default: (8.38694ms) 0
I0223 19:44:43.801743    4813 server.go:594] GET /api/v1beta1/podInfo?podID=kibana-logging-controller-gziey&podNamespace=default: (27.167657ms) 0
I0223 19:44:43.811035    4813 server.go:594] GET /api/v1beta1/podInfo?podID=monitoring-heapster-controller-fplln&podNamespace=default: (23.502468ms) 0
I0223 19:44:43.813873    4813 server.go:594] GET /api/v1beta1/podInfo?podID=monitoring-influx-grafana-controller-ls6k1&podNamespace=default: (26.454084ms) 0
I0223 19:44:43.815521    4813 server.go:594] GET /api/v1beta1/podInfo?podID=kube-dns-0133o&podNamespace=default: (28.229048ms) 0
I0223 19:44:45.335495    4813 server.go:594] GET /healthz: (651.195µs) 0
E0223 19:44:47.031101    4813 reflector.go:85] Failed to list *api.Pod: Get https://10.240.109.119:6443/api/v1beta1/pods?fields=Status.Host%3Dkubernetes-minion-8hyo.c.arob-frogger.internal&namespace=: dial tcp 10.240.109.119:6443: i/o timeout
E0223 19:44:47.031219    4813 reflector.go:85] Failed to list *api.Service: Get https://10.240.109.119:6443/api/v1beta1/services?namespace=: dial tcp 10.240.109.119:6443: i/o timeout
I0223 19:44:48.864251    4813 server.go:594] GET /api/v1beta1/podInfo?podID=elasticsearch-logging-controller-oh43e&podNamespace=default: (24.75714ms) 0
I0223 19:44:48.866261    4813 server.go:594] GET /api/v1beta1/podInfo?podID=kibana-logging-controller-gziey&podNamespace=default: (26.545063ms) 0
I0223 19:44:48.899430    4813 server.go:594] GET /api/v1beta1/podInfo?podID=kube-dns-0133o&podNamespace=default: (13.145955ms) 0
I0223 19:44:48.908430    4813 server.go:594] GET /api/v1beta1/podInfo?podID=monitoring-heapster-controller-fplln&podNamespace=default: (12.303541ms) 0
I0223 19:44:48.920184    4813 server.go:594] GET /api/v1beta1/podInfo?podID=monitoring-influx-grafana-controller-ls6k1&podNamespace=default: (23.910287ms) 0
vishh commented 9 years ago

This must have been fixed on head by #4784. Can you recreate a cluster from HEAD and see if the issue is fixed?

a-robinson commented 9 years ago

Sorry for the delay - I'd love to test this, but the wi-fi here isn't good enough for kube-up to work for me :/ I probably won't be able to until tomorrow

vishh commented 9 years ago

The heapster POST issue has been fixed on heapster HEAD. I will integrate that into kube soon.

Can we stop dumping stack traces in kubelet? Does it really help debug any issues?

On Thu, Feb 26, 2015 at 12:13 PM, Alex Robinson notifications@github.com wrote:

Sorry for the delay - I'd love to test this, but the wi-fi here isn't good enough for kube-up to work for me :/ I probably won't be able to until tomorrow

— Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/heapster/issues/127#issuecomment-76260759 .

DreadPirateShawn commented 9 years ago

For what it's worth, I'm currently seeing the same issue, using the latest heapster github code.

Just to make sure, here's the latest pull and restart that I performed:

20:58 kube_devstack highland@kube-master1:~/heapster$ git pull origin master
remote: Counting objects: 6, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (6/6), done.
From https://github.com/GoogleCloudPlatform/heapster
 * branch            master     -> FETCH_HEAD
Updating d5098a9..ec1c526
Fast-forward
 sources/cadvisor_test.go |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

21:06 kube_devstack highland@kube-master1:~/heapster$ deploy/kube.sh restart
heapster pods have been removed.
heapster pods have been setup

I see that the pods were indeed recreated, however I'm still seeing the POST to /stats/ every 10 seconds:

[...]
/logs/kube-minion3/INFO-174305.28825:I0226 21:08:50.271049   28825 server.go:362] POST /stats/: (323.296µs) 500
/logs/kube-minion3/INFO-174305.28825:I0226 21:09:00.278409   28825 server.go:362] POST /stats/: (411.688µs) 500
/logs/kube-minion3/INFO-174305.28825:I0226 21:09:10.271291   28825 server.go:362] POST /stats/: (521.926µs) 500
/logs/kube-minion3/INFO-174305.28825:I0226 21:09:20.272693   28825 server.go:362] POST /stats/: (505.051µs) 500
/logs/kube-minion3/INFO-174305.28825:I0226 21:09:30.265936   28825 server.go:362] POST /stats/default/monitoring-heapster-controller-2gdny/6b8cfbc5-bdfb-11e4-848e-00505629c58f/heapster: (2.417704ms) 500
/logs/kube-minion3/INFO-174305.28825:I0226 21:09:30.267541   28825 server.go:362] POST /stats/default/monitoring-heapster-controller-2gdny/6b8cfbc5-bdfb-11e4-848e-00505629c58f/heapster: (3.741048ms) 500
/logs/kube-minion3/INFO-174305.28825:I0226 21:09:30.268122   28825 server.go:362] POST /stats/: (170.12µs) 500

And from the kubectl logs:

21:16 kube_devstack highland@kube-minion1:~$ kubectl.sh log monitoring-heapster-controller-2gdny | tail
2015-02-26T21:17:20.268190019Z E0226 21:17:20.268150       5 kube.go:158] failed to get stats for container "influxdb"/"default" in pod "monitoring-influx-grafana-controller-xxvi7"
2015-02-26T21:17:20.272222969Z E0226 21:17:20.271986       5 kubelet.go:80] failed to get stats from kubelet url: http://10.0.0.4:10250/stats/default/monitoring-influx-grafana-controller-xxvi7/6bfa5435-bdfb-11e4-848e-00505629c58f/influxdb - Got 'Internal Error: received empty response from "Docker container info for \"35c432aef492333df08db6c44eb42c4977392c4bdef6c0b33182730a82480a89\""
2015-02-26T21:17:20.272222969Z ': invalid character 'I' looking for beginning of value
2015-02-26T21:17:20.272222969Z E0226 21:17:20.271998       5 kube.go:158] failed to get stats for container "influxdb"/"default" in pod "monitoring-influx-grafana-controller-xxvi7"
2015-02-26T21:17:20.273595492Z E0226 21:17:20.272685       5 kubelet.go:80] failed to get stats from kubelet url: http://10.0.0.7:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\""
2015-02-26T21:17:20.273595492Z ': invalid character 'I' looking for beginning of value
2015-02-26T21:17:20.273626242Z E0226 21:17:20.273016       5 kubelet.go:80] failed to get stats from kubelet url: http://10.0.0.4:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\""
2015-02-26T21:17:20.273626242Z ': invalid character 'I' looking for beginning of value
2015-02-26T21:17:20.273626242Z E0226 21:17:20.273319       5 kubelet.go:80] failed to get stats from kubelet url: http://10.0.0.3:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\""
2015-02-26T21:17:20.273626242Z ': invalid character 'I' looking for beginning of value

Is there more that I could check? (Still finding my way around the logs.)

EDIT: Granted, it's possible that cadvisor is not yet running properly on my system -- that's what I'm looking into now -- but the POST logs still seem suspect, given this ticket.

vishh commented 9 years ago

deploy/kube.sh does not rebuild heapster docker images. I have pushed a test image 'kubernetes/heapster:test'. Can you try again using that image?

On Thu, Feb 26, 2015 at 1:18 PM, Shawn Falkner-Horine < notifications@github.com> wrote:

For what it's worth, I'm currently seeing the same issue, using the latest heapster github code.

Just to make sure, here's the latest pull and restart that I performed:

20:58 kube_devstack highland@kube-master1:~/heapster$ git pull origin master remote: Counting objects: 6, done. remote: Compressing objects: 100% (6/6), done. remote: Total 6 (delta 1), reused 0 (delta 0), pack-reused 0 Unpacking objects: 100% (6/6), done. From https://github.com/GoogleCloudPlatform/heapster

  • branch master -> FETCH_HEAD Updating d5098a9..ec1c526 Fast-forward sources/cadvisor_test.go | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)

21:06 kube_devstack highland@kube-master1:~/heapster$ deploy/kube.sh restart heapster pods have been removed. heapster pods have been setup

I see that the pods were indeed recreated, however I'm still seeing the POST to /stats/ every 10 seconds:

[...] /logs/kube-minion3/INFO-174305.28825:I0226 21:08:50.271049 28825 server.go:362] POST /stats/: (323.296µs) 500 /logs/kube-minion3/INFO-174305.28825:I0226 21:09:00.278409 28825 server.go:362] POST /stats/: (411.688µs) 500 /logs/kube-minion3/INFO-174305.28825:I0226 21:09:10.271291 28825 server.go:362] POST /stats/: (521.926µs) 500 /logs/kube-minion3/INFO-174305.28825:I0226 21:09:20.272693 28825 server.go:362] POST /stats/: (505.051µs) 500 /logs/kube-minion3/INFO-174305.28825:I0226 21:09:30.265936 28825 server.go:362] POST /stats/default/monitoring-heapster-controller-2gdny/6b8cfbc5-bdfb-11e4-848e-00505629c58f/heapster: (2.417704ms) 500 /logs/kube-minion3/INFO-174305.28825:I0226 21:09:30.267541 28825 server.go:362] POST /stats/default/monitoring-heapster-controller-2gdny/6b8cfbc5-bdfb-11e4-848e-00505629c58f/heapster: (3.741048ms) 500 /logs/kube-minion3/INFO-174305.28825:I0226 21:09:30.268122 28825 server.go:362] POST /stats/: (170.12µs) 500

And from the kubectl logs:

21:16 kube_devstack highland@kube-minion1:~$ kubectl.sh log monitoring-heapster-controller-2gdny | tail 2015-02-26T21:17:20.268190019Z E0226 21:17:20.268150 5 kube.go:158] failed to get stats for container "influxdb"/"default" in pod "monitoring-influx-grafana-controller-xxvi7" 2015-02-26T21:17:20.272222969Z E0226 21:17:20.271986 5 kubelet.go:80] failed to get stats from kubelet url: http://10.0.0.4:10250/stats/default/monitoring-influx-grafana-controller-xxvi7/6bfa5435-bdfb-11e4-848e-00505629c58f/influxdb - Got 'Internal Error: received empty response from "Docker container info for \"35c432aef492333df08db6c44eb42c4977392c4bdef6c0b33182730a82480a89\"" 2015-02-26T21:17:20.272222969Z ': invalid character 'I' looking for beginning of value 2015-02-26T21:17:20.272222969Z E0226 21:17:20.271998 5 kube.go:158] failed to get stats for container "influxdb"/"default" in pod "monitoring-influx-grafana-controller-xxvi7" 2015-02-26T21:17:20.273595492Z E0226 21:17:20.272685 5 kubelet.go:80] failed to get stats from kubelet url: http://10.0.0.7:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\"" 2015-02-26T21:17:20.273595492Z ': invalid character 'I' looking for beginning of value 2015-02-26T21:17:20.273626242Z E0226 21:17:20.273016 5 kubelet.go:80] failed to get stats from kubelet url: http://10.0.0.4:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\"" 2015-02-26T21:17:20.273626242Z ': invalid character 'I' looking for beginning of value 2015-02-26T21:17:20.273626242Z E0226 21:17:20.273319 5 kubelet.go:80] failed to get stats from kubelet url: http://10.0.0.3:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\"" 2015-02-26T21:17:20.273626242Z ': invalid character 'I' looking for beginning of value

Is there more that I could check? (Still finding my way around the logs.)

— Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/heapster/issues/127#issuecomment-76275534 .

DreadPirateShawn commented 9 years ago

Ah! Thanks, I didn't realize where the image was specified -- I see what you mean, now. Even better, I tried your test image, and I do see the POST actions change to GET:

[...]
/logs/kube-minion3/INFO-174305.28825 22:51:02.629496   16718 server.go:362] POST /stats/: (172.947µs) 500
/logs/kube-minion3/INFO-174305.28825 22:51:12.624493   16718 server.go:362] POST /stats/: (433.335µs) 500
/logs/kube-minion3/INFO-174305.28825 22:51:22.623663   16718 server.go:362] POST /stats/default/monitoring-influx-grafana-controller-182tv/db5fa8f2-be03-11e4-848e-00505629c58f/influxdb: (8.611277ms) 500
/logs/kube-minion3/INFO-174305.28825 22:51:22.627992   16718 server.go:362] POST /stats/default/monitoring-influx-grafana-controller-182tv/db5fa8f2-be03-11e4-848e-00505629c58f/influxdb: (12.700853ms) 500
/logs/kube-minion3/INFO-174305.28825 22:51:22.628858   16718 server.go:362] POST /stats/: (237.14µs) 500
/logs/kube-minion3/INFO-174305.28825 22:51:32.615554   16718 server.go:362] POST /stats/: (449.188µs) 500
/logs/kube-minion3/INFO-174305.28825 22:52:19.700102   16718 server.go:362] GET /stats/default/monitoring-heapster-controller-zjpez/06983ddc-be0a-11e4-848e-00505629c58f/heapster: (6.570953ms) 500
/logs/kube-minion3/INFO-174305.28825 22:52:19.707005   16718 server.go:362] GET /stats/default/monitoring-heapster-controller-zjpez/06983ddc-be0a-11e4-848e-00505629c58f/heapster: (13.167025ms) 500
/logs/kube-minion3/INFO-174305.28825 22:52:19.708450   16718 server.go:362] GET /stats/: (696.24µs) 500
/logs/kube-minion3/INFO-174305.28825 22:52:29.702688   16718 server.go:362] GET /stats/default/monitoring-influx-grafana-controller-i4hx9/06e12bd6-be0a-11e4-848e-00505629c58f/influxdb: (9.018229ms) 500
/logs/kube-minion3/INFO-174305.28825 22:52:29.709717   16718 server.go:362] GET /stats/default/monitoring-influx-grafana-controller-i4hx9/06e12bd6-be0a-11e4-848e-00505629c58f/influxdb: (16.410284ms) 500
/logs/kube-minion3/INFO-174305.28825 22:52:29.711006   16718 server.go:362] GET /stats/: (340.775µs) 500

I'm pretty sure now that my other issue is cAdvisor, since I'm still seeing the "invalid character 'I' looking for beginning of value" errors, but taking this GET/POST aspect off the table is definitely helpful.

vishh commented 9 years ago

There was an issue in kubelet which affected access to cadvisor container. That got fixed very recently (yesterday maybe). Running a cluster from head might solve that problem.

On Thu, Feb 26, 2015 at 2:57 PM, Shawn Falkner-Horine < notifications@github.com> wrote:

Ah! Thanks, I didn't realize where the image was specified -- I see what you mean, now. Even better, I tried your test image, and I do see the POST actions change to GET:

[...] /logs/kube-minion3/INFO-174305.28825 22:51:02.629496 16718 server.go:362] POST /stats/: (172.947µs) 500 /logs/kube-minion3/INFO-174305.28825 22:51:12.624493 16718 server.go:362] POST /stats/: (433.335µs) 500 /logs/kube-minion3/INFO-174305.28825 22:51:22.623663 16718 server.go:362] POST /stats/default/monitoring-influx-grafana-controller-182tv/db5fa8f2-be03-11e4-848e-00505629c58f/influxdb: (8.611277ms) 500 /logs/kube-minion3/INFO-174305.28825 22:51:22.627992 16718 server.go:362] POST /stats/default/monitoring-influx-grafana-controller-182tv/db5fa8f2-be03-11e4-848e-00505629c58f/influxdb: (12.700853ms) 500 /logs/kube-minion3/INFO-174305.28825 22:51:22.628858 16718 server.go:362] POST /stats/: (237.14µs) 500 /logs/kube-minion3/INFO-174305.28825 22:51:32.615554 16718 server.go:362] POST /stats/: (449.188µs) 500 /logs/kube-minion3/INFO-174305.28825 22:52:19.700102 16718 server.go:362] GET /stats/default/monitoring-heapster-controller-zjpez/06983ddc-be0a-11e4-848e-00505629c58f/heapster: (6.570953ms) 500 /logs/kube-minion3/INFO-174305.28825 22:52:19.707005 16718 server.go:362] GET /stats/default/monitoring-heapster-controller-zjpez/06983ddc-be0a-11e4-848e-00505629c58f/heapster: (13.167025ms) 500 /logs/kube-minion3/INFO-174305.28825 22:52:19.708450 16718 server.go:362] GET /stats/: (696.24µs) 500 /logs/kube-minion3/INFO-174305.28825 22:52:29.702688 16718 server.go:362] GET /stats/default/monitoring-influx-grafana-controller-i4hx9/06e12bd6-be0a-11e4-848e-00505629c58f/influxdb: (9.018229ms) 500 /logs/kube-minion3/INFO-174305.28825 22:52:29.709717 16718 server.go:362] GET /stats/default/monitoring-influx-grafana-controller-i4hx9/06e12bd6-be0a-11e4-848e-00505629c58f/influxdb: (16.410284ms) 500 /logs/kube-minion3/INFO-174305.28825 22:52:29.711006 16718 server.go:362] GET /stats/: (340.775µs) 500

I'm pretty sure now that my other issue is cAdvisor, since I'm still seeing the "invalid character 'I' looking for beginning of value" errors, but taking this GET/POST aspect off the table is definitely helpful.

— Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/heapster/issues/127#issuecomment-76293563 .

a-robinson commented 9 years ago

This appears to be fixed at head. Thanks Vish!

idoshamun commented 9 years ago

I keep getting:

E0418 16:46:17.721174       7 kubelet.go:85] failed to get stats from kubelet url: http://172.18.0.13:10250/stats/default/monitoring-influx-grafana-controller-gkukt/6889f02a-e5e8-11e4-b87a-000d3a30177a/grafana - Get http://172.18.0.13:10250/stats/default/monitoring-influx-grafana-controller-gkukt/6889f02a-e5e8-11e4-b87a-000d3a30177a/grafana: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

When I wget this link it downloads a binary file.

vishh commented 9 years ago

What version of kubernetes are you running? I am guessing v0.15.0? If so, can you try using the configs from here?

On Sat, Apr 18, 2015 at 12:55 PM, Ido Shamun notifications@github.com wrote:

I keep getting:

E0418 16:46:17.721174 7 kubelet.go:85] failed to get stats from kubelet url: http://172.18.0.13:10250/stats/default/monitoring-influx-grafana-controller-gkukt/6889f02a-e5e8-11e4-b87a-000d3a30177a/grafana - Get http://172.18.0.13:10250/stats/default/monitoring-influx-grafana-controller-gkukt/6889f02a-e5e8-11e4-b87a-000d3a30177a/grafana: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

When I wget this link it downloads a binary file.

— Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/heapster/issues/127#issuecomment-94180860 .

idoshamun commented 9 years ago

Fixed it. Thanks