JunejaTung closed this issue 6 years ago.
cc @thucatebay
Here's the query that Grafana uses to fetch the list of nodes:
select distinct(hostname) from \"memory/limit_bytes_gauge\" where time > now() - 10m
for namespace:
select distinct(pod_namespace) from "uptime_ms_cumulative" where time > now() - 1m
for pod:
select distinct(pod_name) from "uptime_ms_cumulative" where pod_namespace =~ /$namespace/ and time > now() - 1m
and container:
select distinct(container_name) from "uptime_ms_cumulative" where pod_name =~ /$pod/ and "pod_namespace" =~ /$namespace/ and time > now() - 1m
You can change the time filter to go back further and pick up more data. These queries are defined under the "Templating" section of the dashboard settings menu.
I'll look into whether it's possible to use the selected time filter instead of hardcoding.
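For example, to look back 24 hours instead of 10 minutes, the node query becomes (the 24h window here is just an illustration; any InfluxQL duration works):
select distinct(hostname) from "memory/limit_bytes_gauge" where time > now() - 24h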
@thucatebay Up to now I hadn't configured the queries in Grafana very well. I made some changes to the node 172.27.8.212 to bring it to Ready, and then found something interesting. I suspect there may be a bug when a node is NotReady or its kubelet has no running containers: in that scenario, heapster can't get metrics for the Ready nodes it scrapes after the NotReady node.
[root@wlan-cloudserver31 influxdb-test]# kubectl get nodes
NAME           LABELS                                STATUS
172.27.8.211   kubernetes.io/hostname=172.27.8.211   Ready
172.27.8.212   kubernetes.io/hostname=172.27.8.212   Ready
172.27.8.214   kubernetes.io/hostname=172.27.8.214   Ready
[root@wlan-cloudserver31 influxdb-test]# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                     READY   STATUS    RESTARTS   AGE   NODE
default       busybox                  1/1     Running   2          4d    172.27.8.211
default       redis-master-rze58       1/1     Running   2          6d    172.27.8.214
kube-system   heapster-zcd76           1/1     Running   0          4d    172.27.8.214
kube-system   influxdb-grafana-h9rnx   2/2     Running   0          6d    172.27.8.214
kube-system   kube-dns-v9-u07x6        4/4     Running   0          2d    172.27.8.211
The logs from the heapster container:
I1104 11:53:30.007492 1 kubelet.go:99] url: "http://172.27.8.211:10255/stats/default/busybox/d22d309b-82c4-11e5-a8b4-fa163e77e286/busybox", body: "{\"num_stats\":60,\"start\":\"2015-11-04T11:53:25Z\",\"end\":\"2015-11-04T11:53:30Z\"}", data: {ContainerReference:{Name:/system.slice/docker-e1a52f29c1bd116ecf599527f6c0d9e08b625b6c4edc8f69d3d1568d5299e382.scope Aliases:[k8s_busybox.d1c8ce40_busybox_default_d22d309b-82c4-11e5-a8b4-fa163e77e286_13b42095 e1a52f29c1bd116ecf599527f6c0d9e08b625b6c4edc8f69d3d1568d5299e382] Namespace:docker} Subcontainers:[] Spec:{CreationTime:2015-11-04 11:23:30.664343161 +0000 UTC Labels:map[io.kubernetes.pod.name:default/busybox] HasCpu:true Cpu:{Limit:2 MaxLimit:0 Mask:0-7} HasMemory:true Memory:{Limit:18446744073709551615 Reservation:0 SwapLimit:18446744073709551615} HasNetwork:true HasFilesystem:false HasDiskIo:true HasCustomMetrics:false CustomMetrics:[]} Stats:[]}
I1104 11:53:30.054508 1 manager.go:175] completed scraping data from sources. Errors: []
I1104 11:53:35.000745 1 manager.go:162] starting to scrape data from sources start: 2015-11-04 11:53:30 +0000 UTC end: 2015-11-04 11:53:35 +0000 UTC
I1104 11:53:35.000915 1 manager.go:103] attempting to get data from source "Kube Node Metrics Source"
I1104 11:53:35.000940 1 manager.go:103] attempting to get data from source "Kube Events Source"
I1104 11:53:35.001114 1 kube.go:79] Only have PublicIP 172.27.8.214 for node 172.27.8.214, so using it for InternalIP
I1104 11:53:35.001104 1 kube_events.go:216] Fetched list of events from the master
I1104 11:53:35.001188 1 kube_events.go:217] []
I1104 11:53:35.001279 1 kube.go:79] Only have PublicIP 172.27.8.211 for node 172.27.8.211, so using it for InternalIP
I1104 11:53:35.001307 1 kube.go:79] Only have PublicIP 172.27.8.212 for node 172.27.8.212, so using it for InternalIP
I1104 11:53:35.001275 1 manager.go:103] attempting to get data from source "Kube Pods Source"
I1104 11:53:35.001323 1 kube_nodes.go:126] Fetched list of nodes from the master
I1104 11:53:35.001367 1 kube.go:79] Only have PublicIP 172.27.8.214 for node 172.27.8.214, so using it for InternalIP
I1104 11:53:35.001409 1 kube.go:79] Only have PublicIP 172.27.8.211 for node 172.27.8.211, so using it for InternalIP
I1104 11:53:35.001450 1 kube.go:79] Only have PublicIP 172.27.8.212 for node 172.27.8.212, so using it for InternalIP
I1104 11:53:35.001546 1 pods.go:152] selected pods from api server [{pod:0xc20830e5d0 nodeInfo:0xc2082bc140 namespace:0xc2081400e8} {pod:0xc20830e7c0 nodeInfo:0xc2082bc180 namespace:0xc2081400e8} {pod:0xc20830e000 nodeInfo:0xc2082bc1c0 namespace:0xc208140000} {pod:0xc20830e1f0 nodeInfo:0xc2082bc200 namespace:0xc208140000} {pod:0xc20830e3e0 nodeInfo:0xc2082bc240 namespace:0xc2081400e8}]
I1104 11:53:35.001927 1 kubelet.go:110] about to query kubelet using url: "http://172.27.8.214:10255/stats/kube-system/influxdb-grafana-h9rnx/7801146f-81d8-11e5-a8b4-fa163e77e286/influxdb"
I1104 11:53:35.002068 1 kubelet.go:110] about to query kubelet using url: "http://172.27.8.214:10255/stats/kube-system/kube-dns-v9-oaep5/06e1de3d-82c2-11e5-a8b4-fa163e77e286/etcd"
I1104 11:53:35.002145 1 kubelet.go:110] about to query kubelet using url: "http://172.27.8.211:10255/stats/default/busybox/d22d309b-82c4-11e5-a8b4-fa163e77e286/busybox"
I1104 11:53:35.002244 1 kubelet.go:110] about to query kubelet using url: "http://172.27.8.214:10255/stats/default/redis-master-rze58/92a91592-8138-11e5-a8b4-fa163e77e286/master"
I1104 11:53:35.002346 1 kubelet.go:110] about to query kubelet using url: "http://172.27.8.214:10255/stats/kube-system/heapster-zcd76/3a99af72-82ea-11e5-a8b4-fa163e77e286/heapster"
I1104 11:53:35.003034 1 kubelet.go:96] failed to get stats from kubelet url: http://172.27.8.214:10255/stats/kube-system/influxdb-grafana-h9rnx/7801146f-81d8-11e5-a8b4-fa163e77e286/influxdb - Get http://172.27.8.214:10255/stats/kube-system/influxdb-grafana-h9rnx/7801146f-81d8-11e5-a8b4-fa163e77e286/influxdb: dial tcp 172.27.8.214:10255: connection refused
I1104 11:53:35.003081 1 kube_pods.go:110] failed to get stats for container "influxdb" in pod "kube-system"/"influxdb-grafana-h9rnx"
I1104 11:53:35.003091 1 kube_nodes.go:59] Failed to get container stats from Kubelet on node "172.27.8.214"
I1104 11:53:35.003139 1 kubelet.go:110] about to query kubelet using url: "http://172.27.8.214:10255/stats/kube-system/influxdb-grafana-h9rnx/7801146f-81d8-11e5-a8b4-fa163e77e286/grafana"
I1104 11:53:35.003165 1 kubelet.go:96] failed to get stats from kubelet url: http://172.27.8.214:10255/stats/kube-system/kube-dns-v9-oaep5/06e1de3d-82c2-11e5-a8b4-fa163e77e286/etcd - Get http://172.27.8.214:10255/stats/kube-system/kube-dns-v9-oaep5/06e1de3d-82c2-11e5-a8b4-fa163e77e286/etcd: dial tcp 172.27.8.214:10255: connection refused
I1104 11:53:35.003200 1 kube_pods.go:110] failed to get stats for container "etcd" in pod "kube-system"/"kube-dns-v9-oaep5"
I1104 11:53:35.003221 1 kubelet.go:110] about to query kubelet using url: "http://172.27.8.214:10255/stats/kube-system/kube-dns-v9-oaep5/06e1de3d-82c2-11e5-a8b4-fa163e77e286/kube2sky"
I1104 11:53:35.003350 1 kubelet.go:96] failed to get stats from kubelet url: http://172.27.8.214:10255/stats/kube-system/influxdb-grafana-h9rnx/7801146f-81d8-11e5-a8b4-fa163e77e286/grafana - Get http://172.27.8.214:10255/stats/kube-system/influxdb-grafana-h9rnx/7801146f-81d8-11e5-a8b4-fa163e77e286/grafana: dial tcp 172.27.8.214:10255: connection refused
I1104 11:53:35.003416 1 kube_pods.go:110] failed to get stats for container "grafana" in pod "kube-system"/"influxdb-grafana-h9rnx"
I1104 11:53:35.003476 1 kubelet.go:96] failed to get stats from kubelet url: http://172.27.8.214:10255/stats/kube-system/kube-dns-v9-oaep5/06e1de3d-82c2-11e5-a8b4-fa163e77e286/kube2sky - Get http://172.27.8.214:10255/stats/kube-system/kube-dns-v9-oaep5/06e1de3d-82c2-11e5-a8b4-fa163e77e286/kube2sky: dial tcp 172.27.8.214:10255: connection refused
I1104 11:53:35.003515 1 kube_pods.go:110] failed to get stats for container "kube2sky" in pod "kube-system"/"kube-dns-v9-oaep5"
I1104 11:53:35.003532 1 kubelet.go:110] about to query kubelet using url: "http://172.27.8.214:10255/stats/kube-system/kube-dns-v9-oaep5/06e1de3d-82c2-11e5-a8b4-fa163e77e286/skydns"
I1104 11:53:35.003700 1 kubelet.go:96] failed to get stats from kubelet url: http://172.27.8.214:10255/stats/default/redis-master-rze58/92a91592-8138-11e5-a8b4-fa163e77e286/master - Get http://172.27.8.214:10255/stats/default/redis-master-rze58/92a91592-8138-11e5-a8b4-fa163e77e286/master: dial tcp 172.27.8.214:10255: connection refused
I1104 11:53:35.003767 1 kube_pods.go:110] failed to get stats for container "master" in pod "default"/"redis-master-rze58"
I1104 11:53:35.003774 1 kubelet.go:96] failed to get stats from kubelet url: http://172.27.8.214:10255/stats/kube-system/kube-dns-v9-oaep5/06e1de3d-82c2-11e5-a8b4-fa163e77e286/skydns - Get http://172.27.8.214:10255/stats/kube-system/kube-dns-v9-oaep5/06e1de3d-82c2-11e5-a8b4-fa163e77e286/skydns: dial tcp 172.27.8.214:10255: connection refused
I1104 11:53:35.003844 1 kube_pods.go:110] failed to get stats for container "skydns" in pod "kube-system"/"kube-dns-v9-oaep5"
I1104 11:53:35.003871 1 kubelet.go:110] about to query kubelet using url: "http://172.27.8.214:10255/stats/kube-system/kube-dns-v9-oaep5/06e1de3d-82c2-11e5-a8b4-fa163e77e286/healthz"
I1104 11:53:35.003986 1 kubelet.go:96] failed to get stats from kubelet url: http://172.27.8.214:10255/stats/kube-system/heapster-zcd76/3a99af72-82ea-11e5-a8b4-fa163e77e286/heapster - Get http://172.27.8.214:10255/stats/kube-system/heapster-zcd76/3a99af72-82ea-11e5-a8b4-fa163e77e286/heapster: dial tcp 172.27.8.214:10255: connection refused
I1104 11:53:35.004023 1 kube_pods.go:110] failed to get stats for container "heapster" in pod "kube-system"/"heapster-zcd76"
I1104 11:53:35.004200 1 kubelet.go:96] failed to get stats from kubelet url: http://172.27.8.214:10255/stats/kube-system/kube-dns-v9-oaep5/06e1de3d-82c2-11e5-a8b4-fa163e77e286/healthz - Get http://172.27.8.214:10255/stats/kube-system/kube-dns-v9-oaep5/06e1de3d-82c2-11e5-a8b4-fa163e77e286/healthz: dial tcp 172.27.8.214:10255: connection refused
I1104 11:53:35.004250 1 kube_pods.go:110] failed to get stats for container "healthz" in pod "kube-system"/"kube-dns-v9-oaep5"
I1104 11:53:35.007198 1 kube_nodes.go:59] Failed to get container stats from Kubelet on node "172.27.8.212"
I1104 11:53:35.010137 1 kubelet.go:99] url: "http://172.27.8.211:10255/stats/default/busybox/d22d309b-82c4-11e5-a8b4-fa163e77e286/busybox", body: "{\"num_stats\":60,\"start\":\"2015-11-04T11:53:30Z\",\"end\":\"2015-11-04T11:53:35Z\"}", data: {ContainerReference:{Name:/system.slice/docker-e1a52f29c1bd116ecf599527f6c0d9e08b625b6c4edc8f69d3d1568d5299e382.scope Aliases:[k8s_busybox.d1c8ce40_busybox_default_d22d309b-82c4-11e5-a8b4-fa163e77e286_13b42095 e1a52f29c1bd116ecf599527f6c0d9e08b625b6c4edc8f69d3d1568d5299e382] Namespace:docker} Subcontainers:[] Spec:{CreationTime:2015-11-04 11:23:30.664343161 +0000 UTC Labels:map[io.kubernetes.pod.name:default/busybox] HasCpu:true Cpu:{Limit:2 MaxLimit:0 Mask:0-7} HasMemory:true Memory:{Limit:18446744073709551615 Reservation:0 SwapLimit:18446744073709551615} HasNetwork:true HasFilesystem:false HasDiskIo:true HasCustomMetrics:false CustomMetrics:[]} Stats:[]}
I also tried running heapster with the same config on another Kubernetes cluster where all the nodes are Ready and have running containers; there, Grafana can show all containers and nodes.
I've also seen this from time to time: heapster isn't able to get metrics from a node (one that is not ready, for example) and just gets stuck there. A connection timeout is needed when heapster connects to the kubelet.
cc @piosz @mwielgus
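A minimal sketch of the kind of timeout that would help, assuming heapster talks to the kubelet with Go's net/http (the function name and the 5s values are illustrative, not heapster's actual code):

package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newKubeletClient builds an HTTP client that fails fast instead of
// blocking the whole scrape cycle when a kubelet is unreachable.
func newKubeletClient() *http.Client {
	return &http.Client{
		Timeout: 5 * time.Second, // cap on the entire request
		Transport: &http.Transport{
			// Give up quickly when a NotReady node's port 10255 does not
			// answer, instead of hanging in the dial.
			Dial: (&net.Dialer{Timeout: 5 * time.Second}).Dial,
		},
	}
}

func main() {
	resp, err := newKubeletClient().Get("http://172.27.8.214:10255/stats/")
	if err != nil {
		fmt.Println("kubelet unreachable:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}

With a client like this, an unreachable kubelet produces a quick error for its own containers and the scrape moves on to the remaining Ready nodes.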
Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Freeze the issue for 90d with /lifecycle frozen.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
This is quite old. Please re-open if you are still experiencing the issue.
Deployed environment
There are two Ready nodes: the first is 172.27.8.211, the second is 172.27.8.214. All of the heapster pods are running on the second node.
Containers on the second node (172.27.8.214):
Grafana dashboards
The Containers dashboard only shows containers from the first node, and contains stale (residual) entries. Likewise, the Kubernetes Cluster dashboard only shows the first node (172.27.8.211). Using show series in InfluxDB, I can find container data for the second node (172.27.8.214), so why does Grafana only show the first node? I can also see residual data in the underlying query results.
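Queries along these lines (measurement names taken from the dashboard queries above; exact syntax depends on the InfluxDB version) show which hostnames actually have data stored:
show series from "uptime_ms_cumulative"
select distinct(hostname) from "memory/limit_bytes_gauge" where time > now() - 1h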