kubernetes-retired / heapster

[EOL] Compute Resource Usage Analysis and Monitoring of Container Clusters
Apache License 2.0

heapster-nanny failed to start results in dashboard failed to start #2037

Closed JuneZhao closed 6 years ago

JuneZhao commented 6 years ago

Description

Steps to reproduce:

Describe the results you received: I need some help from you regarding an error that occurs in the heapster-nanny container, since it blocks the dashboard from starting:

log of heapster-nanny:

```
E0428 10:00:02.985241 1 nanny_lib.go:95] Error while querying apiserver for resources: the server has asked for the client to provide credentials (get pods heapster-342135353-8m5xn)
E0428 13:55:48.086484 1 nanny_lib.go:95] Error while querying apiserver for resources: the server has asked for the client to provide credentials (get pods heapster-342135353-8m5xn)
E0428 16:34:46.386532 1 nanny_lib.go:95] Error while querying apiserver for resources: the server has asked for the client to provide credentials (get pods heapster-342135353-8m5xn)
E0428 17:45:30.294490 1 nanny_lib.go:95] Error while querying apiserver for resources: the server has asked for the client to provide credentials (get pods heapster-342135353-8m5xn)
E0428 22:34:42.582481 1 nanny_lib.go:95] Error while querying apiserver for resources: the server has asked for the client to provide credentials (get pods heapster-342135353-8m5xn)
E0428 23:11:08.437273 1 reflector.go:315] k8s.io/contrib/addon-resizer/nanny/kubernetes_client.go:108: Failed to watch *v1.Node: the server has asked for the client to provide credentials (get nodes)
E0428 23:32:03.772933 1 nanny_lib.go:95] Error while querying apiserver for resources: the server has asked for the client to provide credentials (get pods heapster-342135353-8m5xn)
```

log for the heapster container:

```
I0420 03:29:52.957828 1 handlers.go:215] No metrics for pod default/vistamicroservices-3080061194-1p86h
E0420 03:33:05.000330 1 summary.go:389] Node k8s-master-3688e46e-0 is not ready
E0420 03:34:05.000259 1 summary.go:389] Node k8s-master-3688e46e-0 is not ready
E0420 03:35:05.000557 1 summary.go:389] Node k8s-master-3688e46e-0 is not ready
E0420 03:36:05.000303 1 summary.go:389] Node k8s-master-3688e46e-0 is not ready
E0420 03:41:05.000354 1 summary.go:389] Node k8s-agent-3688e46e-4 is not ready
E0420 03:41:05.000383 1 summary.go:389] Node k8s-agent-3688e46e-1 is not ready
E0420 03:41:05.000400 1 summary.go:389] Node k8s-master-3688e46e-2 is not ready
I0420 03:41:28.062581 1 handlers.go:264] No metrics for container jhipster-registry in pod default/jhipster-registry-610758079-7kf16
I0420 03:41:28.062608 1 handlers.go:215] No metrics for pod default/jhipster-registry-610758079-7kf16
I0420 03:41:29.382635 1 handlers.go:264] No metrics for container vistamicroservices in pod default/vistamicroservices-3080061194-1p86h
I0420 03:41:29.382694 1 handlers.go:215] No metrics for pod default/vistamicroservices-3080061194-1p86h
```

It shows this error even after a reboot of the container:

kubectl exec heapster-342135353-8m5xn -n kube-system -c heapster-nanny reboot

The kubernetes-dashboard pod log shows the following:

```
C:\Users\victor.pachas> kubectl logs kubernetes-dashboard-924040265-068vg -n kube-system
Using HTTP port: 8443
Using in-cluster config to connect to apiserver
Using service account token for csrf signing
No request provided. Skipping authorization header
Successful initial request to the apiserver, version: v1.7.7
No request provided. Skipping authorization header
Creating in-cluster Heapster client
Could not enable metric client: Health check failed: an error on the server ("unknown") has prevented the request from succeeding (get services heapster). Continuing.
```

It was deployed with Azure Container Service (Kubernetes) on version 1.7.7.
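A minimal diagnostic sketch, using the pod and service names that appear in the logs above (adjust to the actual names in the cluster):

```
# Check that the heapster service the dashboard health-checks exists and has endpoints
kubectl get svc heapster -n kube-system
kubectl get endpoints heapster -n kube-system

# Check which service account the heapster pod runs under
kubectl get pod heapster-342135353-8m5xn -n kube-system -o jsonpath='{.spec.serviceAccountName}'

# Check that the service account token is mounted inside the nanny container
kubectl exec heapster-342135353-8m5xn -n kube-system -c heapster-nanny -- \
  ls /var/run/secrets/kubernetes.io/serviceaccount
```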

Describe the results you expected:

Output of heapster --version:

Output of kubectl version: v1.7.7

JuneZhao commented 6 years ago

@kubernetes/sig-azure-bugs

k8s-ci-robot commented 6 years ago

@JuneZhao: Reiterating the mentions to trigger a notification: @kubernetes/sig-azure-bugs

In response to [this](https://github.com/kubernetes/heapster/issues/2037#issuecomment-386851343):

>@kubernetes/sig-azure-bugs

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/devel/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
JuneZhao commented 6 years ago

@kubernetes/sig-scalability-bugs

k8s-ci-robot commented 6 years ago

@JuneZhao: Reiterating the mentions to trigger a notification: @kubernetes/sig-scalability-bugs

In response to [this](https://github.com/kubernetes/heapster/issues/2037#issuecomment-386851625):

>@kubernetes/sig-scalability-bugs

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/devel/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
DirectXMan12 commented 6 years ago

Sounds like you haven't correctly set up the auth for the nanny. Does the service account have the right permissions, etc.?
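For context, a hedged sketch of the kind of RBAC the addon-resizer (nanny) would need for the calls failing in the logs above (get on its own pod, watch on nodes). The `heapster` service account name and the role names here are assumptions, not values read from the cluster:

```
# Namespaced permissions: let the nanny read pods in kube-system (assumed role name)
kubectl create role heapster-nanny --verb=get,list,watch --resource=pods -n kube-system
kubectl create rolebinding heapster-nanny --role=heapster-nanny \
  --serviceaccount=kube-system:heapster -n kube-system

# Nodes are cluster-scoped, so watching them needs a ClusterRole/ClusterRoleBinding
kubectl create clusterrole heapster-nanny-nodes --verb=get,list,watch --resource=nodes
kubectl create clusterrolebinding heapster-nanny-nodes --clusterrole=heapster-nanny-nodes \
  --serviceaccount=kube-system:heapster
```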

JuneZhao commented 6 years ago

@DirectXMan12 How can we check that? It is managed by Azure; I don't know where to check.

DirectXMan12 commented 6 years ago

`kubectl get roles` and `kubectl get rolebindings`. Please read up on cluster roles in the official Kubernetes documentation.
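For example, a minimal sketch of listing what is bound and testing the specific verbs that fail in the logs, assuming the heapster pod runs as a `heapster` service account in kube-system:

```
# List roles and bindings in the namespace, plus any cluster-wide ones mentioning heapster
kubectl get roles,rolebindings -n kube-system
kubectl get clusterroles,clusterrolebindings | grep -i heapster

# Impersonate the service account and ask the apiserver directly
kubectl auth can-i get pods -n kube-system --as=system:serviceaccount:kube-system:heapster
kubectl auth can-i watch nodes --as=system:serviceaccount:kube-system:heapster
```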

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 6 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 6 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 6 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/heapster/issues/2037#issuecomment-431487082):

>Rotten issues close after 30d of inactivity.
>Reopen the issue with `/reopen`.
>Mark the issue as fresh with `/remove-lifecycle rotten`.
>
>Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.