helm / charts

⚠️(OBSOLETE) Curated applications for Kubernetes

[stable/prometheus-adapter] 403 in readiness probe in AKS #10222

Closed · cradle77 closed this 5 years ago

cradle77 commented 5 years ago

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

BUG

Version of Helm and Kubernetes: Kubernetes 1.11.5 (in AKS, with RBAC enabled) Helm 2.10.0

Which chart: stable/prometheus-adapter

What happened: When installing the chart, the adapter pod's readiness and liveness probes fail with 403. This puts the pod into CrashLoopBackOff.

What you expected to happen: The pod should become ready and available.

How to reproduce it (as minimally and precisely as possible): Create a cluster in AKS with RBAC enabled, install Helm, and run:

$ helm install --name my-release stable/prometheus-adapter

Anything else we need to know: This is what I see when describing the pod:

bash-4.4# kubectl describe pod prometheus-adaptor-prometheus-adapter-5cddf7cc64-xphk6
Name:               prometheus-adaptor-prometheus-adapter-5cddf7cc64-xphk6
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               aks-nodepool1-17033719-0/10.240.0.4
Start Time:         Mon, 17 Dec 2018 09:40:04 +0000
Labels:             app=prometheus-adapter
                    chart=prometheus-adapter-v0.2.1
                    heritage=Tiller
                    pod-template-hash=1788937720
                    release=prometheus-adaptor
Annotations:        checksum/config: ee31aefa97b81c3c6f2332640107d252093b9c0282d87cefedf445bbd9d80a37
Status:             Running
IP:                 10.240.0.21
Controlled By:      ReplicaSet/prometheus-adaptor-prometheus-adapter-5cddf7cc64
Containers:
  prometheus-adapter:
    Container ID:  docker://76267031daeb5ef6457febb3c75d6ab952de98c3eaa447db2d7bdc6f55527286
    Image:         directxman12/k8s-prometheus-adapter-amd64:v0.4.0
    Image ID:      docker-pullable://directxman12/k8s-prometheus-adapter-amd64@sha256:a413730093da3a7a7048240ca88f68595cfd141eb36d9be400b10f0081df0e3d
    Port:          6443/TCP
    Host Port:     0/TCP
    Args:
      /adapter
      --secure-port=6443
      --cert-dir=/tmp/cert
      --logtostderr=true
      --prometheus-url=http://prometheus-prometheus-0.default.svc.cluster.local:9090
      --metrics-relist-interval=30s
      --v=6
      --config=/etc/adapter/config.yaml
    State:          Running
      Started:      Mon, 17 Dec 2018 09:42:01 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Mon, 17 Dec 2018 09:41:01 +0000
      Finished:     Mon, 17 Dec 2018 09:42:00 +0000
    Ready:          False
    Restart Count:  2
    Liveness:       http-get https://:https/healthz delay=30s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get https://:https/healthz delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:
      KUBERNETES_PORT_443_TCP_ADDR:  desakswe4-desakswe4-cbbd3e-6c3dc1d9.hcp.westeurope.azmk8s.io
      KUBERNETES_PORT:               tcp://desakswe4-desakswe4-cbbd3e-6c3dc1d9.hcp.westeurope.azmk8s.io:443
      KUBERNETES_PORT_443_TCP:       tcp://desakswe4-desakswe4-cbbd3e-6c3dc1d9.hcp.westeurope.azmk8s.io:443
      KUBERNETES_SERVICE_HOST:       desakswe4-desakswe4-cbbd3e-6c3dc1d9.hcp.westeurope.azmk8s.io
    Mounts:
      /etc/adapter/ from config (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-adaptor-prometheus-adapter-token-2d69w (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-adaptor-prometheus-adapter
    Optional:  false
  prometheus-adaptor-prometheus-adapter-token-2d69w:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-adaptor-prometheus-adapter-token-2d69w
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                  From                               Message
  ----     ------     ----                 ----                               -------
  Normal   Scheduled  2m39s                default-scheduler                  Successfully assigned default/prometheus-adaptor-prometheus-adapter-5cddf7cc64-xphk6 to aks-nodepool1-17033719-0
  Normal   Pulled     43s (x3 over 2m38s)  kubelet, aks-nodepool1-17033719-0  Container image "directxman12/k8s-prometheus-adapter-amd64:v0.4.0" already present on machine
  Normal   Killing    43s (x2 over 103s)   kubelet, aks-nodepool1-17033719-0  Killing container with id docker://prometheus-adapter:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Created    42s (x3 over 2m38s)  kubelet, aks-nodepool1-17033719-0  Created container
  Normal   Started    42s (x3 over 2m37s)  kubelet, aks-nodepool1-17033719-0  Started container
  Warning  Unhealthy  4s (x7 over 2m4s)    kubelet, aks-nodepool1-17033719-0  Readiness probe failed: HTTP probe failed with statuscode: 403
  Warning  Unhealthy  3s (x7 over 2m3s)    kubelet, aks-nodepool1-17033719-0  Liveness probe failed: HTTP probe failed with statuscode: 403
steven-sheehy commented 5 years ago

What chart version and can you provide the logs?

cradle77 commented 5 years ago

Hello,

Thanks a lot for getting back to me. I'm using the latest stable:

version: v0.2.3
appVersion: v0.4.1

I've also tried 0.2.1/0.4.0 - no changes.

In the pod logs I see just

I1224 12:10:06.552546       1 authorization.go:73] Forbidden: "/healthz", Reason: ""
I1224 12:10:06.553374       1 wrap.go:42] GET /healthz: (12.814884ms) 403 [[kube-probe/1.11] 10.240.0.4:37092]
I1224 12:10:08.774693       1 authorization.go:73] Forbidden: "/healthz", Reason: ""
I1224 12:10:08.774962       1 wrap.go:42] GET /healthz: (365.199µs) 403 [[kube-probe/1.11] 10.240.0.4:37116]

Node logs don't say much more than that:

I1224 12:11:08.779285    3160 prober.go:111] Liveness probe for "torrid-mite-prometheus-adapter-7555cf57fd-8wtzr_default(580bc002-0774-11e9-ac59-8ef8c1dc0bcc):prometheus-adapter" failed (failure): HTTP probe failed with statuscode: 403
I1224 12:11:16.540541    3160 prober.go:111] Readiness probe for "torrid-mite-prometheus-adapter-7555cf57fd-8wtzr_default(580bc002-0774-11e9-ac59-8ef8c1dc0bcc):prometheus-adapter" failed (failure): HTTP probe failed with statuscode: 403

Are there any other logs that might be helpful in diagnosing this?

Thanks! m.

steven-sheehy commented 5 years ago

I don't think you provided enough of the logs. Most likely the health check fails if it can't connect to Prometheus. Verify your Prometheus URL and port are correct, and verify your Prometheus instance doesn't require some form of authentication. Try increasing the logLevel to get more detail.
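For reference, a sketch of how those could be overridden at install time, assuming the chart exposes logLevel and prometheus.url / prometheus.port values (double-check against the values.yaml of the chart version you're using):

$ helm install --name my-release stable/prometheus-adapter \
    --set logLevel=6 \
    --set prometheus.url=http://prometheus-prometheus-0.default.svc.cluster.local \
    --set prometheus.port=9090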

mtparet commented 5 years ago

Same issue, logs:

I1228 11:12:50.857652       1 round_trippers.go:383] POST https://XXXXX.hcp.westeurope.azmk8s.io:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews
I1228 11:12:50.857658       1 round_trippers.go:390] Request Headers:
I1228 11:12:50.857663       1 round_trippers.go:393]     Content-Type: application/json
I1228 11:12:50.857666       1 round_trippers.go:393]     User-Agent: adapter/v0.0.0 (linux/amd64) kubernetes/$Format
I1228 11:12:50.857670       1 round_trippers.go:393]     Authorization: Bearer XXXXXX
I1228 11:12:50.857675       1 round_trippers.go:393]     Accept: application/json, */*
I1228 11:12:50.909221       1 round_trippers.go:408] Response Status: 201 Created in 51 milliseconds
I1228 11:12:50.909239       1 round_trippers.go:411] Response Headers:
I1228 11:12:50.909243       1 round_trippers.go:414]     Content-Type: application/json
I1228 11:12:50.909246       1 round_trippers.go:414]     Content-Length: 267
I1228 11:12:50.909249       1 round_trippers.go:414]     Date: Fri, 28 Dec 2018 11:12:50 GMT
I1228 11:12:50.909266       1 request.go:897] Response Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"nonResourceAttributes":{"path":"/healthz","verb":"get"},"user":"system:anonymous","group":["system:unauthenticated"]},"status":{"allowed":false}}
I1228 11:12:50.909325       1 authorization.go:73] Forbidden: "/healthz", Reason: ""
I1228 11:12:50.909624       1 wrap.go:42] GET /healthz: (52.281729ms) 403 [[kube-probe/1.11] 10.200.20.126:58960]
I1228 11:12:52.917140       1 authorization.go:73] Forbidden: "/healthz", Reason: ""
I1228 11:12:52.917208       1 wrap.go:42] GET /healthz: (134.8µs) 403 [[kube-probe/1.11] 10.200.20.126:58984]
I1228 11:13:00.857800       1 authorization.go:73] Forbidden: "/healthz", Reason: ""
I1228 11:13:00.857910       1 wrap.go:42] GET /healthz: (181.899µs) 403 [[kube-probe/1.11] 10.200.20.126:59022]
steven-sheehy commented 5 years ago

What authorization-mode are you using for your kube-apiserver? Mine is --authorization-mode=Node,RBAC. I think having Node there is the magic which makes it possible for kubelet to GET /healthz successfully.
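A quick way to check whether anonymous requests to /healthz would be authorized on a given cluster, assuming your credentials allow impersonation, is kubectl's support for non-resource URLs:

$ kubectl auth can-i get /healthz --as=system:anonymous --as-group=system:unauthenticated

The adapter delegates the probe's authorization to the kube-apiserver via a SubjectAccessReview (visible in the logs above), so a "no" here matches the 403 returned to the kubelet probe.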

ading1977 commented 5 years ago

I am seeing the same issue on my AKS:

Events:
  Type     Reason     Age                  From                               Message
  ----     ------     ----                 ----                               -------
  Normal   Scheduled  3m13s                default-scheduler                  Successfully assigned default/prome-adapter-prometheus-adapter-559c98948d-n82vs to aks-agentpool-35064155-1
  Normal   Pulled     71s (x3 over 3m11s)  kubelet, aks-agentpool-35064155-1  Container image "directxman12/k8s-prometheus-adapter-amd64:v0.4.1" already present on machine
  Normal   Created    71s (x3 over 3m11s)  kubelet, aks-agentpool-35064155-1  Created container
  Normal   Killing    71s (x2 over 2m11s)  kubelet, aks-agentpool-35064155-1  Killing container with id docker://prometheus-adapter:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Started    70s (x3 over 3m11s)  kubelet, aks-agentpool-35064155-1  Started container
  Warning  Unhealthy  32s (x7 over 2m32s)  kubelet, aks-agentpool-35064155-1  Readiness probe failed: HTTP probe failed with statuscode: 403
  Warning  Unhealthy  32s (x7 over 2m32s)  kubelet, aks-agentpool-35064155-1  Liveness probe failed: HTTP probe failed with statuscode: 403
mtparet commented 5 years ago

It's not usable, but since this seems to be a common issue on AKS, I'm cc'ing some AKS people who could help here. Feel free to come or not as your time is yours. @tariq1890 @jackfrancis @mboersma

grampelberg commented 5 years ago

It appears that in AKS the system:discovery role does not allow unauthenticated users (which is what a liveness probe is). You can get the adapter working by adding this:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: healthz
rules:
- nonResourceURLs: ["/healthz", "/healthz/*"]
  verbs: ["get", "post"]

---

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: healthz
subjects:
- kind: Group
  name: system:unauthenticated
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: healthz
  apiGroup: rbac.authorization.k8s.io

Disclaimer: this is definitely too open of a policy and I'd recommend figuring out what the minimum required is.
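As a rough starting point for narrowing it, and assuming the kubelet probes only ever issue GET /healthz (which matches the probe definitions shown in the pod description above), the ClusterRole could be reduced to:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: healthz
rules:
# only the path and verb the readiness/liveness probes actually use
- nonResourceURLs: ["/healthz"]
  verbs: ["get"]

with the same ClusterRoleBinding as above.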

cradle77 commented 5 years ago

Thanks @grampelberg, I'll give it a try and let you know!

grampelberg commented 5 years ago

@cradle77 after digging in a little bit more, this RBAC is actually part of the default. So, folks running into this aren't running the default bootstrap policy (likely for good reasons).
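For anyone who wants to compare against their own cluster, the binding mentioned above can be inspected with:

$ kubectl get clusterrole system:discovery -o yaml
$ kubectl get clusterrolebinding system:discovery -o yaml

and checked for whether system:unauthenticated appears among the subjects (the exact contents vary by Kubernetes version and distribution).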

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] commented 5 years ago

This issue is being automatically closed due to inactivity.