linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0

Linkerd-viz: several pods are not up and running. #7946

Closed: sathishkumar3009 closed this issue 2 years ago

sathishkumar3009 commented 2 years ago

What is the issue?

Several linkerd-viz pods are not up and running.

Only two pods, grafana and prometheus, are running; the other pods are not ready and keep restarting (CrashLoopBackOff).

NAME                            READY   STATUS             RESTARTS   AGE
grafana-59b7b797d6-2l7x7        1/1     Running            0          29m
metrics-api-6c88c44cf9-kcls9    0/1     CrashLoopBackOff   11         29m
prometheus-58d599f4cd-wjgsd     1/1     Running            0          29m
tap-554cdfcf5c-zvzlx            0/1     Running            12         29m
tap-injector-67f677c549-mc4wb   0/1     CrashLoopBackOff   11         29m
web-75d9b8f9cd-xtxhr            0/1     CrashLoopBackOff   11         29m
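A listing like the one above can be triaged mechanically. The following is a minimal Python sketch, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE) shown in this issue; `PodRow`, `parse_pods` and `not_ready` are hypothetical names, not part of kubectl or Linkerd. It flags every pod whose ready-container count is below its total, which also catches the tap pod even though its STATUS reads Running.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PodRow:
    name: str
    ready: int     # containers currently ready
    total: int     # containers in the pod
    status: str
    restarts: int


def parse_pods(table: str) -> list[PodRow]:
    """Parse default `kubectl get pods` output: a header row, then one pod per line."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    rows = []
    for line in lines[1:]:  # skip the NAME READY STATUS RESTARTS AGE header
        # Newer kubectl versions append "(Xm ago)" to RESTARTS; extra tokens land in _.
        name, ready, status, restarts, *_ = line.split()
        ready_n, total_n = (int(n) for n in ready.split("/"))
        rows.append(PodRow(name, ready_n, total_n, status, int(restarts)))
    return rows


def not_ready(rows: list[PodRow]) -> list[PodRow]:
    """Pods with at least one container that is not ready, whatever STATUS says."""
    return [row for row in rows if row.ready < row.total]


SAMPLE = """\
NAME                            READY   STATUS             RESTARTS   AGE
grafana-59b7b797d6-2l7x7        1/1     Running            0          29m
metrics-api-6c88c44cf9-kcls9    0/1     CrashLoopBackOff   11         29m
prometheus-58d599f4cd-wjgsd     1/1     Running            0          29m
tap-554cdfcf5c-zvzlx            0/1     Running            12         29m
tap-injector-67f677c549-mc4wb   0/1     CrashLoopBackOff   11         29m
web-75d9b8f9cd-xtxhr            0/1     CrashLoopBackOff   11         29m
"""

for pod in not_ready(parse_pods(SAMPLE)):
    print(f"{pod.name}: {pod.ready}/{pod.total} {pod.status}, {pod.restarts} restarts")
```

Applied to this listing, it reports metrics-api, tap, tap-injector and web. It also shows that every pod has a container total of 1.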

How can it be reproduced?

Install linkerd-viz.

Logs, error output, etc

kubectl logs tap-injector-67f677c549-mc4wb -n linkerd-viz

The command does not output any logs.

Output of linkerd check -o short

The linkerd command does not work.

Environment

Kubernetes version: above 1.20.

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

No response

mateiidavid commented 2 years ago

@sathishkumar3009 we cannot easily reproduce this problem by simply installing the extension, since this is more likely an isolated case related to your setup and environment. In this case it would be very helpful for us to see the state of your pods: are any events affecting the rollout? Are there any logs? And so on.

I'd start by running kubectl describe on some of the failing pods. Is the proxy in any of the affected pods in a healthy state?
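The proxy question can be checked against the pod's JSON as well as the describe output. Below is a minimal Python sketch, assuming the output of `kubectl get pod <name> -n linkerd-viz -o json` has been captured and parsed with `json.loads`. It relies only on the standard Kubernetes Pod status fields (`containerStatuses[].name`, `.ready`, `.restartCount`, `.state.waiting.reason`) and on Linkerd naming its injected sidecar container `linkerd-proxy`; the function name `proxy_status` is hypothetical.

```python
PROXY_CONTAINER = "linkerd-proxy"  # name of the sidecar container Linkerd injects


def proxy_status(pod: dict) -> str:
    """Summarize the linkerd-proxy sidecar of a pod parsed from `kubectl get pod -o json`."""
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for cs in statuses:
        if cs.get("name") != PROXY_CONTAINER:
            continue
        if cs.get("ready"):
            return "proxy ready"
        # A not-ready container may be waiting (e.g. CrashLoopBackOff) or running
        # but failing its probes; in the latter case there is no waiting reason.
        reason = cs.get("state", {}).get("waiting", {}).get("reason", "unknown")
        return f"proxy not ready ({reason}, restarts={cs.get('restartCount', 0)})"
    return "no proxy injected"


# Example shaped like the tap-injector pod in this thread: one container, no sidecar.
tap_injector = {
    "status": {
        "containerStatuses": [
            {"name": "tap-injector", "ready": False, "restartCount": 91}
        ]
    }
}
print(proxy_status(tap_injector))  # no proxy injected
```

A result of "no proxy injected" points at injection itself rather than at an unhealthy proxy.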

sathishkumar3009 commented 2 years ago

Hi David,

The describe output is the same for all the crash-looping pods; I have attached it below.

Describe output for metrics-api:

Events:
  Type     Reason     Age                      From     Message
  Normal   Pulled     44m (x80 over 5h18m)     kubelet  Container image "cr.l5d.io/linkerd/metrics-api:stable-2.10.1" already present on machine
  Warning  Unhealthy  14m (x706 over 5h19m)    kubelet  Readiness probe failed: Get "http://192.168.2.126:9995/ready": dial tcp 192.168.2.126:9995: connect: connection refused
  Warning  BackOff    4m33s (x1054 over 5h12m) kubelet  Back-off restarting failed container

Describe output for tap-injector:

Name:           tap-injector-67f677c549-mc4wb
Namespace:      linkerd-viz
Priority:       0
Node:           aks-workloadpool-21483928-vmss00000z/192.168.2.123
Start Time:     Wed, 23 Feb 2022 07:52:16 +0000
Labels:         component=tap-injector
                linkerd.io/extension=viz
                pod-template-hash=67f677c549
Annotations:    checksum/config: 8281977ec47ca6168fa76f797a12bf0b91a36e705884f6e079907f9def5e9a6d
                linkerd.io/created-by: linkerd/helm stable-2.10.1
Status:         Running
IP:             192.168.2.129
IPs:
  IP:           192.168.2.129
Controlled By:  ReplicaSet/tap-injector-67f677c549
Containers:
  tap-injector:
    Container ID:  containerd://6a5bb481ab5c1a0b5e525a80ded3c59c27ce14c883b18c1011578981d3dd2ed7
    Image:         cr.l5d.io/linkerd/tap:stable-2.10.1
    Image ID:      cr.l5d.io/linkerd/tap@sha256:0836975b0bbbe6ab68db58332bff216a9a9b072941764029197290586a8d9e48
    Ports:         8443/TCP, 9995/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      injector
      -tap-service-name=tap.linkerd-viz.serviceaccount.identity.$(_l5d_ns).$(_l5d_trustdomain)
      -log-level=info
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Wed, 23 Feb 2022 13:07:46 +0000
      Finished:     Wed, 23 Feb 2022 13:08:46 +0000
    Ready:          False
    Restart Count:  91
    Liveness:       http-get http://:9995/ping delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:9995/ready delay=0s timeout=1s period=10s #success=1 #failure=7
    Environment:
    Mounts:
      /var/run/linkerd/tls from tls (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k9l4z (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  tap-injector-k8s-tls
    Optional:    false
  kube-api-access-k9l4z:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  Warning  Unhealthy  6m7s (x722 over 5h21m)  kubelet  Readiness probe failed: Get "http://192.168.2.129:9995/ready": dial tcp 192.168.2.129:9995: connect: connection refused
  Warning  BackOff    64s (x1070 over 5h15m)  kubelet  Back-off restarting failed container

adleong commented 2 years ago

Hi @sathishkumar3009, is this related to https://github.com/linkerd/linkerd2/discussions/7988? Do the suggestions from that discussion fix this issue?

lcostea commented 2 years ago

I can confirm that if you install linkerd-viz 2.10.1 or 2.10.2 with Helm, it will not start. I suspect 2.11 is affected too, but I can see it has been fixed on the main branch. In @sathishkumar3009's installation (as well as in mine) you can see that no proxy is injected (metrics-api is 1/1, i.e. it has only one container). That's because with a Helm installation you can't use the chart's namespace manifest: you usually pass --create-namespace and set installNamespace to false, so the namespace that Helm creates does not get the inject annotation. And because in 2.10 and 2.11 the inject annotation is not present on the pod templates of the deployments either, no proxy is injected.

I tried to add the annotation via podAnnotations, but prometheus and metrics-api handle that value differently, and the install fails with an error. After I manually added the annotation to the namespace and then installed via Helm, the linkerd-viz extension started working.
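The namespace workaround described in this comment can be sketched in Python. This is a minimal sketch, not part of Linkerd or its Helm chart: it builds a Namespace manifest carrying the `linkerd.io/inject: enabled` annotation (Linkerd's standard proxy-injection annotation) and checks an existing namespace object, as returned by `kubectl get namespace linkerd-viz -o json`, for that annotation. The function names `viz_namespace_manifest` and `injects_proxies` are hypothetical, and the namespace name `linkerd-viz` is assumed from the pod listings in this thread.

```python
import json

INJECT_ANNOTATION = "linkerd.io/inject"  # Linkerd's proxy-injection annotation


def viz_namespace_manifest(name: str = "linkerd-viz") -> dict:
    """Build a Namespace manifest that asks Linkerd to inject proxies into its pods."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "annotations": {INJECT_ANNOTATION: "enabled"},
        },
    }


def injects_proxies(namespace: dict) -> bool:
    """Return True if a Namespace object (parsed from kubectl JSON) has inject=enabled."""
    annotations = namespace.get("metadata", {}).get("annotations") or {}
    return annotations.get(INJECT_ANNOTATION) == "enabled"


if __name__ == "__main__":
    # kubectl apply accepts JSON manifests, so this output can be piped to `kubectl apply -f -`.
    print(json.dumps(viz_namespace_manifest(), indent=2))
```

Piping the printed manifest to `kubectl apply -f -` before running the Helm install with installNamespace set to false would give the namespace the annotation up front. For a namespace that already exists, `kubectl annotate namespace linkerd-viz linkerd.io/inject=enabled` has the same effect. Injection happens when a pod is created, so pods that are already running only get the proxy once they are recreated.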

adleong commented 2 years ago

Since this is fixed on main, we'll close this issue. Thank you for looking into it.