linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0

Installing Linkerd on EKS (ec2 nodes) #11697

Closed Nutties93 closed 1 month ago

Nutties93 commented 9 months ago

What is the issue?

I am facing issues with linkerd-destination and linkerd-proxy-injector. Pods that have Linkerd injection enabled get stuck in the PodInitializing state. When I restart the destination and proxy-injector deployments, the pods sometimes manage to start, but subsequent pods get stuck in PodInitializing again. My nodes have sufficient CPU and memory.

When I run linkerd check, it reports no errors either.
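For reference, this is roughly how I check which injected pods are stuck and dig into their init containers (a sketch; NAMESPACE and POD are placeholders, not real names from my cluster):

# Find injected pods that are stuck initializing
kubectl get pods -A | grep -i podinitializing

# Inspect a stuck pod's events and its linkerd-init container
kubectl -n NAMESPACE describe pod POD
kubectl -n NAMESPACE logs POD -c linkerd-init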

How can it be reproduced?

Install Linkerd via the CLI on EKS.
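The install was the standard CLI flow, roughly as below (a sketch, not my exact commands; versions and flags are illustrative):

curl -sL https://run.linkerd.io/install | sh
linkerd check --pre
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd check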

Logs, error output, etc

Linkerd-destination logs:

kubectl -n linkerd logs deploy/linkerd-destination
Defaulted container "linkerd-proxy" out of: linkerd-proxy, destination, sp-validator, policy, linkerd-init (init)
[ 0.003199s] INFO ThreadId(01) linkerd2_proxy: release 2.210.4 (5a910be) by linkerd on 2023-11-22T17:01:46Z
[ 0.004399s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.005481s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.005534s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.005539s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.005543s] INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
[ 0.005547s] INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
[ 0.005552s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.005556s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via localhost:8086
[ 0.007430s] WARN ThreadId(01) watch{port=4191}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 0.036092s] INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity id=linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
[ 0.112753s] WARN ThreadId(01) watch{port=4191}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 0.326477s] WARN ThreadId(01) watch{port=4191}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 0.741192s] WARN ThreadId(01) watch{port=4191}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 1.242244s] WARN ThreadId(01) watch{port=4191}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 1.743246s] WARN ThreadId(01) watch{port=4191}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]

Linkerd-proxy-injector logs:

Found 2 pods, using pod/linkerd-proxy-injector-757d76768f-h9bw2
Defaulted container "linkerd-proxy" out of: linkerd-proxy, proxy-injector, linkerd-init (init)
[ 0.002738s] INFO ThreadId(01) linkerd2_proxy: release 2.210.4 (5a910be) by linkerd on 2023-11-22T17:01:46Z
[ 0.003994s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.004994s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.005012s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.005016s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.005020s] INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
[ 0.005023s] INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-proxy-injector.linkerd.serviceaccount.identity.linkerd.cluster.local
[ 0.005026s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.005029s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.022461s] INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity id=linkerd-proxy-injector.linkerd.serviceaccount.identity.linkerd.cluster.local
[ 14.050531s] WARN ThreadId(01) watch{port=4191}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=The operation completed successfully grpc.message="stream ended"
[ 14.050577s] WARN ThreadId(01) watch{port=8443}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=The operation completed successfully grpc.message="stream ended"
[ 14.050593s] WARN ThreadId(01) watch{port=9995}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=The operation completed successfully grpc.message="stream ended"
[ 14.050622s] WARN ThreadId(01) watch{port=4191}:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.25.1.130:8090}: linkerd_reconnect: Service failed error=endpoint 172.25.1.130:8090: channel closed error.sources=[channel closed]
[ 14.158073s] WARN ThreadId(01) watch{port=4191}:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.25.1.130:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.25.1.130:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 14.363576s] WARN ThreadId(01) watch{port=4191}:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.25.1.130:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.25.1.130:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 14.784335s] WARN ThreadId(01) watch{port=4191}:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.25.1.130:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.25.1.130:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[ 749.680863s] WARN ThreadId(01) watch{port=8443}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=The operation completed successfully grpc.message="stream ended"
[ 749.680916s] WARN ThreadId(01) watch{port=4191}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=The operation completed successfully grpc.message="stream ended"
[ 749.680934s] WARN ThreadId(01) watch{port=9995}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=The operation completed successfully grpc.message="stream ended"
[ 750.683751s] WARN ThreadId(01) watch{port=8443}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=The service is currently unavailable grpc.message="client 172.25.1.64:34186: server: 172.25.1.82:8090: server 172.25.1.82:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
[ 750.683879s] WARN ThreadId(01) watch{port=4191}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=The service is currently unavailable grpc.message="client 172.25.1.64:34186: server: 172.25.1.82:8090: server 172.25.1.82:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
[ 750.683981s] WARN ThreadId(01) watch{port=9995}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=The service is currently unavailable grpc.message="client 172.25.1.64:34186: server: 172.25.1.82:8090: server 172.25.1.82:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
[ 750.791438s] WARN ThreadId(01) watch{port=4191}:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.25.1.82:8090}: linkerd_reconnect: Service failed error=endpoint 172.25.1.82:8090: channel closed error.sources=[channel closed]
[ 751.905842s] WARN ThreadId(01) watch{port=4191}:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.25.1.82:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.25.1.82:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
[ 753.115467s] WARN ThreadId(01) watch{port=4191}:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.25.1.82:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.25.1.82:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
[ 754.546487s] WARN ThreadId(01) watch{port=4191}:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.25.1.82:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.25.1.82:8090: connect timed out after 1s error.sources=[connect timed out after 1s]

Control-plane metrics logs: #

POD linkerd-proxy-injector-77b4777668-k5z8b (13 of 13)

ERROR Get "http://localhost:46023/metrics": EOF

output of linkerd check -o short

linkerd-version

‼ can determine the latest version
    Get "https://versioncheck.linkerd.io/version.json?version=stable-2.14.5&uuid=89015292-fa49-4de5-90b4-280126337b83&source=cli": net/http: TLS handshake timeout
    see https://linkerd.io/2.14/checks/#l5d-version-latest for hints
‼ cli is up-to-date
    unsupported version channel: stable-2.14.5
    see https://linkerd.io/2.14/checks/#l5d-version-cli for hints

control-plane-version

‼ control plane is up-to-date
    unsupported version channel: stable-2.14.5
    see https://linkerd.io/2.14/checks/#l5d-version-control for hints

linkerd-control-plane-proxy

\ pod "linkerd-destination-765b97b794-zwn27" status is Running

Environment

EKS 1.27
Client version: stable-2.14.5
Server version: stable-2.14.5

Installation via CLI commands.

Possible solution

My suspicion is that maybe the service is not assigned a local ClusterIP address. When I run kubectl describe svc linkerd-policy -n linkerd:

Name:              linkerd-policy
Namespace:         linkerd
Labels:            linkerd.io/control-plane-component=destination
                   linkerd.io/control-plane-ns=linkerd
Annotations:       linkerd.io/created-by: linkerd/cli stable-2.14.5
Selector:          linkerd.io/control-plane-component=destination
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                None
IPs:               None
Port:              grpc  8090/TCP
TargetPort:        8090/TCP
Endpoints:         172.25.1.195:8090
Session Affinity:  None
Events:

Somehow the linkerd-policy endpoints are not using a ClusterIP, but rather an address from the subnet CIDR.
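To double-check where those endpoint addresses come from (a sketch; output obviously varies per cluster):

kubectl -n linkerd get endpoints linkerd-policy -o wide
kubectl -n linkerd get endpointslices -l kubernetes.io/service-name=linkerd-policy
kubectl -n linkerd get pods -l linkerd.io/control-plane-component=destination -o wide
# For a headless service the endpoints are simply the pod IPs, which on EKS (AWS VPC CNI) come out of the VPC subnet CIDR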

Additional context

Currently, I have to keep restarting the destination and proxy-injector deployments (as shown below); after some number of tries the pods eventually run. Has anybody successfully installed Linkerd on EKS?
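The workaround mentioned above is just restarting the control-plane deployments until they come up (sketch):

kubectl -n linkerd rollout restart deploy/linkerd-destination deploy/linkerd-proxy-injector
kubectl -n linkerd rollout status deploy/linkerd-destination
kubectl -n linkerd rollout status deploy/linkerd-proxy-injector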

Would you like to work on fixing this bug?

no

kflynn commented 8 months ago

@Nutties93 Happy new year!

So, first, linkerd-policy is a headless service, so it's OK that it doesn't have a ClusterIP -- that's to be expected.
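You can see this from the service itself; a quick sketch (assumes a scratch pod image like busybox can be pulled in your cluster):

kubectl -n linkerd get svc linkerd-policy   # CLUSTER-IP shows None for a headless service
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup linkerd-policy.linkerd.svc.cluster.local
# The DNS name resolves directly to the backing pod IPs rather than to a virtual ClusterIP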

Beyond that, though, could we get a kubectl describe from the linkerd-destination pod, and the logs from its policy container?

DPOD=$(kubectl get pods -n linkerd -l 'linkerd.io/control-plane-component=destination' -o jsonpath='{ .items[0].metadata.name }')
kubectl describe -n linkerd pod $DPOD
kubectl logs -n linkerd $DPOD -c policy

Thanks!! 🙂

piotrrojek commented 7 months ago

Hey, I'm experiencing the same issue with k3s and Linkerd, installed on 3 EC2 nodes (1 master, 2 workers).

Here's the output of the commands you asked for, @kflynn:

DPOD=$(kubectl get pods -n linkerd -l 'linkerd.io/control-plane-component=destination' -o jsonpath='{ .items[0].metadata.name }')
kubectl describe -n linkerd pod $DPOD
kubectl logs -n linkerd $DPOD -c policy
Name:             linkerd-destination-56db447bcf-klhfn
Namespace:        linkerd
Priority:         0
Service Account:  linkerd-destination
Node:             ip-172-31-25-125/172.31.25.125
Start Time:       Thu, 22 Feb 2024 10:27:31 +0100
Labels:           linkerd.io/control-plane-component=destination
                  linkerd.io/control-plane-ns=linkerd
                  linkerd.io/proxy-deployment=linkerd-destination
                  linkerd.io/workload-ns=linkerd
                  pod-template-hash=56db447bcf
Annotations:      checksum/config: bd31c9c8aacd5b84e1e057813a312b61e4dfe2b66407a211e383b47cc0f7860b
                  cluster-autoscaler.kubernetes.io/safe-to-evict: true
                  config.linkerd.io/default-inbound-policy: all-unauthenticated
                  linkerd.io/created-by: linkerd/cli stable-2.14.10
                  linkerd.io/proxy-version: stable-2.14.10
                  linkerd.io/trust-root-sha256: f36265f164549710dc03ae3e4898aa20c119fac16c4a79e3ef34b838cb119851
Status:           Running
SeccompProfile:   RuntimeDefault
IP:               10.42.1.13
IPs:
  IP:           10.42.1.13
Controlled By:  ReplicaSet/linkerd-destination-56db447bcf
Init Containers:
  linkerd-init:
    Container ID:    containerd://4ad034606556e23333e9c695df744a1be54781cbff257f30db7aeb0813c441ca
    Image:           cr.l5d.io/linkerd/proxy-init:v2.2.3
    Image ID:        cr.l5d.io/linkerd/proxy-init@sha256:1075bc22a4a8f0852311dc84c9db0552f1245d07fe4fdebd4bc6cf4566bcbc76
    Port:            <none>
    Host Port:       <none>
    SeccompProfile:  RuntimeDefault
    Args:
      --incoming-proxy-port
      4143
      --outgoing-proxy-port
      4140
      --proxy-uid
      2102
      --inbound-ports-to-ignore
      4190,4191,4567,4568
      --outbound-ports-to-ignore
      443,6443
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 22 Feb 2024 10:27:32 +0100
      Finished:     Thu, 22 Feb 2024 10:27:32 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  20Mi
    Requests:
      cpu:        100m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /run from linkerd-proxy-init-xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j94nv (ro)
Containers:
  linkerd-proxy:
    Container ID:    containerd://3632756adce57c439565c9156401402c84d878bbeb311d92ca495ecf67919b3e
    Image:           cr.l5d.io/linkerd/proxy:stable-2.14.10
    Image ID:        cr.l5d.io/linkerd/proxy@sha256:7876cee0717575ebc39d2b7cfd701e0df28a809bcb2cf4974716a0bce1ce32cb
    Ports:           4143/TCP, 4191/TCP
    Host Ports:      0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    State:           Waiting
      Reason:        PostStartHookError
    Last State:      Terminated
      Reason:        Error
      Message:       nection refused (os error 111) error.sources=[Connection refused (os error 111)]
[   146.632250s]  WARN ThreadId(01) watch{port=9990}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[   147.133080s]  WARN ThreadId(01) watch{port=9990}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[   147.634038s]  WARN ThreadId(01) watch{port=9990}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[   148.134891s]  WARN ThreadId(01) watch{port=9990}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[   148.636371s]  WARN ThreadId(01) watch{port=9990}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[   149.138241s]  WARN ThreadId(01) watch{port=9990}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[   149.639188s]  WARN ThreadId(01) watch{port=9990}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[   150.017773s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...

      Exit Code:    137
      Started:      Thu, 22 Feb 2024 10:27:33 +0100
      Finished:     Thu, 22 Feb 2024 10:30:03 +0100
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:4191/live delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:4191/ready delay=2s timeout=1s period=10s #success=1 #failure=3
    Environment:
      _pod_name:                                                linkerd-destination-56db447bcf-klhfn (v1:metadata.name)
      _pod_ns:                                                  linkerd (v1:metadata.namespace)
      _pod_nodeName:                                             (v1:spec.nodeName)
      LINKERD2_PROXY_LOG:                                       warn,linkerd=info,trust_dns=error
      LINKERD2_PROXY_LOG_FORMAT:                                plain
      LINKERD2_PROXY_DESTINATION_SVC_ADDR:                      localhost.:8086
      LINKERD2_PROXY_DESTINATION_PROFILE_NETWORKS:              10.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16
      LINKERD2_PROXY_POLICY_SVC_ADDR:                           localhost.:8090
      LINKERD2_PROXY_POLICY_WORKLOAD:                           $(_pod_ns):$(_pod_name)
      LINKERD2_PROXY_INBOUND_DEFAULT_POLICY:                    all-unauthenticated
      LINKERD2_PROXY_POLICY_CLUSTER_NETWORKS:                   10.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16
      LINKERD2_PROXY_INBOUND_CONNECT_TIMEOUT:                   100ms
      LINKERD2_PROXY_OUTBOUND_CONNECT_TIMEOUT:                  1000ms
      LINKERD2_PROXY_OUTBOUND_DISCOVERY_IDLE_TIMEOUT:           5s
      LINKERD2_PROXY_INBOUND_DISCOVERY_IDLE_TIMEOUT:            90s
      LINKERD2_PROXY_CONTROL_LISTEN_ADDR:                       0.0.0.0:4190
      LINKERD2_PROXY_ADMIN_LISTEN_ADDR:                         0.0.0.0:4191
      LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR:                      127.0.0.1:4140
      LINKERD2_PROXY_INBOUND_LISTEN_ADDR:                       0.0.0.0:4143
      LINKERD2_PROXY_INBOUND_IPS:                                (v1:status.podIPs)
      LINKERD2_PROXY_INBOUND_PORTS:                             8086,8090,8443,9443,9990,9996,9997
      LINKERD2_PROXY_DESTINATION_PROFILE_SUFFIXES:              svc.cluster.local.
      LINKERD2_PROXY_INBOUND_ACCEPT_KEEPALIVE:                  10000ms
      LINKERD2_PROXY_OUTBOUND_CONNECT_KEEPALIVE:                10000ms
      LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION:  25,587,3306,4444,5432,6379,9300,11211
      LINKERD2_PROXY_DESTINATION_CONTEXT:                       {"ns":"$(_pod_ns)", "nodeName":"$(_pod_nodeName)", "pod":"$(_pod_name)"}

      _pod_sa:                                                   (v1:spec.serviceAccountName)
      _l5d_ns:                                                  linkerd
      _l5d_trustdomain:                                         cluster.local
      LINKERD2_PROXY_IDENTITY_DIR:                              /var/run/linkerd/identity/end-entity
      LINKERD2_PROXY_IDENTITY_TRUST_ANCHORS:                    <set to the key 'ca-bundle.crt' of config map 'linkerd-identity-trust-roots'>  Optional: false
      LINKERD2_PROXY_IDENTITY_TOKEN_FILE:                       /var/run/secrets/tokens/linkerd-identity-token
      LINKERD2_PROXY_IDENTITY_SVC_ADDR:                         linkerd-identity-headless.linkerd.svc.cluster.local.:8080
      LINKERD2_PROXY_IDENTITY_LOCAL_NAME:                       $(_pod_sa).$(_pod_ns).serviceaccount.identity.linkerd.cluster.local
      LINKERD2_PROXY_IDENTITY_SVC_NAME:                         linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local
      LINKERD2_PROXY_DESTINATION_SVC_NAME:                      linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
      LINKERD2_PROXY_POLICY_SVC_NAME:                           linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
    Mounts:
      /var/run/linkerd/identity/end-entity from linkerd-identity-end-entity (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j94nv (ro)
      /var/run/secrets/tokens from linkerd-identity-token (rw)
  destination:
    Container ID:    containerd://ac08bbe37b8e439a9a50beb49e0ca0f6aa91ba4051e42758daf9147714f73704
    Image:           cr.l5d.io/linkerd/controller:stable-2.14.10
    Image ID:        cr.l5d.io/linkerd/controller@sha256:65bed6a346b259cb1ff04420ee296afa28c38cb3e789ce285e5987f039dddf45
    Ports:           8086/TCP, 9996/TCP
    Host Ports:      0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      destination
      -addr=:8086
      -controller-namespace=linkerd
      -enable-h2-upgrade=true
      -log-level=info
      -log-format=plain
      -enable-endpoint-slices=true
      -cluster-domain=cluster.local
      -identity-trust-domain=cluster.local
      -default-opaque-ports=25,587,3306,4444,5432,6379,9300,11211
      -enable-pprof=false
    State:          Running
      Started:      Thu, 22 Feb 2024 10:30:03 +0100
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:9996/ping delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:9996/ready delay=0s timeout=1s period=10s #success=1 #failure=7
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j94nv (ro)
  sp-validator:
    Container ID:    containerd://bf47e711ac359a520ae36274f3fa52fc72909d1b1328332a638566b246de9664
    Image:           cr.l5d.io/linkerd/controller:stable-2.14.10
    Image ID:        cr.l5d.io/linkerd/controller@sha256:65bed6a346b259cb1ff04420ee296afa28c38cb3e789ce285e5987f039dddf45
    Ports:           8443/TCP, 9997/TCP
    Host Ports:      0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      sp-validator
      -log-level=info
      -log-format=plain
      -enable-pprof=false
    State:          Running
      Started:      Thu, 22 Feb 2024 10:30:03 +0100
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:9997/ping delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:9997/ready delay=0s timeout=1s period=10s #success=1 #failure=7
    Environment:    <none>
    Mounts:
      /var/run/linkerd/tls from sp-tls (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j94nv (ro)
  policy:
    Container ID:    containerd://1300c79bd7d067b94ba96c44d9f64963701252b21391f8afbd9c51c082e06cec
    Image:           cr.l5d.io/linkerd/policy-controller:stable-2.14.10
    Image ID:        cr.l5d.io/linkerd/policy-controller@sha256:763ccb8651e3ba93507732205c18b8dc2d15da94b4f5d04e8683c3f92f0c5ebe
    Ports:           8090/TCP, 9990/TCP, 9443/TCP
    Host Ports:      0/TCP, 0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      --admin-addr=0.0.0.0:9990
      --control-plane-namespace=linkerd
      --grpc-addr=0.0.0.0:8090
      --server-addr=0.0.0.0:9443
      --server-tls-key=/var/run/linkerd/tls/tls.key
      --server-tls-certs=/var/run/linkerd/tls/tls.crt
      --cluster-networks=10.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16
      --identity-domain=cluster.local
      --cluster-domain=cluster.local
      --default-policy=all-unauthenticated
      --log-level=info
      --log-format=plain
      --default-opaque-ports=25,587,3306,4444,5432,6379,9300,11211
      --probe-networks=0.0.0.0/0
    State:          Running
      Started:      Thu, 22 Feb 2024 10:30:03 +0100
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:admin-http/live delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:admin-http/ready delay=10s timeout=1s period=10s #success=1 #failure=7
    Environment:    <none>
    Mounts:
      /var/run/linkerd/tls from policy-tls (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j94nv (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  sp-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  linkerd-sp-validator-k8s-tls
    Optional:    false
  policy-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  linkerd-policy-validator-k8s-tls
    Optional:    false
  linkerd-proxy-init-xtables-lock:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  linkerd-identity-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  linkerd-identity-end-entity:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  kube-api-access-j94nv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason               Age                  From               Message
  ----     ------               ----                 ----               -------
  Normal   Scheduled            2m55s                default-scheduler  Successfully assigned linkerd/linkerd-destination-56db447bcf-klhfn to ip-172-31-25-125
  Normal   Pulled               2m56s                kubelet            Container image "cr.l5d.io/linkerd/proxy-init:v2.2.3" already present on machine
  Normal   Created              2m56s                kubelet            Created container linkerd-init
  Normal   Started              2m55s                kubelet            Started container linkerd-init
  Warning  FailedPostStartHook  54s                  kubelet            PostStartHook failed
  Normal   Killing              54s                  kubelet            FailedPostStartHook
  Normal   Pulled               24s                  kubelet            Container image "cr.l5d.io/linkerd/controller:stable-2.14.10" already present on machine
  Normal   Created              24s                  kubelet            Created container destination
  Normal   Started              24s                  kubelet            Started container destination
  Normal   Pulled               24s                  kubelet            Container image "cr.l5d.io/linkerd/controller:stable-2.14.10" already present on machine
  Normal   Created              24s                  kubelet            Created container sp-validator
  Normal   Started              24s                  kubelet            Started container sp-validator
  Normal   Pulled               24s                  kubelet            Container image "cr.l5d.io/linkerd/policy-controller:stable-2.14.10" already present on machine
  Normal   Created              24s                  kubelet            Created container policy
  Normal   Started              24s                  kubelet            Started container policy
  Warning  Unhealthy            23s                  kubelet            Readiness probe failed: Get "http://10.42.1.13:9996/ready": dial tcp 10.42.1.13:9996: connect: connection refused
  Warning  Unhealthy            23s                  kubelet            Readiness probe failed: Get "http://10.42.1.13:9997/ready": dial tcp 10.42.1.13:9997: connect: connection refused
  Normal   Pulled               23s (x2 over 2m54s)  kubelet            Container image "cr.l5d.io/linkerd/proxy:stable-2.14.10" already present on machine
  Normal   Created              23s (x2 over 2m54s)  kubelet            Created container linkerd-proxy
  Normal   Started              23s (x2 over 2m54s)  kubelet            Started container linkerd-proxy
  Warning  Unhealthy            15s                  kubelet            Liveness probe failed: Get "http://10.42.1.13:9990/live": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy            15s                  kubelet            Readiness probe failed: Get "http://10.42.1.13:9997/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy            15s                  kubelet            Readiness probe failed: Get "http://10.42.1.13:9996/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024-02-22T09:30:03.900982Z  INFO linkerd_policy_controller: created Lease resource lease=Lease { metadata: ObjectMeta { annotations: None, cluster_name: None, creation_timestamp: Some(Time(2024-02-22T09:30:03Z)), deletion_grace_period_seconds: None, deletion_timestamp: None, finalizers: None, generate_name: None, generation: None, labels: Some({"linkerd.io/control-plane-component": "destination", "linkerd.io/control-plane-ns": "linkerd"}), managed_fields: Some([ManagedFieldsEntry { api_version: Some("coordination.k8s.io/v1"), fields_type: Some("FieldsV1"), fields_v1: Some(FieldsV1(Object {"f:metadata": Object {"f:labels": Object {"f:linkerd.io/control-plane-component": Object {}, "f:linkerd.io/control-plane-ns": Object {}}, "f:ownerReferences": Object {"k:{\"uid\":\"64bbc220-5618-4aab-a4a8-f52d51946eca\"}": Object {}}}})), manager: Some("policy-controller"), operation: Some("Apply"), time: Some(Time(2024-02-22T09:30:03Z)) }]), name: Some("policy-controller-write"), namespace: Some("linkerd"), owner_references: Some([OwnerReference { api_version: "apps/v1", block_owner_deletion: None, controller: Some(true), kind: "Deployment", name: "linkerd-destination", uid: "64bbc220-5618-4aab-a4a8-f52d51946eca" }]), resource_version: Some("3964"), self_link: None, uid: Some("75c0f6e5-1005-4c4c-8447-1858882d5c66") }, spec: Some(LeaseSpec { acquire_time: None, holder_identity: None, lease_duration_seconds: None, lease_transitions: None, renew_time: None }) }
2024-02-22T09:30:03.907787Z  INFO grpc{port=8090}: linkerd_policy_controller: policy gRPC server listening addr=0.0.0.0:8090
faizan-planview commented 5 months ago

Looks like there is no update on this issue? We are facing the same issue in our organization.

piotrrojek commented 5 months ago

Please check if you have both TCP and UDP traffic allowed between nodes -- it was the problem in my case; I forgot about UDP and it was causing this issue.
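In my k3s setup the default flannel backend tunnels pod-to-pod traffic over VXLAN, which rides on UDP (port 8472 by default), so blocking UDP between nodes breaks exactly this kind of cross-node control-plane traffic. A rough way to verify UDP between two nodes (assumes nc/netcat is installed on the hosts and the port is otherwise free; the IP is a placeholder, and the -l syntax varies by netcat flavor):

# On node A: listen on an arbitrary UDP port
nc -u -l 9999
# On node B: send a datagram to node A's private IP; "ping" should appear on node A if UDP is allowed
echo ping | nc -u -w1 <NODE_A_PRIVATE_IP> 9999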

yishaihl commented 3 months ago

> Please check if you have both TCP and UDP traffic allowed between nodes -- it was the problem in my case; I forgot about UDP and it was causing this issue.

@piotrrojek we have the same problem..are you using EKS?

dverzolla commented 2 months ago

> Please check if you have both TCP and UDP traffic allowed between nodes -- it was the problem in my case; I forgot about UDP and it was causing this issue.

I was facing slowness using EKS + Linkerd HA. For some reason, exposing k8s services was taking about 12 seconds to finish. Your comment prompted me to run the same test, and after allowing TCP and UDP traffic between nodes, k8s service creation went back to taking a few milliseconds.

My case is not directly related to the issue itself, but I'm writing it down here for anyone else who is having issues with slowness while creating services.
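On EKS, the change boiled down to letting the worker-node security group talk to itself on all TCP and UDP ports; roughly as follows (a sketch with a placeholder security group ID, not the exact commands I ran):

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 0-65535 --source-group sg-0123456789abcdef0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol udp --port 0-65535 --source-group sg-0123456789abcdef0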

kflynn commented 1 month ago

@dverzolla This is... wow. Would you consider a doc PR explaining this? or can you tell me what you did so that I can update the docs? 🙂

dverzolla commented 1 month ago

> @dverzolla This is... wow. Would you consider a doc PR explaining this? or can you tell me what you did so that I can update the docs? 🙂

@kflynn Actually I've created a forum post: https://linkerd.buoyant.io/t/eks-service-creation-taking-too-long-solved/527

Sure I can create the doc PR.

kflynn commented 1 month ago

@dverzolla Great! Thanks on all counts! 🙂

I'm going to go ahead and close this issue, then – please tag me in the PR! 🙂