linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0

Viz crashes Pods created by ingress controller #12344

Open qts0312 opened 3 months ago

qts0312 commented 3 months ago

What is the issue?

I am using Linkerd as the service mesh and Contour as the ingress controller in my cluster. After I install the viz extension, the Pods created by the ingress controller go into CrashLoopBackOff.

How can it be reproduced?

With the configuration below, my ingress controller deploys several Pods to implement the gateway. These Pods work well before viz is installed, but when I apply the same configuration after installing viz, they crash.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: hr-0
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    name: gw-0
    kind: Gateway
  hostnames:
  - '*.bookinfo.com'
  rules:
  - backendRefs:
    - name: ratings
      port: 9080
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gw-0
spec:
  gatewayClassName: contour
  listeners:
  - name: port-0
    port: 9090
    protocol: HTTP
    hostname: '*.bookinfo.com'
    allowedRoutes:
      namespaces:
        from: All

Logs, error output, etc

Log of a Pod created by the ingress controller.

Before installing viz

[     0.001025s]  INFO ThreadId(01) linkerd2_proxy: release 2.210.4 (5a910be) by linkerd on 2023-11-22T17:01:44Z
[     0.001372s]  INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[     0.005117s]  INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[     0.005124s]  INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[     0.005126s]  INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[     0.005127s]  INFO ThreadId(01) linkerd2_proxy: Tap DISABLED
[     0.005128s]  INFO ThreadId(01) linkerd2_proxy: Local identity is contour-gw-0.default.serviceaccount.identity.linkerd.cluster.local
[     0.005129s]  INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.005130s]  INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.011168s]  INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity id=contour-gw-0.default.serviceaccount.identity.linkerd.cluster.local

After installing viz

[     0.001065s]  INFO ThreadId(01) linkerd2_proxy: release 2.210.4 (5a910be) by linkerd on 2023-11-22T17:01:44Z
[     0.001460s]  INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[     0.001996s]  INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[     0.001999s]  INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[     0.002000s]  INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[     0.002001s]  INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
[     0.002003s]  INFO ThreadId(01) linkerd2_proxy: Local identity is contour-gw-0.default.serviceaccount.identity.linkerd.cluster.local
[     0.002004s]  INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.002005s]  INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.007116s]  INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity id=contour-gw-0.default.serviceaccount.identity.linkerd.cluster.local
[     0.108773s]  INFO ThreadId(01) outbound:proxy{addr=10.96.0.1:443}:balance{addr=kubernetes.default.svc.cluster.local:443}: linkerd_proxy_api_resolve::resolve: No endpoints
[     3.108897s]  WARN ThreadId(01) outbound:proxy{addr=10.96.0.1:443}: linkerd_stack::failfast: Service entering failfast after 3s
[     3.109042s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service in fail-fast error.sources=[service in fail-fast] client.addr=10.244.2.48:58170 server.addr=10.96.0.1:443
[     4.111560s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:58180 server.addr=10.96.0.1:443
[     5.113136s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:58190 server.addr=10.96.0.1:443
[     6.115301s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:58202 server.addr=10.96.0.1:443
[     7.116603s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:58210 server.addr=10.96.0.1:443
[     8.117413s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:58214 server.addr=10.96.0.1:443
[     8.300124s]  INFO ThreadId(01) inbound:server{port=8000}:rescue{client.addr=10.244.2.1:44968}: linkerd_app_core::errors::respond: HTTP/1.1 request failed error=error trying to connect: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[     9.119241s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:39414 server.addr=10.96.0.1:443
[    10.120805s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:39418 server.addr=10.96.0.1:443
[    11.122727s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:39434 server.addr=10.96.0.1:443
[    12.124844s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:39446 server.addr=10.96.0.1:443
[    13.126549s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:39448 server.addr=10.96.0.1:443
[    14.046202s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:39452 server.addr=10.96.0.1:443
[    15.046787s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:39464 server.addr=10.96.0.1:443
[    16.047976s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:39470 server.addr=10.96.0.1:443
[    17.049547s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:39478 server.addr=10.96.0.1:443
[    18.051019s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:39482 server.addr=10.96.0.1:443
[    18.302589s]  INFO ThreadId(01) inbound:server{port=8000}:rescue{client.addr=10.244.2.1:40004}: linkerd_app_core::errors::respond: HTTP/1.1 request failed error=error trying to connect: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[    19.052389s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:43430 server.addr=10.96.0.1:443
[    20.054239s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:43444 server.addr=10.96.0.1:443
[    21.055550s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:43450 server.addr=10.96.0.1:443
[    22.056633s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:43458 server.addr=10.96.0.1:443
[    23.058305s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:43464 server.addr=10.96.0.1:443
[    24.060545s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:43472 server.addr=10.96.0.1:443
[    41.843497s]  INFO ThreadId(01) outbound:proxy{addr=10.96.0.1:443}:balance{addr=kubernetes.default.svc.cluster.local:443}: linkerd_proxy_api_resolve::resolve: No endpoints
[    44.844665s]  WARN ThreadId(01) outbound:proxy{addr=10.96.0.1:443}: linkerd_stack::failfast: Service entering failfast after 3s
[    44.845305s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service in fail-fast error.sources=[service in fail-fast] client.addr=10.244.2.48:57216 server.addr=10.96.0.1:443
[    45.847942s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:57230 server.addr=10.96.0.1:443
[    46.850449s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:57244 server.addr=10.96.0.1:443
[    47.853921s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:57254 server.addr=10.96.0.1:443
[    48.300839s]  INFO ThreadId(01) inbound:server{port=8000}:rescue{client.addr=10.244.2.1:41782}: linkerd_app_core::errors::respond: HTTP/1.1 request failed error=error trying to connect: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
[    48.440826s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:57264 server.addr=10.96.0.1:443
[    49.442945s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:58926 server.addr=10.96.0.1:443
[    50.445550s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:58940 server.addr=10.96.0.1:443
[    51.447956s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:58946 server.addr=10.96.0.1:443
[    52.450257s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:58962 server.addr=10.96.0.1:443
[    53.452091s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:58978 server.addr=10.96.0.1:443
[    54.453457s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:58990 server.addr=10.96.0.1:443
[    55.454800s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=logical service kubernetes.default.svc.cluster.local:443: service unavailable error.sources=[service unavailable] client.addr=10.244.2.48:58998 server.addr=10.96.0.1:443
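The "after" log is long and repetitive, so a per-message count can make the pattern easier to see. The following is a minimal sketch, assuming the proxy log has been saved to a file or is piped in on stdin; `summarize_proxy_log` is a hypothetical helper name, not a Linkerd or kubectl command, and the patterns are simply the message fragments that appear in the log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Linkerd): count how many log lines contain
# each message fragment seen in the proxy log above. Reads the log from stdin.
summarize_proxy_log() {
  log="$(cat)"
  for pattern in 'No endpoints' 'entering failfast' 'service in fail-fast' \
                 'service unavailable' 'Connection refused'; do
    # grep -c prints the number of matching lines; it exits non-zero when the
    # count is 0, so "|| true" keeps the helper usable under "set -e".
    count="$(printf '%s\n' "$log" | grep -c -- "$pattern" || true)"
    printf '%s: %s\n' "$pattern" "$count"
  done
}
```

For example, `summarize_proxy_log < proxy.log` (where `proxy.log` is an assumed file name) prints one `fragment: count` line per pattern.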

Output of linkerd check -o short

Status check results are √

Environment

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None

kflynn commented 3 months ago

@qts0312 This is bizarre for sure. Can we see the full Pod manifests you're working with here? I can't really think of any reason that this should be happening.

qts0312 commented 2 months ago

Here are the Pod manifests in this case. Is this what you are referring to?

$ kubectl get pods -n default
NAME                            READY   STATUS             RESTARTS      AGE
contour-gw-0-687fb5c7ff-jptq5   1/2     CrashLoopBackOff   4 (12s ago)   2m11s
contour-gw-0-687fb5c7ff-ntgj2   1/2     CrashLoopBackOff   4 (6s ago)    2m11s
envoy-gw-0-8lrmm                2/3     Running            0             2m11s
envoy-gw-0-l5r52                2/3     Running            0             2m11s
ratings-v1                      2/2     Running            0             2m12s
ratings-v2                      2/2     Running            0             2m12s

$ kubectl get pods -n projectcontour
NAME                                           READY   STATUS    RESTARTS   AGE
contour-gateway-provisioner-5fd5647b95-hz2v6   1/1     Running   0          5m59s

kflynn commented 2 months ago

@qts0312 Yes, can we get the YAML for the two Contour Pods?

qts0312 commented 2 months ago

These Pods are created automatically by Contour to implement the gateway functionality, so I don't have the exact YAML for them.

kflynn commented 2 months ago

You can get it with e.g. kubectl get pod contour-gw-0-687fb5c7ff-jptq5 -o yaml.
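Because the Pod name suffixes change every time Contour recreates the Pods, it may be easier to dump every Pod that matches a name prefix. This is only a sketch: `dump_pod_manifests` is a hypothetical helper, and the `default` namespace and `contour-gw-0` prefix are taken from the Pod listing above.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl): print the full manifest of every
# Pod in a namespace whose name starts with the given prefix.
dump_pod_manifests() {
  ns="$1"      # namespace, e.g. default
  prefix="$2"  # Pod name prefix, e.g. contour-gw-0
  # "-o name" prints one "pod/<name>" entry per line.
  kubectl get pods -n "$ns" -o name | while read -r pod; do
    case "$pod" in
      "pod/$prefix"*)
        echo "---"   # YAML document separator between Pods
        kubectl get "$pod" -n "$ns" -o yaml
        ;;
    esac
  done
}
```

Running `dump_pod_manifests default contour-gw-0 > contour-pods.yaml` would then collect both Contour Pods into one file (the output file name is an assumption).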

qts0312 commented 4 days ago

Really sorry for my late reply. Here is the YAML for the Contour Pods.

contour-pod.log