linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0

Getting "connection closed" error intermittently for go services #6753

Closed sumit-joshi-mt closed 2 years ago

sumit-joshi-mt commented 3 years ago

Bug Report

What is the issue?

After upgrading to stable 2.10, we saw connections to unmeshed services drop intermittently. Upgrading to edge-21.7.4 fixed that. We now see the same intermittent "connection closed" error (using the edge-21.7.4 proxy) on connections to meshed services, while connections to unmeshed services work fine. In this case both the client and the server are written in Go. The linkerd-proxy logs are attached below. Thank you.

Logs, error output, etc


```
[ 68145.657467s]  WARN ThreadId(01) outbound:accept{client.addr=10.83.74.97:45234}:server{orig_dst=172.20.222.45:80}:profile:http{v=h2}: linkerd_app_core::errors: Failed to proxy request: operation was canceled: connection closed client.addr=10.83.74.97:45234
[ 68145.657476s] DEBUG ThreadId(01) outbound:accept{client.addr=10.83.74.97:45234}:server{orig_dst=172.20.222.45:80}:profile:http{v=h2}: linkerd_app_core::errors: Closing server-side connection
[ 68145.656282s] DEBUG ThreadId(01) outbound:accept{client.addr=10.83.74.97:45234}:server{orig_dst=172.20.222.45:80}:profile:http{v=h2}:logical{dst=umg-svc-p-es.staging.svc.cluster.local:80}:concrete{addr=umg-svc-p-es.staging.svc.cluster.local:80}:endpoint{server.addr=10.83.57.39:80}:h2:Connection{peer=Client}: h2::codec::framed_read: received frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(54021) }
[ 68145.659589s]  WARN ThreadId(01) outbound:accept{client.addr=10.83.74.97:60264}:server{orig_dst=172.20.222.45:80}:profile:http{v=h2}:logical{dst=umg-svc-p-es.staging.svc.cluster.local:80}:concrete{addr=umg-svc-p-es.staging.svc.cluster.local:80}:endpoint{server.addr=10.83.57.39:80}: linkerd_reconnect: Service failed error=channel closed
[ 68145.657488s] DEBUG ThreadId(01) outbound:accept{client.addr=10.83.74.97:45234}:server{orig_dst=172.20.222.45:80}:profile:http{v=h2}: linkerd_app_core::errors: Handling error with gRPC status code=Internal
[ 68145.657742s] DEBUG ThreadId(01) outbound:accept{client.addr=10.83.74.97:45234}:server{orig_dst=172.20.222.45:80}:profile:http{v=h2}:Connection{peer=Server}: h2::codec::framed_write: send frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(233333) }
[ 68145.657532s] DEBUG ThreadId(01) outbound:accept{client.addr=10.83.74.97:45234}:server{orig_dst=172.20.222.45:80}:profile:http{v=h2}:Connection{peer=Server}: h2::codec::framed_write: send frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(2147483647) }
[ 68145.738950s] DEBUG ThreadId(01) outbound:accept{client.addr=10.83.74.97:45234}:server{orig_dst=172.20.222.45:80}:profile:http{v=h2}:logical{dst=umg-svc-p-es.staging.svc.cluster.local:80}:concrete{addr=umg-svc-p-es.staging.svc.cluster.local:80}:endpoint{server.addr=10.83.57.39:80}:h2:Connection{peer=Client}: h2::proto::connection: Connection::poll; connection error error=NO_ERROR
[ 68145.738935s] DEBUG ThreadId(01) outbound:accept{client.addr=10.83.74.97:45234}:server{orig_dst=172.20.222.45:80}:profile:http{v=h2}:logical{dst=umg-svc-p-es.staging.svc.cluster.local:80}:concrete{addr=umg-svc-p-es.staging.svc.cluster.local:80}:endpoint{server.addr=10.83.57.39:80}:h2:Connection{peer=Client}: h2::codec::framed_write: send frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(0) }
```
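
The log lines above are not in timestamp order, which makes the sequence of events (for example, when the upstream `GoAway` arrived relative to the "connection closed" warning) harder to follow. A minimal Go sketch for reordering them is shown here. It assumes every relevant line starts with a `[ <seconds>s]` prefix as in the excerpt; the names `sortByTimestamp` and `parseTimestamp` are hypothetical helpers, not part of Linkerd.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"sort"
	"strconv"
)

// tsRe matches the "[ 68145.657467s]" prefix seen in the proxy log excerpt.
// Assumption: the prefix is seconds as a decimal number followed by "s".
var tsRe = regexp.MustCompile(`^\[\s*([0-9]+(?:\.[0-9]+)?)s\]`)

type logLine struct {
	ts   float64
	text string
}

// parseTimestamp extracts the leading timestamp, reporting false if absent.
func parseTimestamp(line string) (float64, bool) {
	m := tsRe.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	ts, err := strconv.ParseFloat(m[1], 64)
	if err != nil {
		return 0, false
	}
	return ts, true
}

// sortByTimestamp returns the lines that carry a timestamp, oldest first.
// Lines without a recognizable timestamp are dropped; lines with equal
// timestamps keep their input order.
func sortByTimestamp(lines []string) []string {
	parsed := make([]logLine, 0, len(lines))
	for _, l := range lines {
		if ts, ok := parseTimestamp(l); ok {
			parsed = append(parsed, logLine{ts: ts, text: l})
		}
	}
	sort.SliceStable(parsed, func(i, j int) bool {
		return parsed[i].ts < parsed[j].ts
	})
	out := make([]string, len(parsed))
	for i, p := range parsed {
		out[i] = p.text
	}
	return out
}

func main() {
	// Read log lines from stdin; proxy log lines can be long, so raise
	// the scanner's maximum token size to 1 MiB.
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	var lines []string
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	for _, l := range sortByTimestamp(lines) {
		fmt.Println(l)
	}
}
```

Piping the saved proxy logs through this program prints them in chronological order.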

#### linkerd check output

```
➜  ~ linkerd check
Linkerd core checks
===================
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor
linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
linkerd-version
---------------
√ can determine the latest version
‼ cli is up-to-date
    is running version 21.4.5 but the latest edge version is 21.8.3
    see https://linkerd.io/checks/#l5d-version-cli for hints
control-plane-version
---------------------
√ can retrieve the control plane version
‼ control plane is up-to-date
    is running version 21.4.5 but the latest edge version is 21.8.3
    see https://linkerd.io/checks/#l5d-version-control for hints
√ control plane and cli versions match
linkerd-control-plane-proxy
---------------------------
√ control plane proxies are healthy
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
    * linkerd-destination-58dd86f9fb-4snvt (edge-21.4.5)
    * linkerd-destination-58dd86f9fb-wq9cl (edge-21.4.5)
    * linkerd-destination-58dd86f9fb-xxkw2 (edge-21.4.5)
    * linkerd-identity-d79cdbfbf-7h7z5 (edge-21.4.5)
    * linkerd-identity-d79cdbfbf-fnmdn (edge-21.4.5)
    * linkerd-identity-d79cdbfbf-w2vzl (edge-21.4.5)
    * linkerd-proxy-injector-d54f688f7-4hkpb (edge-21.4.5)
    * linkerd-proxy-injector-d54f688f7-p9q4r (edge-21.4.5)
    * linkerd-proxy-injector-d54f688f7-pq82k (edge-21.4.5)
    see https://linkerd.io/checks/#l5d-cp-proxy-version for hints
√ control plane proxies and cli versions match
linkerd-ha-checks
-----------------
‼ pod injection disabled on kube-system
    kube-system namespace needs to have the label config.linkerd.io/admission-webhooks: disabled if injector webhook failure policy is Fail
    see https://linkerd.io/checks/#l5d-injection-disabled for hints
√ multiple replicas of control plane pods
Status check results are √
Linkerd extensions checks
=========================
linkerd-buoyant
---------------
‼ Linkerd extension command linkerd-buoyant exists
    exec: "linkerd-buoyant": executable file not found in $PATH
    see https://linkerd.io/checks/#extensions for hints

Status check results are √
```

### Environment

- Kubernetes Version: v1.19.13-eks-8df270
- Cluster Environment: EKS
- Host OS: Amazon Linux
- Linkerd version: edge-21.4.5

### Possible solution

### Additional context

adleong commented 3 years ago

Hi @sumit-joshi-mt. Sorry for the delay in responding to this issue. Unfortunately, without a more concrete way to reproduce it, there's not much we can do to investigate. If you can provide a reproduction (ideally a set of Kubernetes manifests), we'd be able to see the problem firsthand and investigate.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.