linkerd / website

Source code for the linkerd.io website
Apache License 2.0

Proxy log connection reset by peer and connection refused #617

Open jonasdkhansen opened 5 years ago

jonasdkhansen commented 5 years ago

Bug Report

What is the issue?

I see a lot of connection reset by peer errors in the proxy log, on all my services. I'm kind of stuck finding the cause, so I hope someone can point me in the right direction. It doesn't seem to affect the traffic, but the success rate in Linkerd is sometimes down a few percent.

How can it be reproduced?

No, I am not able to reproduce the issue by sending a lot of requests to the service. The problem only shows up in the logs.

Logs, error output, etc

ERR! [ 11082.146455s] proxy={server=in listen=0.0.0.0:4143 remote=bla.bla.bla:35314} linkerd2_proxy::app::errors unexpected error: connection error: Connection reset by peer (os error 104)

And this one:

ERR! [ 87.959649s] proxy={server=in listen=0.0.0.0:4143 remote=bla.bla.bla:50416} linkerd2_proxy::app::errors unexpected error: error trying to connect: Connection refused (os error 111) (address: 127.0.0.1:8080)

linkerd check output

kubernetes-api


√ can initialize the client
√ can query the Kubernetes API

kubernetes-version


√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-config


√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-existence

√ 'linkerd-config' config map exists
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api

√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles

linkerd-version

√ can determine the latest version
‼ cli is up-to-date
    is running version 2.4.0 but the latest stable version is 2.5.0
    see https://linkerd.io/checks/#l5d-version-cli for hints

control-plane-version

‼ control plane is up-to-date
    is running version 2.4.0 but the latest stable version is 2.5.0
    see https://linkerd.io/checks/#l5d-version-control for hints
√ control plane and cli versions match

Status check results are √

Environment

grampelberg commented 5 years ago

Connection reset by peer is pretty benign; it just means that the remote side closed the connection. If you're not seeing any issues in your application, it should be fine to ignore.
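For anyone curious what actually produces os error 104, here is a minimal Python sketch that forces a reset on a loopback connection. It is not taken from the proxy and assumes Linux, where ECONNRESET is 104. The helper name `provoke_reset` is made up for illustration. Setting SO_LINGER with a zero timeout makes close() abort the connection with a RST instead of a normal FIN, and the other end then sees "Connection reset by peer" on its next read.

```python
import errno
import socket
import struct


def provoke_reset() -> int:
    """Return the errno a client sees when its peer aborts the connection.

    Hypothetical helper for illustration only; assumes Linux semantics.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    client.connect(listener.getsockname())

    server_side, _ = listener.accept()
    # l_onoff=1, l_linger=0: close() drops the connection and sends a RST.
    server_side.setsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
    )
    server_side.close()

    try:
        client.recv(1)
        return 0  # a clean close would land here (recv returns b"")
    except ConnectionResetError as exc:
        return exc.errno
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    code = provoke_reset()
    print(code, errno.errorcode.get(code))
```

The point of the sketch is only that a reset is the peer abandoning a connection abruptly, which clients, load balancers, and idle-timeout logic do routinely.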

Connection refused is a little more concerning. If you're not using readiness probes, it suggests that a new pod started up and began receiving traffic before it was ready. Otherwise, there can be some staleness in the discovery for endpoints from your api-server. I've not seen any issues around that with GKE though ...
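To make the "not ready yet" case concrete, here is a small Python sketch of what a connect looks like when nothing is listening on the target port, which is the situation the proxy hits when it tries 127.0.0.1:8080 before the application has bound it. The function names `probe` and `free_port` are hypothetical, and the errno value assumes Linux (ECONNREFUSED is 111 there).

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> int:
    """Try a TCP connect; return 0 on success or the errno on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except OSError as exc:
        # Timeouts carry no errno, so fall back to -1 for those.
        return exc.errno or -1


def free_port() -> int:
    """Find a loopback port that nothing is listening on right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    code = probe("127.0.0.1", free_port())
    print(code, errno.errorcode.get(code))
```

A readiness probe does essentially this check on your behalf, and keeps the pod out of the endpoints list until the connect succeeds.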

If you're particularly concerned, inject your workloads with --enable-debug-sidecar and watch the logs. The debug sidecar runs tshark and gives some added insight into what's happening.

wmorgan commented 5 years ago

We need to add this information to the docs somewhere

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

xiaowheat commented 4 years ago

Our Kubernetes cluster also gets the same error: Connection reset by peer (os error 104)

Linkerd Grafana shows the success rate is sometimes down a few percent.


Is there a way to fix the error, or should we just ignore it?

cpretzer commented 4 years ago

Thanks for the message @xiaowheat

Have a look at the logs of the service that Linkerd is proxying connections for. If you see errors there, then we can explore those. If there are no errors or unexpected behavior from the service, then you can probably ignore the Connection reset by peer errors.
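When comparing proxy logs against the service's own logs, it can help to know how many errors are resets (104) versus refusals (111). Below is a rough Python sketch that tallies proxy log lines by OS error number. It assumes the "(os error N)" suffix seen in the log lines quoted in this thread, and the function name `tally_os_errors` is made up for illustration.

```python
import re
from collections import Counter

# Matches the "(os error N)" suffix seen in the proxy log lines in this thread.
OS_ERROR = re.compile(r"\(os error (\d+)\)")


def tally_os_errors(lines):
    """Count log lines by OS error number; lines without one are skipped."""
    counts = Counter()
    for line in lines:
        match = OS_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Read log lines from stdin and print one "errno count" pair per line.
    for code, count in sorted(tally_os_errors(sys.stdin).items()):
        print(code, count)
```

If 111 shows up more than occasionally, that is the one worth chasing first, along the lines of the readiness probe comment above.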