linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0

Linkerd doesn't work in an EKS cluster with Weave-Net installed #3353

Closed jwenz723 closed 5 years ago

jwenz723 commented 5 years ago

Bug Report

What is the issue?

Linkerd does not appear to work with the default installation (linkerd install | kubectl apply -f -) on an EKS cluster that runs the Weave-net CNI.

When I turn on injection by setting the linkerd.io/inject: enabled annotation, my pods are restarted, but the injection never happens.

How can it be reproduced?

eksctl create cluster
# ... wait for cluster to be created

# remove default CNI
kubectl delete ds -n kube-system aws-node

# install weave-net CNI
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

# at this point you need to terminate the EC2 worker node instances so that
# the replacement nodes come up using the weave-net CNI

# install linkerd
linkerd install | kubectl apply -f -

# install a deployment with injection turned on
linkerd inject deployment.yaml | kubectl apply -f -

# Open the linkerd dashboard
linkerd dashboard &

# browse to the deployment that was just installed;
# its 'meshed' status shows 0 pods meshed
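
The same check can be made from the command line instead of the dashboard. This is a minimal sketch, not part of the original report: it assumes an injected pod carries a container named linkerd-proxy, and the helper name count_meshed is hypothetical. It reads one line per pod (pod name, a tab, then the space-separated container names) and reports how many pods include the proxy container.

```shell
# count_meshed: hypothetical helper, not a Linkerd command.
# Input on stdin: "<pod-name><TAB><space-separated container names>" per line.
# A pod counts as meshed if one of its containers is named "linkerd-proxy".
count_meshed() {
  awk -F'\t' '
    {
      total++
      n = split($2, names, " ")
      for (i = 1; i <= n; i++) {
        if (names[i] == "linkerd-proxy") { meshed++; break }
      }
    }
    END { printf "%d/%d pods meshed\n", meshed, total }
  '
}

# Possible usage against a live cluster (replace <namespace> with your own):
#   kubectl get pods -n <namespace> \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}' \
#     | count_meshed
```

With the failure described here, the count of meshed pods would be expected to stay at 0 even after the pods restart.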

Logs, error output, etc

I see the following in my API server logs (note: these are not in order by timestamp):

E0830 16:54:37.736044 8 controller.go:111] loading OpenAPI spec for "v1alpha1.tap.linkerd.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]

E0830 16:54:34.740387 8 available_controller.go:316] v1alpha1.tap.linkerd.io failed with: Get https://10.32.0.2:8089: Address is not allowed

W0830 16:54:15.269939 8 dispatcher.go:68] Failed calling webhook, failing open linkerd-proxy-injector.linkerd.io: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

E0830 16:54:15.269993 8 dispatcher.go:69] failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

W0830 16:19:45.563890 8 dispatcher.go:68] Failed calling webhook, failing open linkerd-proxy-injector.linkerd.io: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: no endpoints available for service "linkerd-proxy-injector"

E0830 16:14:33.928098 8 dispatcher.go:69] failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)

I0830 16:55:19.544246 8 log.go:172] http2: server: error reading preface from client 73.98.224.141:54827: read tcp 10.0.38.129:443->73.98.224.141:54827: read: connection reset by peer

E0830 16:20:54.693839 8 available_controller.go:316] v1alpha1.tap.linkerd.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1alpha1.tap.linkerd.io": the object has been modified; please apply your changes to the latest version and try again
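
Since the lines above are not in timestamp order, it can help to sort them before reading. The following is a rough sketch, assuming the logs use the klog prefix seen above (a severity letter, MMDD, then HH:MM:SS.micros); the function name sort_klog is hypothetical. It drops continuation lines such as the ", Header: ..." line, and because the prefix has no year it will misorder logs that span a year boundary.

```shell
# sort_klog: hypothetical helper that orders klog-style lines by timestamp.
# Keeps only lines that start with a klog prefix such as "E0830 16:54:37.736044 ",
# then sorts on the MMDD part of field 1 and on the time in field 2.
sort_klog() {
  grep -E '^[IWEF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ ' |
    LC_ALL=C sort -t ' ' -k1.2,1.5 -k2,2
}

# Possible usage, with apiserver.log as a placeholder file name:
#   sort_klog < apiserver.log
```

Sorted this way, the entries above start with the 16:14 and 16:19 webhook failures and end with the 16:54 tap APIService errors.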

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

Status check results are √

Environment

Possible solution

Remove weave-net, or somehow get the weave-net overlay network installed on the EKS-managed master nodes.

Additional context

jwenz723 commented 5 years ago

Using manual injection to get a deployment meshed does work:

linkerd inject --manual deployment.yaml | kubectl apply -f -

However, some functionality, such as tap, does not work at all.

grampelberg commented 5 years ago

This is specifically a problem with EKS. Because the API servers are on a separate subnet and are not part of the Weave overlay, they cannot connect to services running on the cluster. This mainly affects the injector, since the webhook call must reach the cluster. With the move to an APIService, tap will not work either.

Note: this is not Linkerd-specific. No webhook will work in this setup, and metrics-server will not work either.
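
The subnet reasoning can be illustrated with the addresses from the API server logs in the report: the API server fails to reach 10.32.0.2:8089 while itself listening on 10.0.38.129. This is a minimal sketch, assuming the cluster uses Weave Net's default allocation range of 10.32.0.0/12 (an assumption, since the range is configurable and is not stated in this thread). The helper names ip_to_int and ip_in_cidr are hypothetical.

```shell
# ip_to_int: convert a dotted IPv4 address to an integer.
# The function body runs in a subshell so the IFS change stays local.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# ip_in_cidr <ip> <network>/<prefix-length>
# Succeeds (exit status 0) when <ip> falls inside the given CIDR block.
ip_in_cidr() (
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
)

# Assumption: 10.32.0.0/12 is the Weave Net default range, not a value from this report.
for ip in 10.32.0.2 10.0.38.129; do
  if ip_in_cidr "$ip" 10.32.0.0/12; then
    echo "$ip is inside 10.32.0.0/12"
  else
    echo "$ip is outside 10.32.0.0/12"
  fi
done
```

Under that assumption the pod address is inside the overlay range and the API server address is outside it, which is consistent with the API server being unable to reach the tap and injector pods.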