Closed by douglmil 3 years ago
Hi @douglmil. I haven't tried running Linkerd with Calico on EKS but based on the networking restrictions that you describe, it's not too surprising that this doesn't work. Linkerd relies heavily on webhooks to do things like proxy injection and to serve the tap API. Linkerd also prohibits running in hostNetwork mode because proxy injection involves modifying the iptables for the pod.
Unfortunately, I don't really see a way around this until that limitation of EKS's custom networking support can be lifted.
Hi @adleong, do you know if there are other CNIs, such as Cilium, that allow for integration with Linkerd?
Any CNI that doesn't impose the networking restriction that prevents the control plane from connecting to pods should work.
@adleong, Any particular recommendations?
Sorry, I don't personally have any CNI recommendations.
Hitting the same issue sadly. https://linkerd.slack.com/archives/C89RTCWJF/p1614097262059300
This really comes down to limitations imposed by AWS EKS. It stems from their noddy default aws-cni implementation, which limits pod density based on the number of ENIs an instance supports. I know the arguments: it simplifies the networking aspect and ties into AWS's in-house systems (for upselling). In my opinion, though, it goes against what Kubernetes stands for, such as high-density usage. So we turn to external CNIs, but AWS does not allow them to cover the master control plane nodes, which then causes issues such as this. :-(
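The pod density limit mentioned above follows from how the default AWS VPC CNI hands out pod IPs from ENI secondary addresses. Here is a minimal sketch of the commonly cited formula, assuming prefix delegation is not in use. The function name is illustrative, and the instance figures (3 ENIs with 10 IPv4 addresses each for an m5.large) are assumptions that should be checked against current AWS documentation.

```python
def max_pods(enis: int, ips_per_eni: int) -> int:
    """Estimate the per-node pod limit under the default AWS VPC CNI.

    Each ENI keeps its primary address for itself, so only
    (ips_per_eni - 1) addresses per ENI can go to pods. The +2 accounts
    for pods that use host networking (aws-node and kube-proxy).
    """
    return enis * (ips_per_eni - 1) + 2


# Assumed limits for an m5.large: 3 ENIs, 10 IPv4 addresses per ENI.
print(max_pods(3, 10))  # 29
```

An overlay CNI such as Calico is not bound by this per-ENI arithmetic, which is why people swap it in on EKS and then run into the control plane connectivity problem described in this issue.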
Bug Report
What is the issue?
Linkerd pod injection does not work when installed along with Calico in EKS.
How can it be reproduced?
Deploy Linkerd in an EKS cluster running a default install of the Calico CNI.
Logs, error output, etc
linkerd check: FailedDiscoveryCheck: failing or missing response from https://192.168.200.144:8089/apis/tap.linkerd.io/v1alpha1: Get https://192.168.200.144:8089/apis/tap.linkerd.io/v1alpha1: Address is not allowed
I see the same error in the Kubernetes api-server log in EKS.
linkerd check output
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-config
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-identity
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days

linkerd-api
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
‼ tap api service is running
    FailedDiscoveryCheck: failing or missing response from https://192.168.200.144:8089/apis/tap.linkerd.io/v1alpha1: Get https://192.168.200.144:8089/apis/tap.linkerd.io/v1alpha1: Address is not allowed
    see https://linkerd.io/checks/#l5d-tap-api for hints

linkerd-version
√ can determine the latest version
‼ cli is up-to-date
    is running version 2.9.1 but the latest stable version is 2.9.2
    see https://linkerd.io/checks/#l5d-version-cli for hints

control-plane-version
‼ control plane is up-to-date
    is running version 2.9.1 but the latest stable version is 2.9.2
    see https://linkerd.io/checks/#l5d-version-control for hints
√ control plane and cli versions match

linkerd-prometheus
√ prometheus add-on service account exists
√ prometheus add-on config map exists
√ prometheus pod is running

linkerd-grafana
√ grafana add-on service account exists
√ grafana add-on config map exists
√ grafana pod is running

Status check results are √
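With this many passing checks, the few warnings are easy to miss. The sketch below pulls them out of saved `linkerd check` text, assuming the CLI's usual layout of one check per line prefixed with √, ‼, or ×. The helper name `summarize_checks` and the sample string are illustrative and not part of Linkerd.

```python
# Status markers assumed to be used by the CLI: pass, warning, failure.
MARKERS = {"√": "ok", "‼": "warning", "×": "error"}


def summarize_checks(output: str) -> dict:
    """Group check names by status; headers and hint lines are ignored."""
    results = {"ok": [], "warning": [], "error": []}
    for raw in output.splitlines():
        line = raw.strip()
        if line and line[0] in MARKERS:
            results[MARKERS[line[0]]].append(line[1:].strip())
    return results


# Abbreviated sample based on the output in this report.
sample = """\
linkerd-api
√ control plane pods are ready
‼ tap api service is running
    FailedDiscoveryCheck: failing or missing response
linkerd-version
√ can determine the latest version
‼ cli is up-to-date
"""

summary = summarize_checks(sample)
print(summary["warning"])
```

In this report the only warning that matters is `tap api service is running`; the other two are version notices.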
Environment
Possible solution
Need help finding one
Additional context
I am trying to run Linkerd along with Calico on an EKS deployment in AWS. I am not able to get pod injection working, seemingly due to failed communication with the Kubernetes api-server (the Linkerd tap pod seems to be the one having the issue). I understand this is likely due to a disconnect between Calico on the worker nodes and the AWS CNI running on the master node. The workaround for other pods has been to enable host networking to allow api-server communication. I have not been able to deploy Linkerd with hostNetwork set to true. Should this be possible? If not, does anyone know of a way to get EKS + Calico + Linkerd working?
Note from the Calico site on the need for hostNetwork on EKS: Calico networking cannot currently be installed on the EKS control plane nodes. As a result the control plane nodes will not be able to initiate network connections to Calico pods. (This is a general limitation of EKS’s custom networking support, not specific to Calico.) As a workaround, trusted pods that require control plane nodes to connect to them, such as those implementing admission controller webhooks, can include hostNetwork:true in their pod spec. See the Kubernetes API pod spec definition for more information on this setting.
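For reference, the workaround quoted from the Calico docs amounts to a small change in a workload's pod template. The sketch below builds such a patch as a Python dict and prints it as JSON. It is meant for a generic webhook Deployment, not for Linkerd, which according to the maintainer's reply does not support host networking. The function name is illustrative, and the `dnsPolicy` value is an assumption added so that a host-networked pod can still resolve cluster DNS names.

```python
import json


def host_network_patch() -> dict:
    """Build a Deployment patch that enables host networking for its pods."""
    return {
        "spec": {
            "template": {
                "spec": {
                    # Let the EKS control plane reach the pod via the node's network.
                    "hostNetwork": True,
                    # Assumption: keep in-cluster DNS working with host networking.
                    "dnsPolicy": "ClusterFirstWithHostNet",
                }
            }
        }
    }


print(json.dumps(host_network_patch(), indent=2))
```

The printed JSON could be passed to `kubectl patch deployment` with `--patch` for a webhook of your own. Host-networked pods bind ports directly on the node, so port conflicts between replicas on the same node need to be considered.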