Closed patrickdomnick closed 6 months ago
Hi @patrickdomnick! Are you able to look at the kubelet logs at all to see why the CNI plugin itself might be failing? Both VPC & Linkerd CNI logs you posted above are for the installers. It's likely that whatever error is being encountered is in the plugin executable. It might help us have a better idea of why the sandbox can't be created.
Hello @mateiidavid, I am @patrickdomnick's co-worker, and I managed to fix the problem in his absence 🥳 The Linkerd CNI logs told us that the Linkerd CNI executable on the node itself failed when calling the K8s API to get the current pod's information. It failed with a "403 FORBIDDEN", which we assumed to be the result of incorrect K8s credentials, but testing the kubeconfig from a pod showed us that the config was valid. After a deep dive, including writing a wrapper for the CNI executable, we noticed that we were missing a NO_PROXY entry for the K8s API on our K8s node: the "403 FORBIDDEN" came from our Squid proxy, not from K8s.
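For anyone who wants to try the same kind of debugging, a wrapper around a CNI executable could look roughly like the sketch below. This is not the wrapper we actually used, only an illustration. It assumes the real plugin binary has been moved aside to a hypothetical path (`/opt/cni/bin/linkerd-cni.real`) and that the wrapper is installed under the original plugin name. The log file location is also made up. Per the CNI convention, the runtime passes parameters as `CNI_*` environment variables and the network config on stdin, so the wrapper records those together with any proxy variables and then forwards everything unchanged.

```python
#!/usr/bin/env python3
import json
import os
import subprocess
import sys

# Hypothetical paths: adjust to wherever the real plugin binary was moved.
REAL_PLUGIN = "/opt/cni/bin/linkerd-cni.real"
LOG_FILE = "/tmp/linkerd-cni-wrapper.log"

PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
              "http_proxy", "https_proxy", "no_proxy")


def build_debug_record(env, stdin_bytes):
    """Collect the CNI_* and proxy variables plus the size of the stdin config."""
    selected = {k: v for k, v in env.items()
                if k.startswith("CNI_") or k in PROXY_VARS}
    return {"env": selected, "stdin_bytes": len(stdin_bytes)}


def main():
    # The runtime sends the network configuration as JSON on stdin.
    stdin_bytes = sys.stdin.buffer.read()
    record = build_debug_record(dict(os.environ), stdin_bytes)

    # Run the real plugin with the same environment and the same stdin.
    result = subprocess.run([REAL_PLUGIN], input=stdin_bytes, capture_output=True)
    record["exit_code"] = result.returncode
    record["stdout"] = result.stdout.decode(errors="replace")
    record["stderr"] = result.stderr.decode(errors="replace")

    with open(LOG_FILE, "a") as log:
        log.write(json.dumps(record) + "\n")

    # Forward the plugin's output and exit code unchanged to the runtime.
    sys.stdout.buffer.write(result.stdout)
    sys.stderr.buffer.write(result.stderr)
    sys.exit(result.returncode)


# Only act when invoked by the container runtime, which sets CNI_COMMAND.
if __name__ == "__main__" and os.environ.get("CNI_COMMAND"):
    main()
```

With something like this in place, each log line shows which proxy variables the plugin process actually inherited on the node, which is the piece of information that was invisible from inside a pod.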
We will fix this by adding the entry on our nodes. It would be sweet if it were possible to set env vars in the Helmfile, which the Linkerd CNI pod would then pass on to the Linkerd CNI executable on the node.
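To double-check that a given NO_PROXY value would actually cover the API address from the error message (172.20.0.1), a small helper like the following can be used. It is only a sketch under assumptions: the function name is made up, and the matching rules (exact host, domain suffix, `*`, and CIDR ranges) are a simplification. Real clients differ in what they accept. To my understanding Go's proxy handling accepts CIDR entries, but other tools may not, so treat the result as a hint and not as proof. Ports in entries are not handled.

```python
import ipaddress


def bypasses_proxy(host, no_proxy):
    """Return True if `host` matches an entry of a comma-separated NO_PROXY value."""
    host = host.strip("[]").lower()  # tolerate the bracketed form seen in the error
    try:
        host_ip = ipaddress.ip_address(host)
    except ValueError:
        host_ip = None  # host is a name, not an IP address

    for entry in no_proxy.split(","):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry == "*":
            return True
        if host_ip is not None:
            # IP hosts are compared against plain IPs and CIDR ranges only.
            try:
                if host_ip in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # entry is a hostname, not an IP or CIDR
            continue
        # Hostnames match exactly or as a domain suffix.
        suffix = entry.lstrip(".")
        if host == suffix or host.endswith("." + suffix):
            return True
    return False
```

For example, `bypasses_proxy("172.20.0.1", "localhost,127.0.0.1")` is false, which corresponds to our broken setup, while adding `172.20.0.1` or a range such as `172.20.0.0/16` to the value makes it true.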
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
What is the issue?
When using Linkerd in CNI mode with EKS 1.26 and the VPC CNI, the Linkerd control plane is not able to start. This issue affects all Linkerd pods.
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "20ecb2f55eed7f8f826624c5f722b879dd9a76a03862d71bc5606724b39ef36a": plugin type="linkerd-cni" name="linkerd-cni" failed (add): Get "https://[172.20.0.1]:443/api/v1/namespaces/linkerd/pods/linkerd-identity-784744bbd9-fz4rb": Forbidden
Other similar issues hinted at problems that are either already fixed or do not apply to us, such as the /etc/cni/net.d/10-aws.conflist configuration, which is correctly chained in our case.
How can it be reproduced?
Install Linkerd with CNI mode enabled on an EKS 1.26 cluster with the VPC CNI enabled:
Logs, error output, etc
VPC CNI
Linkerd CNI
output of linkerd check -o short
does not finish...
Environment
Possible solution
No response
Additional context
We worked with this setup before (EKS 1.23) and are now upgrading directly to EKS 1.26. We have not changed any of the VPC CNI configuration since then, so I assume it might have something to do with the switch to containerd or with some new defaults that we are not aware of.
Would you like to work on fixing this bug?
None