Open nikita-nazemtsev opened 5 months ago
Hi, I've hit the same issue while upgrading Kops from 1.28 to 1.29. This is a critical bug/regression in Kops 1.29 that blocks us from upgrading. Are there any known workarounds?
/reopen
I wasn't able to repro the issue, but I did upgrade Cilium to the latest 1.15 patch version. If you're able to build kops from source, can you build the kops CLI from this branch, run kops update cluster --yes, and see if the issue is fixed?
@rifelpet: Reopened this issue.
I recreated the cluster using kops from a branch, but it didn't solve the issue.
I'm not sure whether it's connected, but besides nodelocaldns not working, I also have an experimental IPv6-only cluster with Cilium. I tried upgrading it from kops v1.28 to v1.29, but the Cilium endpoints are unreachable on the nodes.
I looked at what changed in the Cilium setup and found that hostNetwork: true was added to both the cilium-operator and the cilium DaemonSet. I suspect this is somehow connected to both issues, but I couldn't find the exact issue.
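To compare the hostNetwork setting before and after the upgrade, something like the following sketch could be used. The helper name host_network_of is hypothetical. The kube-system namespace default and the node-local-dns DaemonSet name are assumptions about a typical kops cluster, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: prints the hostNetwork setting of a workload's
# pod template. Prints "false" when the field is unset, since that is
# the Kubernetes default.
host_network_of() {
    kind=$1
    name=$2
    ns=${3:-kube-system}   # assumption: workloads live in kube-system
    value=$(kubectl -n "$ns" get "$kind" "$name" \
        -o jsonpath='{.spec.template.spec.hostNetwork}')
    echo "${value:-false}"
}

# Usage against a live cluster:
#   host_network_of daemonset cilium
#   host_network_of deployment cilium-operator
#   host_network_of daemonset node-local-dns   # name is an assumption
```

Running it on a 1.28 cluster and on a 1.29 cluster should show whether the setting actually differs between the two.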
Is the kube-dns service created in kube-system?
Hi, yes, the kube-dns service is created in kube-system. Also, there is a Cilium doc on how to configure node-local-dns with Cilium: https://docs.cilium.io/en/v1.10/gettingstarted/local-redirect-policy/#node-local-dns-cache
One interesting part is that node-local-dns must run as a regular pod with hostNetwork: false, which is not the case in the current Kops deployment. A CiliumLocalRedirectPolicy must also be added. I took this from this issue: https://github.com/cilium/cilium/issues/16906
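For reference, a policy along the lines of the one in the linked Cilium doc might look like the sketch below. The helper name write_lrp_manifest, the policy name, and the k8s-app: node-local-dns label are assumptions; the label must match whatever the node-local-dns pods in your cluster actually carry. The Cilium doc also says the local redirect policy feature has to be enabled in Cilium for the policy to take effect.

```shell
#!/bin/sh
# Hypothetical helper: writes a CiliumLocalRedirectPolicy manifest that
# redirects kube-dns service traffic to the node-local-dns pod on the
# same node. It only writes the file; it does not touch the cluster.
write_lrp_manifest() {
    cat > "$1" <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumLocalRedirectPolicy
metadata:
  name: nodelocaldns            # assumption: any name works here
  namespace: kube-system
spec:
  redirectFrontend:
    serviceMatcher:
      serviceName: kube-dns
      namespace: kube-system
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        k8s-app: node-local-dns # assumption: check your pod labels
    toPorts:
      - port: "53"
        name: dns
        protocol: UDP
      - port: "53"
        name: dns-tcp
        protocol: TCP
EOF
}

# Usage:
#   write_lrp_manifest nodelocaldns-lrp.yaml
#   kubectl apply -f nodelocaldns-lrp.yaml
```

Note that this only helps if node-local-dns runs with hostNetwork: false, as described above.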
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
lifecycle/stale is applied
lifecycle/stale was applied, lifecycle/rotten is applied
lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
lifecycle/stale is applied
lifecycle/stale was applied, lifecycle/rotten is applied
lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/kind bug
1. What kops version are you running? The command kops version will display this information.
Client version: 1.29.0 (git-v1.29.0)
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
1.28.7
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Update Kops from 1.28.4 to 1.29.0, or create a new cluster using Kops 1.29.0 with Node Local DNS and Cilium CNI.
5. What happened after the commands executed?
Pods on updated nodes cannot access node-local-dns pods.
6. What did you expect to happen?
Pods can access node-local-dns pods.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else do we need to know?
We found a workaround that fixes this issue on a single node. We noticed that the nodelocaldns interface is in a down state on the nodes (although we observe the same on older kops versions, where node-local-dns works fine). After executing
ip link set dev nodelocaldns up
the Cilium agent logs on this node show:
time="2024-05-31T09:23:08Z" level=info msg="Node addresses updated" device=nodelocaldns node-addresses="169.254.20.10 (nodelocaldns)" subsys=node-address
After these actions, all pods on this node can access node-local-dns without any problems.
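The per-node workaround can be sketched as a small script that only touches the interface when it is actually down. The function names link_is_up and ensure_link_up are hypothetical. The check looks at the flag list in the output of ip -o link show rather than the state field, on the assumption that a dummy-type interface can report state UNKNOWN even when it is up.

```shell
#!/bin/sh
# Reads one line of `ip -o link show dev <iface>` output on stdin and
# succeeds if the flag list (the <...> field) contains the UP flag.
link_is_up() {
    flags=$(sed -n 's/^[^<]*<\([^>]*\)>.*$/\1/p')
    case ",$flags," in
        *,UP,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Brings the interface up only if it is not already up. Requires root.
ensure_link_up() {
    iface=$1
    if ip -o link show dev "$iface" | link_is_up; then
        echo "$iface is already up"
    else
        ip link set dev "$iface" up
    fi
}

# Usage on an affected node (as root):
#   ensure_link_up nodelocaldns
```

This is a stopgap for individual nodes, not a fix: it has to be repeated on every node, and the change may not survive a reboot or node replacement.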