Open nikita-nazemtsev opened 1 month ago
Hi, I've hit the same issue while upgrading kops from 1.28 to 1.29. This is quite a critical bug/regression in kops 1.29, and it blocks us from upgrading. Are there any known workarounds?
/reopen
I wasn't able to repro the issue, but I did upgrade Cilium to the latest 1.15 patch version. If you're able to build kops from source, can you build the kops CLI from this branch, run `kops update cluster --yes`, and see if the issue is fixed?
@rifelpet: Reopened this issue.
I recreated the cluster using kops from a branch, but it didn't solve the issue.
I'm not sure whether it's connected, but besides nodelocaldns not working, I also have an experimental IPv6-only cluster with Cilium.
I tried upgrading it from kops v1.28 to v1.29, but the endpoints in Cilium are unreachable on nodes.
I looked at what changed in the Cilium setup and found that `hostNetwork: true` was added to both the cilium-operator and the cilium DaemonSet. I suspect it's somehow connected with both issues, but I couldn't pin down the exact cause.
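To confirm whether a given cluster actually carries the `hostNetwork: true` setting mentioned above, a small check along these lines may help. It is only a sketch: the `kube-system` namespace and the workload kinds (`daemonset/cilium`, `deployment/cilium-operator`) are assumptions about a typical kops install, and `host_network` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: namespace and workload names are assumptions, adjust as needed.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-kube-system}"

# Prints the pod template's hostNetwork value for a workload reference
# such as daemonset/cilium. Prints nothing if the field is unset.
host_network() {
  "$KUBECTL" -n "$NAMESPACE" get "$1" \
    -o jsonpath='{.spec.template.spec.hostNetwork}'
}
```

For example, `host_network daemonset/cilium` and `host_network deployment/cilium-operator` can be compared between a 1.28 and a 1.29 cluster.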
/kind bug
1. What `kops` version are you running? The command `kops version` will display this information.
Client version: 1.29.0 (git-v1.29.0)
2. What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running or provide the Kubernetes version specified as a `kops` flag.
1.28.7
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue? Update Kops from 1.28.4 to 1.29.0, or create a new cluster using Kops 1.29.0 with Node Local DNS and Cilium CNI.
5. What happened after the commands executed? Pods on updated nodes cannot access node-local-dns pods
6. What did you expect to happen? Pods can access node-local-dns pods.
7. Please provide your cluster manifest. Execute `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else do we need to know?
We found a workaround that fixes this issue on a single node. We noticed that the nodelocaldns interface is in a down state on nodes (although we observe the same on older kops versions, where node-local-dns works fine). After executing `ip link set dev nodelocaldns up`, the Cilium agent logs on this node show:
time="2024-05-31T09:23:08Z" level=info msg="Node addresses updated" device=nodelocaldns node-addresses="169.254.20.10 (nodelocaldns)" subsys=node-address
After these actions, all pods on this node can access node-local-dns without any problems.
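The per-node workaround could be scripted roughly as follows. This is a sketch under stated assumptions, not part of kops or Cilium: the function names are hypothetical, the interface state is read from `/sys/class/net/<dev>/flags` (where bit `0x1` is `IFF_UP`), and the log check simply looks for the "Node addresses updated" message quoted above.

```shell
#!/bin/sh
# Sketch of the per-node workaround. Function names are illustrative only.

# Succeeds if the IFF_UP bit (0x1) is set in a hex flags value such as 0x83,
# the format exposed by /sys/class/net/<dev>/flags.
iff_up() {
  [ $(( $1 & 1 )) -ne 0 ]
}

# Brings the device up only if it is currently down (needs root).
ensure_up() {
  dev="${1:-nodelocaldns}"
  flags="$(cat "/sys/class/net/$dev/flags")" || return 1
  if iff_up "$flags"; then
    echo "$dev is already up"
  else
    ip link set dev "$dev" up   # same command as in the workaround
  fi
}

# Reads Cilium agent logs on stdin; succeeds if they contain a
# "Node addresses updated" entry for the given device.
saw_device() {
  grep -F 'msg="Node addresses updated"' | grep -qF "device=$1 "
}
```

On an affected node, one might run `ensure_up nodelocaldns` as root and then pipe the Cilium agent's logs for that node into `saw_device nodelocaldns` to check that the agent picked up the address.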