The same thing happened to me
FYI, when I set nodelocaldns as HA it doesn't reproduce:
enable_nodelocaldns_secondary: true
Edit: it still reproduces, just less frequently.
I have the same error.
I have the same error after restarting the cluster.
I am also getting a similar problem.
I also encountered this problem (on kubespray 2.20).
My Fix
I found that setting resolvconf_mode: none fixed it. I would recommend retrying on a fresh host, though; I don't think the reset playbook cleans up resolvconf_mode. I suspect you could also set upstream_dns_servers to a non-empty list, if you know what you want that to be.
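For reference, a minimal sketch of what the inventory override might look like (the variable names are the kubespray ones mentioned above; the resolver addresses below are placeholders, not a recommendation):
# group_vars/all/all.yml -- pick ONE of the two approaches
# Option A: stop kubespray from pointing the host at nodelocaldns
resolvconf_mode: none
# Option B: give coredns/nodelocaldns an explicit upstream instead
# of falling back to the host's resolv.conf (addresses are placeholders)
upstream_dns_servers:
  - 1.1.1.1
  - 8.8.8.8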
Explanation
Without upstream_dns_servers set, both coredns and nodelocaldns 'fall back' to the host's DNS setting. And with the default resolvconf_mode: host_resolvconf, the host is told to use nodelocaldns/coredns.
So you have a loop: host -> nodelocaldns -> host -> nodelocaldns -> ...
This is what nodelocaldns is detecting. The fix is just to break either the 'host -> nodelocaldns' or 'nodelocaldns -> host' link of the loop.
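To make the loop concrete, here is roughly what the two ends look like (a sketch; 169.254.25.10 is kubespray's default nodelocaldns_ip, and the Corefile excerpt is abridged):
# /etc/resolv.conf on the host, as written by host_resolvconf mode
nameserver 169.254.25.10
# nodelocaldns Corefile, '.:53' zone -- with no upstream_dns_servers,
# queries are forwarded straight back to the host's resolv.conf
forward . /etc/resolv.conf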
Kubespray also applies resolvconf_mode: host_resolvconf after it starts up, which is why it seems to work initially but fails after a cluster restart. The nodelocaldns check passed the first time because the loop wasn't configured yet.
Impact of fix
If you set resolvconf_mode: none then you won't be able to access services by their domain names from the hosts. DNS resolution still works fine within the pods, but from the host itself you will have to use a service's IP address. I haven't needed to do DNS resolution of cluster services on the host, though. And if you did, I think upstream_dns_servers would solve the problem just as well and keep host DNS working.
Relevant Files
https://github.com/kubernetes-sigs/kubespray/blob/release-2.20/roles/kubernetes-apps/ansible/templates/coredns-config.yml.j2#L55
https://github.com/kubernetes-sigs/kubespray/blob/release-2.20/roles/kubernetes-apps/ansible/templates/nodelocaldns-config.yml.j2#L83
https://github.com/kubernetes-sigs/kubespray/blob/release-2.20/roles/kubernetes-apps/ansible/tasks/nodelocaldns.yml#L63
Same issue here; it is quite misleading, as the issue appears only after a node restart. Hopefully it will be fixed soon.
Same exact issue here, happened after a node restart.
Setting upstream_dns_servers helped for us; it works for reviving clusters that fail to start.
Try explicitly setting remove_default_searchdomains: false, which is supposed to be the default but it seems it is not (which is a bug IMHO).
More accurately, you need to have Domains=default.svc.cluster.local svc.cluster.local inside the resolved configuration file (/etc/systemd/resolved.conf by default).
I'll create a PR for it if it works for others too.
Note: as others also mentioned, changes in resolved.conf are apparently not read unless you restart the cluster or node.
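For reference, the entry being described would look like this in the file (assuming the default dns_domain of cluster.local; systemd-resolved has to be restarted to pick it up):
# /etc/systemd/resolved.conf
[Resolve]
Domains=default.svc.cluster.local svc.cluster.local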
In Jinja, if the variable is undefined, it does not match remove_default_searchdomains is sameas false, so the effective default is true.
{% if remove_default_searchdomains is sameas false or (remove_default_searchdomains is sameas true and searchdomains|default([])|length == 0) %}
Domains={{ ([ 'default.svc.' + dns_domain, 'svc.' + dns_domain ] + searchdomains|default([])) | join(' ') }}
{% else %}
Domains={{ searchdomains|default([]) | join(' ') }}
{% endif %}
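For illustration, with the default dns_domain of cluster.local that template renders along these lines (corp.example.com stands in for a hypothetical user-supplied searchdomain):
# remove_default_searchdomains: false, searchdomains: [corp.example.com]
Domains=default.svc.cluster.local svc.cluster.local corp.example.com
# remove_default_searchdomains: true, searchdomains: [corp.example.com]
Domains=corp.example.com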
Hello, did anyone find a solution to this issue? I am also facing the same issue on kubespray v2.23.1, kubernetes v1.28.3.
The above setting did not work for me.
Have you tried deleting the nodelocaldns pod to restart it after updating the above settings?
The following steps worked for me: resolvconf_mode: none and remove_default_searchdomains: false.
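In inventory terms, that combination is (a sketch against kubespray's group_vars defaults):
# group_vars/all/all.yml
resolvconf_mode: none
remove_default_searchdomains: false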
A temporary quick fix for urgent cases: in the nodelocaldns ConfigMap's .:53 zone, change the forward . /etc/resolv.conf line to forward . 1.1.1.1.
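A hedged sketch of that edit (in kubespray deployments the ConfigMap is named nodelocaldns in kube-system; the surrounding plugins are abridged and may differ in your Corefile):
# kubectl -n kube-system edit configmap nodelocaldns
.:53 {
    errors
    cache 30
    loop
    bind 169.254.25.10
    forward . 1.1.1.1   # was: forward . /etc/resolv.conf
    prometheus :9253
}
# then delete the nodelocaldns pods so they pick up the change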
Hello,
Sometimes the nodelocaldns pods get into a loop and crash due to memory overload. The logs in nodelocaldns say to troubleshoot it via: https://coredns.io/plugins/loop/#troubleshooting
In short: I installed the cluster on 3 nodes and didn't touch any configuration regarding nodelocaldns (default config) except requests & limits.
Environment:
Cloud provider or hardware configuration:
OS: Linux 5.15.0-67-generic x86_64, VERSION="22.04.2 LTS (Jammy Jellyfish)"
Version of Ansible: ansible [core 2.12.5]
Version of Python: Python 3.10.4
Kubespray version: 0c4f57a09
Network plugin used: calico
hosts.yml:
Command used to invoke ansible:
k8s-cluster.yaml