Closed plord12 closed 3 months ago
That indeed sounds like an issue in the latest CoreDNS based DNS plug-in. The update was entirely version bumps (Alpine 3.19 and CoreDNS 1.8.7).
Anything specific in the DNS plug-in logs during the high CPU usage period?
Looks like adguard DNS is slow to start on boot ... during that time I see lots of (with varying addresses) :
[INFO] 172.30.32.1:42679 - 54849 "A IN api.forecast.solar. udp 36 false 512" - - 0 6.002127419s
[ERROR] plugin/errors: 2 api.forecast.solar. A: read udp 172.30.32.3:57283->192.168.175.10:53: i/o timeout
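To see which upstream is timing out and how often, the error lines can be tallied. The following is only a rough sketch: the regular expression is inferred from the single error line quoted above, not from any documented CoreDNS log format, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the one error line quoted in this thread (an assumption,
# not a documented format): rcode, query name, query type, then the UDP timeout.
TIMEOUT_RE = re.compile(
    r"\[ERROR\] plugin/errors: \d+ (?P<name>\S+) (?P<qtype>\w+): "
    r"read udp \S+->(?P<upstream>\S+): i/o timeout"
)

def count_timeouts(log_lines):
    """Count i/o timeouts per (upstream, query name) pair."""
    counts = Counter()
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("upstream"), match.group("name"))] += 1
    return counts

if __name__ == "__main__":
    sample = [
        '[INFO] 172.30.32.1:42679 - 54849 "A IN api.forecast.solar. udp 36 false 512" - - 0 6.002127419s',
        "[ERROR] plugin/errors: 2 api.forecast.solar. A: read udp 172.30.32.3:57283->192.168.175.10:53: i/o timeout",
    ]
    for (upstream, name), n in count_timeouts(sample).items():
        print(upstream, name, n)
```

If one upstream address dominates the counts during boot, that points at the server that is not yet answering.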
Oh, early in the dns logs I also see -
services-up: info: copying legacy longrun coredns (no readiness notification)
s6-rc: info: service legacy-services successfully started
[ERROR] plugin/mdns: could not locate primary interface due to: Lookup failed due to system error: Connection refused
Is it easy to roll back to the earlier DNS version to test?
Yes, you can use: ha dns update --version 2023.06.2
Meanwhile I tried to recreate your setup, but I have not observed elevated CPU load yet. However, I only set servers to the Home Assistant installation, I haven't reconfigured my DHCP (and hence locals is still my router).
Thanks - I've not had to do that before.
However, even after this I still saw a case of high cpu and server eventually rebooting. So I'm wondering if there is a combination here.
Note that this is tough to debug ... I need the server running plus there is only a small window to run commands before the server reboots.
To keep me running, I've updated home assistant to reference my secondary DNS server (external to home assistant) so during boot hassio_dns isn't trying to query adguard.
Aside from possible hassio_dns bugs, I'm wondering if this configuration is a bad idea since there will always be a startup ordering issue. Maybe :
Thanks for confirming! So it seems this just coincided somehow? Did you do other changes to your system?
Note that there are also other issues reporting high CPU usage, e.g. #124.
Yeah, a system being its own primary DNS server is generally not ideal indeed. But it is a popular setup, and afaik it works for other folks.
I just checked my local test installation with AdGuard, and I'm now noticing elevated CPU usage as well. Looking at the logs, it seems it entered a loop of PTR requests:
Mar 13 09:05:05 ha-shelf3-odroid-c2 addon_a0d7b954_adguard[401]: 2024/03/13 10:05:05.023703 [error] dnsproxy: 172.30.32.3:53: response received over udp: "exchanging with 172.30.32.3:53 over udp: read udp 172.30.32.1:57046->172.30.32.3:53: i/o timeout"
Mar 13 09:05:05 ha-shelf3-odroid-c2 addon_a0d7b954_adguard[401]: 2024/03/13 10:05:05.023733 [error] dnsproxy: upstream 172.30.32.3:53 failed to exchange ;4.d.e.0.0.0.0.0.0.0.0.0.0.0.0.0.7.0.0.0.c.e.5.9.c.a.a.8.a.7.d.f.ip6.arpa. IN PTR in 2.002714299s: exchanging with 172.30.32.3:53 over udp: read udp 172.30.32.1:57046->172.30.32.3:53: i/o timeout
Mar 13 09:05:05 ha-shelf3-odroid-c2 hassio_dns[401]: [INFO] 127.0.0.1:48246 - 26210 "PTR IN 4.d.e.0.0.0.0.0.0.0.0.0.0.0.0.0.7.0.0.0.c.e.5.9.c.a.a.8.a.7.d.f.ip6.arpa. udp 101 true 2048" NXDOMAIN qr,rd,ra 90 0.00516157s
Mar 13 09:05:05 ha-shelf3-odroid-c2 hassio_dns[401]: [INFO] 172.30.32.1:51696 - 16043 "PTR IN 4.d.e.0.0.0.0.0.0.0.0.0.0.0.0.0.7.0.0.0.c.e.5.9.c.a.a.8.a.7.d.f.ip6.arpa. udp 101 true 2048" NXDOMAIN qr,rd,ra 90 6.012442759s
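To make sense of the PTR name in these logs, it helps to turn it back into an address. Below is a small sketch using Python's standard ipaddress module; the helper name is made up for illustration. It reverses the nibble labels of an ip6.arpa name (or the octets of an in-addr.arpa name) and reports whether the result is a private address.

```python
import ipaddress

def ptr_name_to_address(name):
    """Convert a reverse-lookup name (ip6.arpa / in-addr.arpa) to an IP address."""
    labels = name.rstrip(".").lower().split(".")
    if labels[-2:] == ["ip6", "arpa"]:
        nibbles = labels[:-2]
        if len(nibbles) != 32:
            raise ValueError("expected 32 nibble labels for ip6.arpa")
        # Labels are stored least-significant nibble first, so reverse them.
        return ipaddress.IPv6Address(int("".join(reversed(nibbles)), 16))
    if labels[-2:] == ["in-addr", "arpa"]:
        octets = labels[:-2]
        if len(octets) != 4:
            raise ValueError("expected 4 octet labels for in-addr.arpa")
        return ipaddress.IPv4Address(".".join(reversed(octets)))
    raise ValueError("not a reverse-lookup name")

if __name__ == "__main__":
    name = "4.d.e.0.0.0.0.0.0.0.0.0.0.0.0.0.7.0.0.0.c.e.5.9.c.a.a.8.a.7.d.f.ip6.arpa."
    addr = ptr_name_to_address(name)
    print(addr, addr.is_private)
```

For the name in the logs this yields fd7a:8aac:95ec:7::ed4, an address in the private fc00::/7 range, which would be consistent with the query being routed to a resolver reserved for private reverse lookups.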
It seems to me the two DNS servers are referencing each other? 🤔 Indeed, I see 172.30.32.3:53 as upstream DNS server, which is CoreDNS. And since CoreDNS is pointing to the local system too, it seems my system entered a DNS loop 😢
Ah, thanks for replicating. Glad I'm not alone !
It seems that CoreDNS got picked up from the local /etc/resolv.conf. We set the IP address of the DNS plug-in (CoreDNS) as the default name server for each add-on. That is also true for AdGuard, hence the potential for a loop.
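As a quick way to check what an add-on would pick up, the nameserver entries can be read from the contents of resolv.conf. This is a minimal sketch with a hypothetical helper name; the sample file content and the use of 172.30.32.3 as the DNS plug-in address are taken from this thread rather than from any reference.

```python
def parse_nameservers(resolv_conf_text):
    """Return the addresses listed on 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments, which start with '#' or ';'.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers

if __name__ == "__main__":
    # Assumed example content; a real check would read /etc/resolv.conf
    # from inside the add-on container.
    sample = "# set for the add-on\nnameserver 172.30.32.3\n"
    dns_plugin_ip = "172.30.32.3"  # CoreDNS address as seen in this thread
    servers = parse_nameservers(sample)
    print(servers, dns_plugin_ip in servers)
```

If the DNS plug-in address shows up here and the add-on also acts as the plug-in's upstream, the two can end up forwarding to each other.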
Under Settings -> DNS Settings I have the default Quad 9 http server as upstream DNS server. But "Use private reverse DNS resolvers" is checked, and under "Private reverse DNS servers" the following is listed:
By default, AdGuard Home uses the following reverse DNS resolvers: "172.30.32.3:53".
It seems AdGuard automatically picked up the DNS server from /etc/resolv.conf, which caused the DNS loop.
Disabling "Use private reverse DNS resolvers" seems to fix the loop in my setup.
I've now set my router as the private reverse DNS resolver; that should work too (as long as you don't also set the Home Assistant machine running the AdGuard add-on as upstream on the router 😅).
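The loop and the fix can be pictured as a forwarding chain. The sketch below is a deliberately simplified model (one upstream per server, with made-up labels such as "coredns", "adguard" and "router"); it is not how either program resolves queries internally, only a way to check a planned configuration for cycles.

```python
def find_forwarding_loop(upstreams, start):
    """Follow the single upstream of each server; return the cycle if one exists.

    upstreams maps a server label to the label of the server it forwards to.
    Servers that are absent from the mapping are treated as answering themselves.
    """
    path = []
    position = {}
    node = start
    while node in upstreams:
        if node in position:
            # Revisited a server: everything from its first visit on is the loop.
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = upstreams[node]
    return None

if __name__ == "__main__":
    # Situation described in the thread: each server forwards to the other.
    looping = {"coredns": "adguard", "adguard": "coredns"}
    print(find_forwarding_loop(looping, "coredns"))

    # After pointing AdGuard's private reverse resolver at the router instead.
    fixed = {"coredns": "adguard", "adguard": "router"}
    print(find_forwarding_loop(fixed, "coredns"))
```

The same check also flags the case mentioned in the parenthesis above: if the router in turn forwards to the Home Assistant machine, the chain closes again.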
Reading the documentation of AdGuard, it says:
Ensure your Home Assistant device has a static IP and static external DNS servers! This is important! You WILL end up having issues if you skip this step.
🤷
Brilliant! Works for me. I'm now on the latest versions with no high CPU usage or reboots.
I'll raise an issue against the AdGuard add-on about disabling "Use private reverse DNS resolvers".
I've opened a thread in HA community https://community.home-assistant.io/t/dns-not-running-weird-messages-in-protocols-error-relocation-bin-bash/703837
My assumption is that the aarch64-hassio-dns:2023.03.0 image is not using the correct architecture for its binaries to run on an Odroid device.
The same seems to apply for ghcr.io/home-assistant/odroid-n2-homeassistant:2024.3.0 image as well.
@sarmbruster your case seems to have a different root cause, and it seems to me that it is limited to your instance. Therefore I don't think it belongs in the issue tracker here. I've responded on the community thread.
Also had crazy high CPU usage etc.
Resolved issue by ssh'ing into HA and running:
ha dns options --fallback=false
As the OP's problem is caused by the AdGuard Home add-on, I am closing this issue.
@Tahutipai your issue is tracked by #90.
I've just upgraded to 2024.03.0 and found although home assistant starts, after a short time cpu usage goes to 100% and then reboots.
I'm running -
Now one thing I noticed (just before the server rebooted itself) is very high cpu usage for dns and adguard containers -
I use adguard for both a DHCP and filtering DNS server, it also resolves hosts on the local network. So I configure homeassistant to use the adguard DNS server (so that scripts etc can use hostnames) -
If I change the home assistant servers to be a public DNS server -
Then the high cpu usage and reboots go away ... however this means that many scripts fail since local network hostnames are not resolved.
DNS resolution on the local network (to adguard) works as expected BTW.
Looks like a new issue with 2024.03.0 to me.
Any thoughts ? I guess I should also post to plugin-dns github issues.
(also posted to https://community.home-assistant.io/t/startup-failure-after-2024-03-0-upgrade-dns-issues/703118)