Open dermoth opened 4 years ago
cc @EricLuehrsen
I'm seeing this in 19.07.2 as well, for example just for ~5 minutes of wan6 being down:
root@OpenWrt:/tmp/log# cat messages |grep unbound|wc -l
740628
root@OpenWrt:/tmp/log# cat messages |grep unbound|head
May 9 13:41:12 OpenWrt unbound: default protocol configuration
May 9 13:41:12 OpenWrt unbound: default memory configuration
May 9 13:41:12 OpenWrt unbound: default recursion configuration
May 9 13:41:17 OpenWrt unbound: [3752:0] notice: init module 0: iterator
May 9 13:41:17 OpenWrt unbound: [3752:0] info: start of service (unbound 1.9.6).
May 9 13:41:17 OpenWrt unbound: [3752:0] notice: sendto failed: Permission denied
May 9 13:41:17 OpenWrt unbound: [3752:0] notice: remote address is 2001:500:2f::f port 53
May 9 13:41:17 OpenWrt unbound: [3752:0] notice: sendto failed: Permission denied
May 9 13:41:17 OpenWrt unbound: [3752:0] notice: remote address is 2001:500:2::c port 53
May 9 13:41:17 OpenWrt unbound: [3752:0] notice: sendto failed: Permission denied
root@OpenWrt:/tmp/log# cat messages |grep unbound|tail
May 9 13:46:52 OpenWrt unbound: [4620:0] notice: sendto failed: Permission denied
May 9 13:46:52 OpenWrt unbound: [4620:0] notice: remote address is 2600:9000:5303:2b00::1 port 53
May 9 13:46:52 OpenWrt unbound: [4620:0] notice: sendto failed: Permission denied
May 9 13:46:52 OpenWrt unbound: [4620:0] notice: remote address is 2600:9000:5306:f000::1 port 53
May 9 13:46:52 OpenWrt unbound: [4620:0] notice: sendto failed: Permission denied
May 9 13:46:52 OpenWrt unbound: [4620:0] notice: remote address is 2600:9000:5306:f000::1 port 53
May 9 13:46:52 OpenWrt unbound: [4620:0] notice: sendto failed: Permission denied
May 9 13:46:52 OpenWrt unbound: [4620:0] notice: remote address is 2600:9000:5302:d300::1 port 53
root@OpenWrt:/tmp/log# du -ms messages
62 messages
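For anyone wanting to quantify the flood, here is a rough sketch that sums the size of the unbound lines per minute. It assumes the syslog format shown above (time of day in the third field); the function name is made up for illustration.

```shell
# Sum the bytes of unbound log lines per minute.
# Assumes lines look like: "May 9 13:41:12 OpenWrt unbound: ..."
unbound_bytes_per_minute() {
    grep ' unbound: ' "$1" | awk '{
        minute = substr($3, 1, 5)          # HH:MM from HH:MM:SS
        bytes[minute] += length($0) + 1    # +1 for the trailing newline
    } END {
        for (m in bytes) print m, bytes[m]
    }' | sort
}

# Usage (path as in the session above):
#   unbound_bytes_per_minute /tmp/log/messages
```

Each output line is a minute followed by the number of bytes logged by unbound in that minute, which makes the affected window easy to spot.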
It looks like this might have been fixed upstream in 1.10; see https://github.com/NLnetLabs/unbound/issues/35 / https://github.com/NLnetLabs/unbound/commit/474afc9016d34a98537a97cc94e14d329c7d8aeb.
Yes it has been there some time. Unbound has burped out these IP6 connection events when initializing root server connections. Any Unbound restart reason and any root server infrastructure refresh can appear to pop these. It doesn't require the `auth-zone:` clause either.
You may also be experiencing interface restart flood from netifd. It may help to use UCI triggers to restrict which networks restart Unbound, list trigger_interface...
In my case I get plenty of these just after a reboot; around 5MB in total before wan6 comes online.
Is it possible to backport this fix and/or upgrade the Unbound package to 1.10 locally?
I've not contributed before but I'm happy to try to help if I can.
See if #12141 has resolved this.
Unfortunately the issue still seems to persist. I upgraded all the unbound packages (`opkg list-upgradable|grep unbound|cut -d' ' -f1|xargs opkg upgrade`), verified 1.10 was installed, and restarted unbound a couple of times, but when I issue a `service network restart` the logs are still filled with the same "Permission denied" messages until `wan6` comes up.
I haven't had time to upgrade to 19.07 and check on my end... Could you confirm the unbound version though? try:
/proc/$(pgrep unbound)/exe -V
This will make sure you're checking the currently running version in case it's somehow starting an older binary.
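In case more than one unbound process is running, a small sketch along the same lines prints the version reported by each one. The function name is hypothetical, and it assumes `pgrep` is available, as the command above already does.

```shell
# Print "PID: version" for every running unbound process.
# The binary is read through /proc so the version shown is that of the
# process actually running, not whatever is currently installed on disk.
unbound_versions() {
    for pid in $(pgrep -x unbound); do
        printf '%s: ' "$pid"
        "/proc/$pid/exe" -V | head -n 1
    done
}

# Usage:
#   unbound_versions
```

More than one line of output, or a version older than the installed package, would suggest a stale process is still writing to the log.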
Regards
> It looks like this might have been fixed upstream in 1.10; see NLnetLabs/unbound#35 / NLnetLabs/unbound@474afc9.
FYI That commit was already in the 1.9.6 release... https://github.com/NLnetLabs/unbound/blob/release-1.9.6/doc/Changelog#L449
I just experienced this on 18.06.. an update to unbound to 1.10.0 AND updating libunbound eventually fixed my problem... once I realized the old unbound was still running and printing the same spam to the logs. Make sure you only have a single PID running after the upgrade (or none, if you stop/start it).
Also worth noting: `uci set unbound.@unbound[0].protocol='ip4_only'` appears necessary to stop ipv6 usage. ipv6 is not enabled on my network.
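For reference, a minimal sketch of applying that setting persistently. The wrapper function name is made up; the `uci` calls assume the stock single `unbound` section in `/etc/config/unbound`.

```shell
# Restrict Unbound to IPv4 and restart it so the change takes effect.
# Assumes the default OpenWrt layout with one 'unbound' config section.
unbound_force_ip4() {
    uci set "unbound.@unbound[0].protocol=ip4_only"
    uci commit unbound
    /etc/init.d/unbound restart
}

# Usage (on the router):
#   unbound_force_ip4
```

This is a workaround rather than a fix: it stops the resolver from using IPv6 at all.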
I was under the impression this issue is only affecting ipv6-enabled resolvers - is it possible your old process also didn't have ip4_only set?
Unfortunately it looks like I am running the new process, but I'm still seeing the issue. I could try disabling ipv6 later to see if that helps, which yes I assume it would...
# ps|grep unbound
3952 root 1080 S grep unbound
18094 unbound 31024 S /usr/sbin/unbound -d -c /var/lib/unbound/unbound.conf
# /proc/18094/exe -V
Version 1.10.0
Configure line: --target=x86_64-openwrt-linux --host=x86_64-openwrt-linux --build=x86_64-pc-linux-gnu --program-prefix= --program-suffix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --libexecdir=/usr/lib --sysconfdir=/etc --datadir=/usr/share --localstatedir=/var --mandir=/usr/man --infodir=/usr/info --disable-nls --disable-dsa --disable-gost --enable-allsymbols --enable-ecdsa --enable-tfo-client --enable-tfo-server --with-libexpat=/builder/shared-workdir/build/sdk/staging_dir/target-x86_64_musl/usr --with-ssl=/builder/shared-workdir/build/sdk/staging_dir/target-x86_64_musl/usr --with-user=unbound --with-run-dir=/var/lib/unbound --with-conf-file=/var/lib/unbound/unbound.conf --with-pidfile=/var/run/unbound.pid --with-pthreads --with-libevent=/builder/shared-workdir/build/sdk/staging_dir/target-x86_64_musl/usr --enable-event-api
Linked libs: pluggable-libevent 2.1.11-stable (it uses epoll), OpenSSL 1.1.1g 21 Apr 2020
Linked modules: dns64 respip validator iterator
TCP Fastopen feature available
BSD licensed, see LICENSE in source package for details.
Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues
Just confirming that setting `protocol='ip4_only'` does avoid the log spam for me.
@dermoth, would you consider your issue closed then?
From what I can tell, no. I disabled logging for now as I didn't have time to upgrade. The upstream patch that was reportedly fixing this is already in the current release I'm using, so it's unlikely the upgrade will help.
One option would be disabling logging by default. Or I could try compiling latest package on 18.06 and try reproducing it. FWIW the upstream bug is still open...
I would agree, it is not fixed. Ideally the DNS resolver would not need to be prohibited from using IPv6. Not that I have any idea what the issue is now. :slightly_frowning_face:
(1) If you configure the server for IPv6 and it doesn't have an IPv6 route, then its attempts to make IPv6 connections rebound and get logged. If WAN and WAN6 have unusual delays between them, then this can naturally happen. If you don't have IPv6, then you need to disable it in Unbound.

(2) Unbound was attempting to contact root servers any way possible when it started, regardless of config, and this at least seems to have been cleaned up.

Conclusion: as a derivative package, there is nothing more to be done within OpenWrt. This needs a verified upstream fix. It's more than an easy patch. Unbound fundamentally uses all interfaces provided, and this issue stems from there.
Note: my tip way up was incomplete. _"You may also be experiencing interface restart flood from `netifd`. It may help to use UCI triggers to restrict which networks restart Unbound, list trigger_interface..."_ Something was done a few years ago to `ifup` events between `netifd` and `procd`. A minor IPv6 data refresh from your ISP will cause `procd` to issue an interface up trigger to the applications it controls, even every 2 minutes. You need to remove WAN6 from the Unbound triggers. You can search the core OpenWrt Flyspray tracker for more.
If Unbound flooded the logs between the time ipv4 and ipv6 went up, it probably went unnoticed. I noticed the issue during an outage, and if it had lasted without any fix my disk would probably have filled up. FWIW I use 6in4 tunneling.
If you don't want to keep the openwrt bug open while there is an open upstream bug (I opened both...) I will reopen one when the issue is fixed upstream.
Maintainer: Eric Luehrsen ericluehrsen@gmail.com
Environment: OpenWrt 18.06.1 (arm_cortex-a9_vfpv3 on WRT3200ACM), unbound-1.9.6-1
Description:
Today I had a strange network glitch where pppd and unbound restarted without any obvious reason (normally I see something about pppd exiting before it starts reconnecting). Unbound then (re?)started shortly after: 7 seconds after the first pppd entry and 1 second after the first pppd failure and retry.
While this could be an issue of its own, I wouldn't normally mind if it weren't for the fact that after this sequence of events OpenWRT started streaming 30mbps of syslog messages back at my workstation. Between 13:15:00 and 13:18:59 I received 178MB of logs from my router, peaking at 68MB/minute when I grep a timeframe that was 100% affected.
Had that been an extended outage that I wasn't aware of, that logging rate would eventually have filled up my workstation.
I'm not sure if this bug should be sent upstream, but at the very least I think OpenWRT should default to the lowest verbosity level to avoid excessive logging by the server.
I reported this bug upstream, https://github.com/NLnetLabs/unbound/issues/224, yet pending resolution or rejection of the upstream bug we might consider lowering the verbosity level.
Here is an excerpt of the log, including all lines matching kernel, pppd, netifd and unbound:
After that it kept going on while pppd was trying to reconnect... There is a small gap while I manually restarted unbound, then I stopped it to stop the spam (so I could see what was going on).
pppd reconnected shortly after the event and then I rebooted the router to make sure everything gets back to normal.