opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License
3.36k stars 754 forks source link

[BUG] DNS Resolver seems to be not started after boot #1449

Closed fabianfrz closed 7 years ago

fabianfrz commented 7 years ago

After booting the DNS-Resolver seems to be not starting on 17.1.2 It runs after

fabianfrz commented 7 years ago

I checked the log, which says the following:

Mar  9 14:27:24 opn unbound: [24343:0] error: cannot chdir to directory:  (No such file or directory)

So it may be a race condition in the boot process.

fabianfrz commented 7 years ago

Now after booting, this one shows up in the log:

opnsense: /usr/local/etc/rc.bootup: The command 'chroot -u unbound -g unbound / '/usr/local/sbin/unbound-anchor' -a '/var/unbound/root.key'' returned exit code '1', the output was ''
fichtner commented 7 years ago

Can you give more context than the error itself? There is some kind of ordering issue, but I have no idea where and cannot reproduce.

fabianfrz commented 7 years ago

It happens when the device is booting. The service can be started later without issues. I think this is because the service is started before the configuration is generated

fichtner commented 7 years ago

Yes, but with some more log messages we can find the call spot. ;)

fabianfrz commented 7 years ago

sure:

Mar 11 17:27:42     opnsense: /index.php: Successful login for user 'root' from: 10.230.231.9
Mar 11 17:26:46     kernel:
Mar 11 17:26:46     kernel:
Mar 11 17:26:45     kernel:
Mar 11 17:26:33     squid[36296]: Squid Parent: (squid-1) process 36475 started
Mar 11 17:26:33     squid[36296]: Squid Parent: will start 1 kids
Mar 11 17:26:23     squid: No valid signing SSL certificate configured for HTTPS_port 127.0.0.1:3129
Mar 11 17:26:13     ftp-proxy[23677]: listening on 127.0.0.1 port 8021
Mar 11 17:26:08     configd.py: generate template container OPNsense/Syslog
Mar 11 17:26:08     configd.py: generate template container OPNsense/Sample/sub2
Mar 11 17:26:08     configd.py: generate template container OPNsense/Sample/sub1
Mar 11 17:26:07     configd.py: generate template container OPNsense/Sample
Mar 11 17:26:06     configd.py: generate template container OPNsense/Quagga
Mar 11 17:26:03     configd.py: generate template container OPNsense/Proxy
Mar 11 17:26:01     configd.py: generate template container OPNsense/Netflow
Mar 11 17:26:01     configd.py: generate template container OPNsense/Macros
Mar 11 17:26:00     configd.py: generate template container OPNsense/IPFW
Mar 11 17:25:58     configd.py: generate template container OPNsense/IDS
Mar 11 17:25:57     configd.py: generate template container OPNsense/FtpProxy
Mar 11 17:25:57     configd.py: generate template container OPNsense/Cron
Mar 11 17:25:56     configd.py: generate template container OPNsense/Captiveportal
Mar 11 17:25:55     configd.py: generate template container OPNsense/Auth
Mar 11 17:25:54     sshlockout[12808]: sshlockout/webConfigurator v3.0 starting up
Mar 11 17:25:54     configd.py: [5572e883-33aa-459b-850a-6723cdeb055a] generate template *
Mar 11 17:25:54     kernel: done.
Mar 11 17:25:52     root: /etc/rc.d/hostid: WARNING: hostid: unable to figure out a UUID from DMI data, generating a new one
Mar 11 17:25:51     configd.py: generate template container OPNsense/Syslog
Mar 11 17:25:50     kernel: done.
Mar 11 17:25:50     configd.py: [2659fcf9-b91b-4810-bed1-6a5b52d205f1] generate template OPNsense/Syslog
Mar 11 17:25:47     kernel: ...done.
Mar 11 17:25:47     kernel: .
Mar 11 17:25:46     kernel: .
Mar 11 17:25:44     sshd[46582]: Server listening on 0.0.0.0 port 22.
Mar 11 17:25:44     sshd[46582]: Server listening on :: port 22.
Mar 11 17:25:44     kernel: done.
Mar 11 17:25:43     kernel: done.
Mar 11 17:25:42     kernel: done.
Mar 11 17:25:36     opnsense: /usr/local/etc/rc.bootup: The command 'chroot -u unbound -g unbound / '/usr/local/sbin/unbound-anchor' -a '/var/unbound/root.key'' returned exit code '1', the output was ''
Mar 11 17:25:36     kernel: done.
Mar 11 17:25:36     kernel: done.
Mar 11 17:25:36     opnsense: /usr/local/etc/rc.bootup: ROUTING: setting IPv4 default route to x.x.x.x
Mar 11 17:25:35     lighttpd[94344]: (log.c.217) server started
Mar 11 17:25:35     kernel: done.
Mar 11 17:25:35     kernel: .done.
Mar 11 17:25:35     kernel: ..
Mar 11 17:25:35     kernel: .
Mar 11 17:25:33     kernel: pflog0: promiscuous mode enabled
Mar 11 17:25:32     kernel: done.
Mar 11 17:25:32     opnsense: /usr/local/etc/rc.bootup: Resyncing OpenVPN instances.
Mar 11 17:25:32     kernel: ath0_wlan1: ieee80211_new_state_locked: pending INIT -> SCAN transition lost
fichtner commented 7 years ago

I am guessing the /usr/local/sbin/unbound-anchor invoke changed return values, at least it doesn't look like it's not working. Some cleanups are incoming, but they shouldn't "fix" the "problem". Step by step...

fichtner commented 7 years ago

Works for both /var MFS on and off after reboot from here. What else could we try?

fabianfrz commented 7 years ago

Mine is an APU 1d4 with an SD card for the OS. It may happen because of slow memory

fichtner commented 7 years ago

So /var MFS? If the file system / mkdir returns from creating /var/unbound it should make sure it's available...

fabianfrz commented 7 years ago

This is the setting on this device: spectacle ik2556

Mounts:

# mount
/dev/da0s1a on / (ufs, local, soft-updates)
devfs on /dev (devfs, local, multilabel)
tmpfs on /tmp (tmpfs, local)
tmpfs on /var (tmpfs, local)
devfs on /var/dhcpd/dev (devfs, local, multilabel)

Devices

camcontrol devlist
<Multiple Card  Reader 1.00>       at scbus6 target 0 lun 0 (da0,pass0)

I hope any of this configuration elements can help. For now I will disable this tmpfs so the race condition will not prevent it from booting.

fichtner commented 7 years ago

@fabianfrz Can we close this? It looks like a local issue that is impossible to address...

fabianfrz commented 7 years ago

@fichtner yes, I think I can reopen it, if it appears again