opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License
3.33k stars 745 forks source link

ntpd doesn't start #4245

Closed danb35 closed 3 years ago

danb35 commented 4 years ago

Important notices Before you add a new report, we ask you kindly to acknowledge the following:

[X] I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md

[X] I have searched the existing issues and I'm convinced that mine is new.

Describe the bug From my initial installation on OPNSense 20.1.8 to current on 20.7, ntpd hasn't started and stayed running for more than a few seconds. More detail in this forum thread.

It appears that on boot, ntpdate_sync_once.sh runs and successfully synchronizes the time, after which ntpd is started--as it should be. But after a very brief amount of time, ntpd is killed with signal 15.

To Reproduce Steps to reproduce the behavior: As above--boot the system and monitor ntpd status. It may start briefly, but won't stay that way.

Expected behavior ntpd starts and runs.

Screenshots image

Relevant log files /var/log/ntpd.log

Jul 31 09:16:51 opnsense ntpdate[12533]: Can't find host 0.pool.ntp.org: Name does not resolve (8)
Jul 31 09:16:54 opnsense ntpdate[12533]: Can't find host 2.pool.ntp.org: Name does not resolve (8)
Jul 31 09:16:56 opnsense ntpdate[12533]: no servers can be used, exiting
Jul 31 09:17:25 opnsense ntpdate[94626]: Can't find host 0.pool.ntp.org: Name does not resolve (8)
Jul 31 13:17:54 opnsense ntpdate[94626]: step time server 192.241.146.233 offset +14409.531949 sec
Jul 31 13:17:54 opnsense ntp[16779]: Successfully synced time after 2 attempts.
Jul 31 13:17:54 opnsense ntp[78028]: Starting NTP Daemon.
Jul 31 13:17:54 opnsense ntpd[48610]: ntpd 4.2.8p15@1.3728-o Tue Jul 28 02:25:36 UTC 2020 (1): Starting
Jul 31 13:17:54 opnsense ntpd[48610]: Command line: /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid
Jul 31 13:17:54 opnsense ntpd[48610]: ----------------------------------------------------
Jul 31 13:17:54 opnsense ntpd[48610]: ntp-4 is maintained by Network Time Foundation,
Jul 31 13:17:54 opnsense ntpd[48610]: Inc. (NTF), a non-profit 501(c)(3) public-benefit
Jul 31 13:17:54 opnsense ntpd[48610]: corporation.  Support and training for ntp-4 are
Jul 31 13:17:54 opnsense ntpd[48610]: available at https://www.nwtime.org/support
Jul 31 13:17:54 opnsense ntpd[48610]: ----------------------------------------------------
Jul 31 13:17:54 opnsense ntpd[55576]: proto: precision = 0.288 usec (-22)
Jul 31 13:17:54 opnsense ntpd[55576]: basedate set to 2020-07-16
Jul 31 13:17:54 opnsense ntpd[55576]: gps base set to 2020-07-19 (week 2115)
Jul 31 13:17:54 opnsense ntpd[55576]: restrict: 'monitor' cannot be disabled while 'limited' is enabled
Jul 31 13:17:54 opnsense ntpd[55576]: Listen and drop on 0 v6wildcard [::]:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 2 igb2 [fe80::208:a2ff:fe0a:d506%3]:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 3 igb2 192.168.1.1:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 4 lo0 [::1]:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 5 lo0 127.0.0.1:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 6 igb2_vlan107 [fe80::208:a2ff:fe0a:d506%9]:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 7 igb2_vlan107 192.168.107.1:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listening on routing socket on fd #28 for interface updates
Jul 31 13:18:53 opnsense ntpd[55576]: ntpd exiting on signal 15 (Terminated)
Jul 31 13:18:53 opnsense ntpd[55576]: 162.159.200.1 local addr 192.168.1.1 -> <null>
Jul 31 13:18:53 opnsense ntpd[55576]: 23.131.160.7 local addr 192.168.1.1 -> <null>
Jul 31 13:18:53 opnsense ntpd[55576]: 2600:3c01::f03c:92ff:fe12:add4 local addr fe80::208:a2ff:fe0a:d506%3 -> <null>
Jul 31 13:18:53 opnsense ntpd[55576]: 90.187.99.165 local addr 192.168.1.1 -> <null> 

Additional context Add any other context about the problem here.

Environment

OPNsense 20.1.8 - 20.7 (amd64, OpenSSL). Intel® Celeron C2358 Netgate RCC-VE-2440

AdSchellevis commented 4 years ago

Due to responses like these https://forum.opnsense.org/index.php?topic=18253.msg84616#msg84616 I'm quite tempted to ignore the ticket completely.
Lot's of people try to help others in their spare time, which doesn't always mean that your question is on the top of everybody's list.

Using google a bit on "ntpd exiting on signal 15" shows some interesting topics on the subject, ntpd as a time server is quite sensitive about .... time... which is probably why it kills itself when time drifts too much. Often these things seem to happen when systems have an unstable rtc clock. We haven't seen similar issues ourselves on quality hardware.

Some resources about similar issues on other platforms:

Unfortunately I couldn't find documentation about this behaviour from ntpd itself, it could be unintended (or should have outputted more information), but it does look quite common.

Specifically about your hardware you can find the following (same area, don't know if it's the cause, but sounds similar):

I don't know if chrony handles rtc issues differently, it might conceal the underlaying issue(s), but there's also a chance that it treats those similar. It might be worth experimenting for people suffering from this issue, from a core perspective there's not much we can (and plan) to do in this case.

PrivatePuffin commented 4 years ago

Due to responses like these https://forum.opnsense.org/index.php?topic=18253.msg84616#msg84616 I'm quite tempted to ignore the ticket completely.

Thats a very unprofessional comment to make. The merrit of a github issue has nothing at all to do with the attitude of the one submitting the issue.

mimugmail commented 4 years ago

Quoting only one sentence makes it look like this, but it's not true when reading the rest of it :)

The error is sometimes hard to track when it's not reproducable and it needs time and more manpower. So the interim workaround can be trying chrony and working on the issue in parallel

mikahe commented 4 years ago

One lead for debug is why syslog-ng is not running/crashes. EDIT: Looks like two instances could be started (occasionally?) at boot: $ ps auxf | grep syslog root 33928 0.0 0.8 19304 7824 - I 20:18 0:00.00 /usr/local/sbin/syslog-ng -p /var/run/syslog-ng.pid root 39227 0.0 1.0 26444 9916 - Ss 20:18 0:00.16 /usr/local/sbin/syslog-ng -p /var/run/syslog-ng.pid root 39857 0.0 1.1 21080 11212 - S 20:18 0:00.10 /usr/local/bin/python3 /usr/local/opnsense/scripts/syslog/lockout_handler (python3.7) root 82382 0.0 0.3 1061316 3396 - Ss 20:18 0:00.08 /usr/local/sbin/syslogd -s -c -c -P /var/run/syslog.pid -p /var/run/legacy_log -S /var/run/legacy_logpriv -k -s -s -f /var/et root 84149 0.0 0.3 1067348 3268 - Is 20:18 0:00.00 dhclient: system.syslog (dhclient) $ cat /var/run/syslog-ng.pid 39227

psychogun commented 3 years ago

Do you have a working DNS?

Jul 31 09:16:51 opnsense ntpdate[12533]: Can't find host 0.pool.ntp.org: Name does not resolve (8)
Jul 31 09:16:54 opnsense ntpdate[12533]: Can't find host 2.pool.ntp.org: Name does not resolve (8)
Jul 31 09:16:56 opnsense ntpdate[12533]: no servers can be used, exiting
Jul 31 09:17:25 opnsense ntpdate[94626]: Can't find host 0.pool.ntp.org: Name does not resolve (8)
danb35 commented 3 years ago

The OPNsense box was working fine as DNS for the rest of my network. But I've long since given up on this and moved back to pfSense.

OPNsense-bot commented 3 years ago

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.

PSLLSP commented 1 year ago

It looks like DNS doesn't work, so NTP cannot resolve domain names to IP addresses and PC has wrong time. DNS doesn't work because PC has wrong time!! That is a deadlock... Time has to be fixed manually, that will restore DNS function and after that NTP will be able to fetch correct time. DNS doesn't work because signed domains needs correct time to use certificate to validate domains; when time is wrong, validation of DNS replies fails... Other workaround could be that one of NTP servers for initial time sync will be defined by IP address, like 162.159.200.123 (time.cloudflare.com)

Correct time is important these days, because when time is wrong, certificates are no valid. When time on PC is not correct, many thing break, it is not possible to update, validating DNS server could be broken, etc...

fichtner commented 1 year ago

DNS doesn't work because PC has wrong time!!

What?!

Time has to be fixed manually

I'd say DNS has to be fixed manually if it was broken. ;)

Other workaround could be that one of NTP servers for initial time sync will be defined by IP address

Workarounds are impractical for default use.

I'd say this easily falls under what is called system administration. CMOS battery is likely flaky if the time is not recorded correctly...

Cheers, Franco

PSLLSP commented 1 year ago

Try to set wrong time to your router and reboot. Check if time was corrected during boot. Check if DNS works when time is wrong... Similar issue was reported on IPfire running on Raspberry PI without RTC module, just few days ago... Unbound fails to validate DNS replies when time is not correct, it rejects all DNS replies as invalid...

fichtner commented 1 year ago

Try to set wrong time to your router and reboot.

Sorry, but why?

PSLLSP commented 1 year ago

Try to set wrong time to your router and reboot.

Sorry, but why?

Sorry, forget a about that...