pi-hole / FTL

The Pi-hole FTL engine
https://pi-hole.net
Other
1.34k stars 187 forks source link

Add NTP-server/client implementation #1977

Closed DL6ER closed 1 day ago

DL6ER commented 1 month ago

What does this implement/fix?

Pi-hole's FTL features a full-blown DHCP server that may be used in cases where changing the DNS server of router's internal DHCP servers isn't possible or where greater flexibility is required. The DHCP server is primarily responsible for assigning IP addresses to devices on a network. It can also provide additional configuration information to these devices, such as the addresses of NTP (Network Time Protocol) servers.

NTP is used to synchronize the clocks of computers over a network. Accurate timekeeping is crucial for most modern features.

Why should FTL also offer NTP service:

  1. Simplicity: By providing both DHCP and NTP services, FTL can handle both IP address allocation and time synchronization. This resembles the functionality found in many routers that offer DHCP service.

  2. Efficiency: When a device connects to the network, it can query the Pi-hole for time synchronization making this a very fast (sub-millisecond) synchronization compared to each device individually using online services which are several milliseconds away.

  3. Consistency: By providing NTP services, the DHCP server can ensure that all devices on the network are using the same time source. This can help to prevent inconsistencies. (even when this shouldn't be an issue, usually)


Related issue or feature (if applicable): N/A

Pull request in docs with documentation (if applicable): N/A


By submitting this pull request, I confirm the following:

  1. I have read and understood the contributors guide, as well as this entire template. I understand which branch to base my commits and Pull Requests against.
  2. I have commented my proposed changes within the code.
  3. I am willing to help maintain this change if there are issues with it later.
  4. It is compatible with the EUPL 1.2 license
  5. I have squashed any insignificant commits. (git rebase)

Checklist:

DL6ER commented 1 month ago

Note that the failed spellcheck will be fixed by https://github.com/pi-hole/FTL/pull/1962

github-actions[bot] commented 1 month ago

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] commented 1 month ago

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] commented 1 month ago

Conflicts have been resolved.

DL6ER commented 1 month ago

Planning to add actual clock setting capability, converting to draft

DL6ER commented 4 weeks ago

Ready for review

DL6ER commented 4 weeks ago

is the NTP function only enabled when FTL's DHCP server is enabled or is it independent?

NTP service is always offered given no other NTP server is running on the same host (and port 123 is already taken)

In case FTL is the DHCP server do you think it should announce itself as NTP server via DHCP option 42?

Absolutely, this is implemented in this PR as well :slightly_smiling_face:

image

yubiuser commented 3 weeks ago

Where does the server part get its time from? Is it just using the hosts current time? I did not find a trace where FTL is querying a public NTP server.

DL6ER commented 3 weeks ago

I did just add a new debug option debug.ntp with output in many places - everything of relevance should be logged.


Where does the server part get its time from? Is it just using the hosts current time? I did not find a trace where FTL is querying a public NTP server.

This implementation simply asks the kernel for time and sends this. We are not relaying anything upstream or something - IMO this would defeat the purpose of a local NTP server. We could discuss about extending this feature either now or later to also query and set the time automatically. So far, this requires user interaction. We'd also have to add capability CAP_SYS_TIME for this to work without root / sudo.

yubiuser commented 3 weeks ago

Maybe we should offer an option for the server to poll an upstream NTP server once in a while and distribute this time to its own clients? This is how commercial routers (e.g. FritzBox) do it as well.

DL6ER commented 3 weeks ago

The last commit adds Real Time Clock (RTC) synchronization based on our NTP time synchronization. As we use the standard kernel interface for RTCs, every RTC supported by Linux should be supported out of the box.

Our particular implementation has been tested on these systems:

More importantly: It did work on all systems above, we have no examples to prove the opposite as of now.


Note: Our synchronization does not "announce" the fact that we are NTP or RTC synchronizing to systemd, leading to the observation that timedatectl thinks they are unsynced, cfm.

$ timedatectl 
               Local time: Sun 2024-06-08 21:08:33 CEST
           Universal time: Sun 2024-06-08 19:08:33 UTC
                 RTC time: Sun 2024-06-08 19:08:32
                Time zone: Europe/Berlin (CEST, +0200)
System clock synchronized: no
              NTP service: n/a
          RTC in local TZ: no

Both System clock synchronized and NTP service are booleans set in systemd through DBus. Adding a (static) dbus library and (possibly) a ton of code for merely setting these two boolean values seems disproportional to me right now. There are no drawbacks from not setting them.

yubiuser commented 3 weeks ago

Works here too

2024-06-09 22:17:50.020 CEST [147403/T147405] DEBUG_NTP: Gradually adjusting system time by 1951 us
2024-06-09 22:17:51.000 CEST [147403/T147405] DEBUG_NTP: Successfully opened RTC at "/dev/rtc0"
2024-06-09 22:17:51.000 CEST [147403/T147405] INFO: Current RTC time is 2024-06-09 20:17:51
2024-06-09 22:17:51.001 CEST [147403/T147405] DEBUG_NTP: RTC time is already correct

When I try to get the time from another device there is still something not quite right, at least it looks like this compared to the response I get from the router.

chrko@ThinkPad-X230:~$ ntpdate -dvq fritz.box
 9 Jun 22:14:08 ntpdate[18298]: ntpdate 4.2.8p15@1.3728-o Wed Feb 16 17:13:02 UTC 2022 (1)
Looking for host fritz.box and service ntp
10.0.1.1 reversed to fritz.box
host found : fritz.box
transmit(10.0.1.1)
receive(10.0.1.1)

server 10.0.1.1, port 123
stratum 3, precision -20, leap 00, trust 000
refid [195.71.234.252], root delay 0.043304, root dispersion 0.024765
reference time:      ea1083ff.e333bba6  Sun, Jun  9 2024 21:41:51.887
originate timestamp: ea108b90.9ca57946  Sun, Jun  9 2024 22:14:08.611
transmit timestamp:  ea108b90.9d387991  Sun, Jun  9 2024 22:14:08.614
delay 0.02623, dispersion 0.00000, offset -0.002854

 9 Jun 22:14:08 ntpdate[18298]: adjust time server 10.0.1.1 offset -0.002854 sec
chrko@ThinkPad-X230:~$ ntpdate -dvq nanopi.lan
 9 Jun 22:18:06 ntpdate[18448]: ntpdate 4.2.8p15@1.3728-o Wed Feb 16 17:13:02 UTC 2022 (1)
Looking for host nanopi.lan and service ntp
10.0.1.24 reversed to pi.hole
host found : pi.hole
transmit(10.0.1.24)
receive(10.0.1.24)
10.0.1.24: Server dropped: server has gone too long without sync

server 10.0.1.24, port 123
stratum 2, precision -20, leap 00, trust 000
refid [76.79.67.76], root delay 0.000000, root dispersion 0.000000
reference time:      (no time)
originate timestamp: ea108c7e.99147f13  Sun, Jun  9 2024 22:18:06.597
transmit timestamp:  ea108c7e.93fb4296  Sun, Jun  9 2024 22:18:06.578
delay 0.02690, dispersion 0.00000, offset +0.000660

 9 Jun 22:18:06 ntpdate[18448]: no server suitable for synchronization found

The log shows the query, but at least ntpdate seems to miss something within the response.

yubiuser commented 3 weeks ago

Those are the differences within the responses. See Root Dispersion, Root Delay and Reference Timestamp.

Pi-hole

Network Time Protocol (NTP Version 4, server)
    Flags: 0x24, Leap Indicator: no warning, Version number: NTP Version 4, Mode: server
    [Request In: 16]
    [Delta Time: 0.018438944 seconds]
    Peer Clock Stratum: secondary reference (2)
    Peer Polling Interval: invalid (3)
    Peer Clock Precision: 0,000001 seconds
    Root Delay: 0,000000 seconds
    Root Dispersion: 0,000000 seconds
    Reference ID: 76.79.67.76
    Reference Timestamp: (0)Jan  1, 1970 00:00:00.000000000 UTC
    Origin Timestamp: Jun  9, 2024 20:24:39.984562800 UTC
    Receive Timestamp: Jun  9, 2024 20:24:39.985837999 UTC
    Transmit Timestamp: Jun  9, 2024 20:24:40.002859999 UTC

FrizBox

Network Time Protocol (NTP Version 4, server)
    Flags: 0x24, Leap Indicator: no warning, Version number: NTP Version 4, Mode: server
    [Request In: 104]
    [Delta Time: 0.000663160 seconds]
    Peer Clock Stratum: secondary reference (3)
    Peer Polling Interval: invalid (3)
    Peer Clock Precision: 0,000001 seconds
    Root Delay: 0,043304 seconds
    Root Dispersion: 0,026245 seconds
    Reference ID: 195.71.234.252
    Reference Timestamp: Jun  9, 2024 19:41:51.887508132 UTC
    Origin Timestamp: Jun  9, 2024 20:24:46.886274728 UTC
    Receive Timestamp: Jun  9, 2024 20:24:46.884218543 UTC
    Transmit Timestamp: Jun  9, 2024 20:24:46.884551176 UTC
DL6ER commented 3 weeks ago

The reason for ntpdate complaining in your test is:

10.0.1.24: Server dropped: server has gone too long without sync
[...]
 9 Jun 22:18:06 ntpdate[18448]: no server suitable for synchronization found

because of

Reference Timestamp: (0)Jan  1, 1970 00:00:00.000000000 UTC

which is correct according to RFC 5905 for stateless servers.


So let's make FTL a full-blown stateful NTP server because we can actually serve

Root Dispersion, Root Delay and Reference Timestamp.

once at least one upstream NTP synchronization has happened. ntpdate will accept time servers which have been sync'ed at least once within the past 24 hours.

yubiuser commented 2 weeks ago

Thanks. Working now.

2 Jun 08:46:35 ntpdate[8069]: ntpdate 4.2.8p15@1.3728-o Wed Feb 16 17:13:02 UTC 2022 (1)
Looking for host nanopi.lan and service ntp
10.0.1.24 reversed to pi.hole
host found : pi.hole
transmit(10.0.1.24)
receive(10.0.1.24)

server 10.0.1.24, port 123
stratum 2, precision -20, leap 00, trust 000
refid [76.79.67.76], root delay 0.001251, root dispersion 0.001205
reference time:      ea13c24c.558c03a5  Wed, Jun 12 2024  8:44:28.334
originate timestamp: ea13c2cb.56358298  Wed, Jun 12 2024  8:46:35.336
transmit timestamp:  ea13c2cb.513c83bf  Wed, Jun 12 2024  8:46:35.317
delay 0.02644, dispersion 0.00000, offset +0.004498

12 Jun 08:46:35 ntpdate[8069]: adjust time server 10.0.1.24 offset +0.004498 sec
yubiuser commented 2 weeks ago

Should a ntp error be logged to the message table as well?

2024-06-12 08:54:29.385 CEST [185948M] INFO: Using SSL/TLS certificate file /etc/pihole/tls.pem
2024-06-12 08:54:29.387 CEST [185948/T185951] ERROR: Cannot resolve NTP server address
2024-06-12 08:54:29.388 CEST [185948M] INFO: Restored 1 API session from the database
DL6ER commented 2 weeks ago

The server can create 4 errors and 5 warnings, the client can create 10 errors and 6 warnings... Should we add them all to the diagnosis system - keeping only the most recent of each to avoid filling the diagnosis system with the ever same message?

yubiuser commented 2 weeks ago

I think we should to it like we do it for all the other entries in the table: don't add duplicates - so yes, only the most recent ones. We might also not add all errors/warnings. Only those that are "more relevant" to appear in the web interface. So far, only a handful of selected errors are added to the message table. If you think this should change in general this could also become a separate PR.

rdwebdesign commented 2 weeks ago

Only those that are "more relevant" to appear in the web interface.

Yes, I think we should be consistent. If we only send to the table a few selected warnings related to other modules, we can send only the most important ones related to NTP.

All warnings and errors will still be sent to FTL.log, right?

DL6ER commented 2 weeks ago

Added

Screenshot from 2024-06-13 06-26-02

All warnings and errors will still be sent to FTL.log, right?

Yes.

yubiuser commented 2 weeks ago

When using pihole-FTL ntp there is only partially logging to FTL.log (even with debug ntp). With no additional argument the individual queries are logged, but not the summary. When using a different upstream (with pihole-FTL ntp server) nothing is loggend at all (might be desired).

Additionally with the --update flag, opening and setting the RTC is not logged at all (with and without [server])

DL6ER commented 2 weeks ago

Not exactly sure what you mean, let me try a few things here:

$ sudo pihole-FTL --config debug.ntp false &&  pihole-FTL ntp
false
........
Received 8/8 valid NTP replies from 127.0.0.1
Average time offset: (-2.264977e-06 +/- 3.150939e-05 s)
Average round-trip delay: (3.192425e-04 +/- 1.041611e-04 s)
Trimmed mean time offset: 7.969993e-06 s (excluded 1 outliers)
Trimmed mean round-trip delay: 2.851486e-04 s (excluded 1 outliers)

-> no logging to FTL.log

$ sudo pihole-FTL --config debug.ntp true && pihole-FTL ntp
true
Server reference time: ea18efb7.eae8aa90 = 2024-06-16 06:59:35.917612 CEST
Current time at client: ea18f04d.48c1ca3a = 2024-06-16 07:02:05.284206 CEST
Current time at server: ea18f04d.48b9ecf6 = 2024-06-16 07:02:05.284086 CEST
Time offset: 6.771088e-05 s
Round-trip delay: 3.747940e-04 s
Root delay: 1.429901e-01 s
Root dispersion: 1.373291e-04 s
[...]
.
Received 7/8 valid NTP replies from 127.0.0.1
Average time offset: (6.675720e-06 +/- 2.751231e-05 s)
Average round-trip delay: (2.598081e-04 +/- 6.134486e-05 s)
Trimmed mean time offset: -3.496806e-06 s (excluded 2 outliers)
Trimmed mean round-trip delay: 2.406438e-04 s (excluded 2 outliers)

-> everything that is logged to FTL.log is the server thread's debug messages, the client shouldn't log anything. That's also why there's no summary in the log (the server does not see a summary):

2024-06-16 07:02:06.287 CEST [3690425/T3690430] DEBUG_NTP: Received NTP request from 127.0.0.1:47308
2024-06-16 07:02:06.290 CEST [3691281/F3690425] DEBUG_NTP: Reference Timestamp: ea18efb7.eae8aa90 = 2024-06-16 06:59:35.917612 CEST
2024-06-16 07:02:06.291 CEST [3691281/F3690425] DEBUG_NTP: Origin Timestamp: ea18f04e.49a3c211 = 2024-06-16 07:02:06.287654 CEST
2024-06-16 07:02:06.291 CEST [3691281/F3690425] DEBUG_NTP: Receive Timestamp: ea18f04e.49a9da59 = 2024-06-16 07:02:06.287747 CEST
2024-06-16 07:02:06.291 CEST [3691281/F3690425] DEBUG_NTP: Transmit Timestamp: ea18f04e.4a88effe = 2024-06-16 07:02:06.291151 CEST
[...]

Mind that the shown PID differs from the usual FTL process as NTP workers fork to achieve parallelism and avoid blocking if multiple queries arrive at the same time.

The reason that everything that is being logged is the server also explains why you haven't seen any RTC setting. When the client is run from within your FTL process (to automatically sync the clock in the background), everything will be logged to FTL.log as there is no terminal to print stuff onto.

DL6ER commented 2 weeks ago

NB: I was now running three days without new/ntp and the first sync after switching back already had to correct my local time (incl. RTC) by two seconds underlining the usefulness of this change for Pi-hole matters.

github-actions[bot] commented 6 days ago

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] commented 6 days ago

Conflicts have been resolved.

yubiuser commented 5 days ago

everything that is logged to FTL.log is the server thread's debug messages, the client shouldn't log anything. That's also why there's no summary in the log (the server does not see a summary):

Ok, this explains my questions about the logging.

yubiuser commented 5 days ago

Regarding your last commit: should we change the startup order to check first for the correct time and then import the queries - this could maybe avoid a restart if the time was off to far.


024-06-25 22:54:07.826 CEST [349294M] INFO: Imported 8102 queries from the long-term database
2024-06-25 22:54:07.826 CEST [349294M] INFO:  -> Total DNS queries: 8102
2024-06-25 22:54:07.826 CEST [349294M] INFO:  -> Cached DNS queries: 3983
2024-06-25 22:54:07.826 CEST [349294M] INFO:  -> Forwarded DNS queries: 2601
2024-06-25 22:54:07.826 CEST [349294M] INFO:  -> Blocked DNS queries: 1406
2024-06-25 22:54:07.826 CEST [349294M] INFO:  -> Unknown DNS queries: 31
2024-06-25 22:54:07.826 CEST [349294M] INFO:  -> Unique domains: 959
2024-06-25 22:54:07.826 CEST [349294M] INFO:  -> Unique clients: 11
2024-06-25 22:54:07.826 CEST [349294M] INFO:  -> DNS cache records: 1886
2024-06-25 22:54:07.827 CEST [349294M] INFO:  -> Known forward destinations: 6
2024-06-25 22:54:07.827 CEST [349294/T349296] INFO: NTP server listening on 0.0.0.0:123 (IPv4)
2024-06-25 22:54:07.827 CEST [349294/T349297] INFO: NTP server listening on :::123 (IPv6)
2024-06-25 22:54:07.828 CEST [349294M] INFO: FTL is running as user pihole (UID 999)
2024-06-25 22:54:07.828 CEST [349294M] INFO: Reading certificate from /etc/pihole/tls.pem ...
2024-06-25 22:54:07.829 CEST [349294M] INFO: Using SSL/TLS certificate file /etc/pihole/tls.pem
2024-06-25 22:54:07.831 CEST [349294M] INFO: Restored 1 API session from the database
2024-06-25 22:54:07.836 CEST [349294M] INFO: Blocking status is enabled
2024-06-25 22:54:08.000 CEST [349294/T349299] INFO: Compiled 6 allow and 1 deny regex for 11 clients in 68.6 msec
2024-06-25 22:54:12.450 CEST [349294/T349298] INFO: Received 8/8 valid NTP replies from de.pool.ntp.org
2024-06-25 22:54:12.451 CEST [349294/T349298] INFO: Time offset: 1.657626e+01 ms (excluded 0 outliers)
2024-06-25 22:54:12.451 CEST [349294/T349298] INFO: Round-trip delay: 5.545336e+01 ms (excluded 0 outliers)
2024-06-25 22:54:13.000 CEST [349294/T349298] INFO: Current RTC time is 2024-06-25 20:54:13
DL6ER commented 5 days ago

We should check for sufficient permissions early and fail gracefully.

Ah, you mean concerning the database, yes. The most recent commits made this:

$ pihole-FTL ntp de.pool.ntp.org --update
........
Received 8/8 valid NTP replies from de.pool.ntp.org
Time offset: 9.241263e+00 ms (excluded 2 outliers)
Round-trip delay: 5.255381e+00 ms (excluded 2 outliers)
Error NTP client: Failed to adjust time during NTP sync: Insufficient permissions
Cannot open database in read-write mode

Regarding your last commit: should we change the startup order to check first for the correct time and then import the queries - this could maybe avoid a restart if the time was off to far.

The idea proposed here is to start the resolver and import queries only once the first NTP synchronization finished. NTP synchronization can take everything between 4 seconds (minimum when everything works) to 40 seconds (when all eight attempts time out after five seconds each). I implemented this now.

Before NTP synchronization finished, the web interface will look like before NTP sync and thereafter after NTP sync and then (a long time afterwards!) image

The reason is that all the main graphs aren't updated very often. We need to make a change to the web interface to force reloading the large graphs if the total queries number jumps significantly. My suggestions would be a jump of a factor 2 but at least 100 in absolute numbers. To avoid reloading often once we go from 2 to 4 to 8, ... queries.

yubiuser commented 4 days ago

Ah, you mean concerning the database, yes. The most recent commits made this:

My idea was more about check_capability(CAP_SYS_TIME) and do not perform any NTP queries if the permission are not granted at all.


Can you add the detail log of what queries are imported after the import again? Currently it is not much of a use to see details of zero imported queries and no details after the import.

2024-06-26 22:23:31.412 CEST [362251M] INFO: Database successfully initialized
2024-06-26 22:23:31.446 CEST [362251M] INFO:  -> Total DNS queries: 0
2024-06-26 22:23:31.446 CEST [362251M] INFO:  -> Cached DNS queries: 0
2024-06-26 22:23:31.446 CEST [362251M] INFO:  -> Forwarded DNS queries: 0
2024-06-26 22:23:31.447 CEST [362251M] INFO:  -> Blocked DNS queries: 0
2024-06-26 22:23:31.447 CEST [362251M] INFO:  -> Unknown DNS queries: 0
2024-06-26 22:23:31.447 CEST [362251M] INFO:  -> Unique domains: 0
2024-06-26 22:23:31.447 CEST [362251M] INFO:  -> Unique clients: 0
2024-06-26 22:23:31.447 CEST [362251M] INFO:  -> DNS cache records: 0
2024-06-26 22:23:31.447 CEST [362251M] INFO:  -> Known forward destinations: 0
2024-06-26 22:23:31.448 CEST [362251/T362252] INFO: NTP server listening on 0.0.0.0:123 (IPv4)
2024-06-26 22:23:31.448 CEST [362251/T362253] INFO: NTP server listening on :::123 (IPv6)
.....
2024-06-26 22:23:35.676 CEST [362251/T362254] INFO: Received 8/8 valid NTP replies from de.pool.ntp.org
2024-06-26 22:23:35.677 CEST [362251/T362254] INFO: Time offset: 3.415857e-01 ms (excluded 1 outliers)
2024-06-26 22:23:35.677 CEST [362251/T362254] INFO: Round-trip delay: 2.570173e+01 ms (excluded 1 outliers)
2024-06-26 22:23:36.000 CEST [362251/T362254] INFO: Current RTC time is 2024-06-26 20:23:36
2024-06-26 22:23:36.756 CEST [362251/T362254] INFO: Imported 6918 queries from the on-disk database (it has 1242488 rows)
2024-06-26 22:23:36.756 CEST [362251/T362254] INFO: Parsing queries in database
2024-06-26 22:23:37.221 CEST [362251/T362254] INFO: Imported 6918 queries from the long-term database
2024-06-26 22:24:00.735 CEST [362251/T362255] INFO: Size of /etc/pihole/pihole-FTL.db is 83.38 MB, deleted 151 rows