failure on the first iterration

zoranbosnjak commented 3 days ago

Test program from the README example often fails during the first loop iterration. It looks like this is the unfortunate consequence of randomization during NtpAssociation init. The comment explains a reason for this (to avoid bursts). The problem is that it can generate false alarms during program startup, where I am actually checking the offsets. I am looking for a way to workaround it.

One obvious candidate is the the query_peers method, which currently returns plain float - pause time or zero in some cases. This makes it impossible for the user to check the success, based on the return value. Please consider changing return type to Optional[float] and return None in cases where the method did not make any progress. The handling on the call site would also require minor update.

Not sure if a better solution would be to restrict the randomness during init, such that it remains in the safe interval.

This is the example program with debugging enabled:

import logging

logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
logging.getLogger().setLevel(logging.DEBUG)

# the rest of the example... from README

Results (sometimes) in:

$ python3 test.py
2024-09-16 14:30:47,294 INFO NTP association 82.69.171.134 initialized
2024-09-16 14:30:47,294 DEBUG 82.69.171.134 Scheduled at 1726489849.535652
2024-09-16 14:30:47,294 INFO NTP association 217.114.59.66 initialized
2024-09-16 14:30:47,294 DEBUG 217.114.59.66 Scheduled at 1726489851.562092
2024-09-16 14:30:47,294 DEBUG Created IPv4 socket
2024-09-16 14:30:47,294 DEBUG Query peers
2024-09-16 14:30:47,294 DEBUG No more peers to query for now
problem: No fit peers found
2024-09-16 14:30:49,538 DEBUG Query peers
2024-09-16 14:30:49,570 DEBUG 82.69.171.134 Got a packet
2024-09-16 14:30:49,570 DEBUG 82.69.171.134 Scheduled at 1726489878.9640217
2024-09-16 14:30:49,570 DEBUG 82.69.171.134 Update with offset=0.00430417 delay=0.0320477 dispersion=4.41557e-06 jitter=0
2024-09-16 14:30:49,570 DEBUG No more peers to query for now
2024-09-16 14:30:49,570 DEBUG Fit peers: [82.69.171.134]
2024-09-16 14:30:49,571 DEBUG Finding consensus, assuming 0 falsetickers
2024-09-16 14:30:49,571 DEBUG Truechimers: {82.69.171.134}
2024-09-16 14:30:49,571 DEBUG Survivors: [82.69.171.134]
2024-09-16 14:30:49,571 DEBUG offset=0.00430417 jitter=0 leap=0
0.004304170608520508

jsiembida commented 2 days ago

As such, "unsynchronized" is an expected state and the calling code should always account for it, especially at the beginning, when the NTP client latches on. But the randomization could easily be parametrized. And a couple of other things could be parametrized to tweak the NTP client logic, too.

Communication errors encountered during querying are an expected case and should trigger some back off logic in polling intervals (that's at least my understanding of RFC). That's why query_peers always returns a pause to the next poll, if it didn't then caller would need to bring in its own logic. I would keep this level of abstraction of query_peers because it is very convenient if you just want to follow the intended use case. But it could be broken down to allow for more granular control.

zoranbosnjak commented 1 day ago

It's true that communication errors are to be expected. But I don't think this reasoning is justified here. I this particular scenario, there was no real error, so the calling code should get clean status. At least I my usecase I would like to avoid false alarms as much as possible.

Anyway, additional parametrization is indeed an easy fix. See pull request.

jsiembida / python-ntp

failure on the first iterration #2