oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0

chronyd slow to get going #5192

Closed rcgoodfellow closed 3 months ago

rcgoodfellow commented 8 months ago

When doing test runs, I've noticed that chronyd is slow to get going.

From a boundary NTP zone I see the following.

root@oxz_ntp_9219fa61:~# svcprop ntp | grep config
config/file astring /etc/inet/chrony.conf
config/allow astring fd00:1122:3344:100::/56
config/boundary boolean true
config/server astring time.cloudflare.com

And when I resolve time.cloudflare.com from inside the zone, I see this:

root@oxz_ntp_9219fa61:~# host time.cloudflare.com
time.cloudflare.com has address 162.159.200.123
time.cloudflare.com has address 162.159.200.1
time.cloudflare.com has IPv6 address 2606:4700:f1::1
time.cloudflare.com has IPv6 address 2606:4700:f1::123

However, chrony stays stuck in the following state for several minutes:

root@oxz_ntp_9219fa61:~# chronyc tracking
Reference ID    : 7F7F0101 ()
Stratum         : 10
Ref time (UTC)  : Mon Mar 04 18:44:32 2024
System time     : 0.120790944 seconds fast of NTP time
Last offset     : +0.261880517 seconds
RMS offset      : 0.261880517 seconds
Frequency       : 0.000 ppm slow
Residual freq   : +0.000 ppm
Skew            : 0.000 ppm
Root delay      : 0.000000000 seconds
Root dispersion : 0.000000000 seconds
Update interval : 60.8 seconds

Eventually we do reach a tracking state that looks like this:

root@oxz_ntp_9219fa61:~# chronyc tracking
Reference ID    : A29FC801 (time.cloudflare.com)
Stratum         : 4
Ref time (UTC)  : Mon Mar 04 18:45:45 2024
System time     : 0.000000091 seconds slow of NTP time
Last offset     : +0.003425426 seconds
RMS offset      : 0.248444051 seconds
Frequency       : 49.183 ppm fast
Residual freq   : +0.000 ppm
Skew            : 797.217 ppm
Root delay      : 0.051692925 seconds
Root dispersion : 0.053474780 seconds
Update interval : 64.1 seconds
Leap status     : Normal

However, it takes several minutes to get there, even after all the network conditions needed for this to work are seemingly in place.
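The difference between the two tracking outputs above can be checked mechanically. What follows is only a rough sketch, not anything from omicron: the helper names (`parse_tracking`, `is_local_reference`, `refid_as_ipv4`) are made up for illustration. It assumes the `Key : value` layout shown above, and relies on chrony reporting reference ID 7F7F0101 (127.127.1.1) while it is running on its local reference rather than an external source. For an IPv4 source the reference ID is just the address in hex, so A29FC801 decodes to one of the Cloudflare addresses that `host` returned; as far as I understand, IPv6 sources get a hashed reference ID, so the decoding only makes sense for IPv4.

```python
import ipaddress

# Reference ID chrony reports in local mode (127.127.1.1 in hex).
LOCAL_REFID = "7F7F0101"


def parse_tracking(text):
    """Parse `chronyc tracking` output into a dict of field name -> value."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps and IPv6
        # addresses in the value are left intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def refid_hex(fields):
    """Return the hex part of 'Reference ID', e.g. 'A29FC801'."""
    return fields["Reference ID"].split()[0]


def is_local_reference(fields):
    """True while chronyd has not yet selected an external source."""
    return refid_hex(fields) == LOCAL_REFID


def refid_as_ipv4(hex_refid):
    """Decode a reference ID as a dotted-quad IPv4 address."""
    return str(ipaddress.IPv4Address(int(hex_refid, 16)))


STUCK = """Reference ID    : 7F7F0101 ()
Stratum         : 10
Ref time (UTC)  : Mon Mar 04 18:44:32 2024
"""

SYNCED = """Reference ID    : A29FC801 (time.cloudflare.com)
Stratum         : 4
Ref time (UTC)  : Mon Mar 04 18:45:45 2024
"""

if __name__ == "__main__":
    for name, text in (("stuck", STUCK), ("synced", SYNCED)):
        fields = parse_tracking(text)
        print(name, is_local_reference(fields), refid_as_ipv4(refid_hex(fields)))
```

Polling something like `is_local_reference` in a loop from the test harness would give a concrete number for how long the zone sits on the local reference, instead of "several minutes".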

This is readily reproducible in the a4x2 testbed.

rcgoodfellow commented 4 months ago

In addition to being slow off the line to get to tracking for boundary servers, I'm now seeing non-boundary servers take forever to converge.

This is from a non-boundary server in a4x2. The "slow of NTP time" offset is converging at an extremely slow rate. Running chronyc makestep works around this, but it requires manual intervention, which is clearly not a reasonable solution.

root@oxz_ntp_1dd95272:~# chronyc tracking
Reference ID    : 6AC472ED (fd00:1122:3344:102::c)
Stratum         : 5
Ref time (UTC)  : Thu Jul 04 19:52:56 2024
System time     : 19.090559006 seconds slow of NTP time
Last offset     : -0.000066614 seconds
RMS offset      : 0.000078137 seconds
Frequency       : 61.531 ppm fast
Residual freq   : -0.001 ppm
Skew            : 0.221 ppm
Root delay      : 0.045872819 seconds
Root dispersion : 0.001213847 seconds
Update interval : 16.1 seconds
Leap status     : Normal
root@oxz_ntp_1dd95272:~# chronyc tracking
Reference ID    : 6AC472ED (fd00:1122:3344:102::c)
Stratum         : 5
Ref time (UTC)  : Thu Jul 04 19:54:49 2024
System time     : 18.806512833 seconds slow of NTP time
Last offset     : -0.000120590 seconds
RMS offset      : 0.000066439 seconds
Frequency       : 61.164 ppm fast
Residual freq   : -0.427 ppm
Skew            : 0.925 ppm
Root delay      : 0.045762327 seconds
Root dispersion : 0.001317421 seconds
Update interval : 16.2 seconds
Leap status     : Normal
root@oxz_ntp_1dd95272:~# chronyc tracking
Reference ID    : 6AC472ED (fd00:1122:3344:102::c)
Stratum         : 5
Ref time (UTC)  : Thu Jul 04 19:55:54 2024
System time     : 18.625116348 seconds slow of NTP time
Last offset     : +0.000068295 seconds
RMS offset      : 0.000073279 seconds
Frequency       : 61.092 ppm fast
Residual freq   : +0.090 ppm
Skew            : 0.815 ppm
Root delay      : 0.045762327 seconds
Root dispersion : 0.001361975 seconds
Update interval : 16.2 seconds
Leap status     : Normal
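
To put a number on "converging slowly", here is a back-of-the-envelope sketch using only the three samples above. It treats the "Ref time (UTC)" field as the time of each sample, which is an assumption: that field is the time of the last processed measurement, so with a 16 second update interval it can be off by several seconds from when the command actually ran. The function names are made up for illustration. Under that assumption the offset is shrinking at roughly 2,600 ppm, which would take about two more hours to remove the remaining 18.6 seconds.

```python
from datetime import datetime

# (Ref time (UTC), seconds slow of NTP time), copied from the outputs above.
SAMPLES = [
    ("Thu Jul 04 19:52:56 2024", 19.090559006),
    ("Thu Jul 04 19:54:49 2024", 18.806512833),
    ("Thu Jul 04 19:55:54 2024", 18.625116348),
]


def parse_ref_time(text):
    """Parse the timestamp format used by the 'Ref time (UTC)' field."""
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


def overall_slew_ppm(samples):
    """Average rate at which the offset shrinks between first and last sample."""
    (t_first, off_first), (t_last, off_last) = samples[0], samples[-1]
    elapsed = (parse_ref_time(t_last) - parse_ref_time(t_first)).total_seconds()
    return (off_first - off_last) / elapsed * 1e6


def seconds_to_converge(samples):
    """Time left to remove the last observed offset at the average rate."""
    rate = overall_slew_ppm(samples) / 1e6  # seconds of offset per second
    return samples[-1][1] / rate


if __name__ == "__main__":
    print(f"average slew: {overall_slew_ppm(SAMPLES):.0f} ppm")
    print(f"estimated time to converge: {seconds_to_converge(SAMPLES) / 60:.0f} min")
```

This is a linear extrapolation from three points, so it is only a rough estimate; it does show that the clock is being slewed rather than stepped.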