siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0
6.75k stars 541 forks source link

clock skew #7080

Closed uhthomas closed 1 year ago

uhthomas commented 1 year ago

Bug Report

Description

It looks like talos nodes aren't syncing their clocks properly. Ceph is complaining of clock skew and I can confirm this is the case on the hosts themselves. The difference between 2023-04-12 11:43:52.297503558 +0000 UTC and 2023-04-12 11:43:52.430898682 +0000 UTC is 133.395124ms(!). (playground)

The logs don't make much sense to be me given hosts are clearly happy to resolve pool.ntp.org.

Logs

❯ talosctl --talosconfig=unwind-talosconfig -n 10.0.0.101,10.0.0.102,10.0.0.103,10.0.0.104,10.0.0.105 time
NODE         NTP-SERVER     NODE-TIME                                 NTP-SERVER-TIME
10.0.0.101   pool.ntp.org   2023-04-12 11:43:52.324593578 +0000 UTC   2023-04-12 11:43:52.308599071 +0000 UTC
10.0.0.102   pool.ntp.org   2023-04-12 11:43:52.430898682 +0000 UTC   2023-04-12 11:43:52.303693959 +0000 UTC
10.0.0.103   pool.ntp.org   2023-04-12 11:43:52.317368811 +0000 UTC   2023-04-12 11:43:52.309190395 +0000 UTC
10.0.0.104   pool.ntp.org   2023-04-12 11:43:52.315811787 +0000 UTC   2023-04-12 11:43:52.307580784 +0000 UTC
10.0.0.105   pool.ntp.org   2023-04-12 11:43:52.297503558 +0000 UTC   2023-04-12 11:43:52.31071359 +0000 UTC
❯ talosctl --talosconfig=unwind-talosconfig -n 10.0.0.101,10.0.0.102,10.0.0.103,10.0.0.104,10.0.0.105 time -c time.cloudflare.com
NODE         NTP-SERVER            NODE-TIME                                 NTP-SERVER-TIME
10.0.0.101   time.cloudflare.com   2023-04-12 11:45:32.258062121 +0000 UTC   2023-04-12 11:45:32.249952965 +0000 UTC
10.0.0.102   time.cloudflare.com   2023-04-12 11:45:32.38238141 +0000 UTC    2023-04-12 11:45:32.245881097 +0000 UTC
10.0.0.103   time.cloudflare.com   2023-04-12 11:45:32.255538166 +0000 UTC   2023-04-12 11:45:32.249325211 +0000 UTC
10.0.0.104   time.cloudflare.com   2023-04-12 11:45:32.251933718 +0000 UTC   2023-04-12 11:45:32.245709705 +0000 UTC
10.0.0.105   time.cloudflare.com   2023-04-12 11:45:32.203954549 +0000 UTC   2023-04-12 11:45:32.249928454 +0000 UTC
❯ talosctl --talosconfig=unwind-talosconfig -n 10.0.0.101,10.0.0.102,10.0.0.103,10.0.0.104,10.0.0.105 dmesg | rg ntp
...
10.0.0.103: user: warning: [2023-04-04T02:23:38.159577789Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on 10.0.0.1:53: read udp 10.0.0.115:60090->10.0.0.1:53: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:23:44.411133789Z]: [talos] ntp query error with server "94.198.159.16" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:40583->94.198.159.16:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:23:54.632667789Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on 10.0.0.1:53: read udp 10.0.0.115:38939->10.0.0.1:53: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:24:00.883555789Z]: [talos] ntp query error with server "94.198.159.16" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:55803->94.198.159.16:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:24:11.105526789Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on 10.0.0.1:53: read udp 10.0.0.115:56557->10.0.0.1:53: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:24:17.356543789Z]: [talos] ntp query error with server "94.198.159.16" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:54137->94.198.159.16:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:24:27.579516789Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on 10.0.0.1:53: read udp 10.0.0.115:44193->10.0.0.1:53: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:24:33.831256789Z]: [talos] ntp query error with server "94.198.159.16" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:40435->94.198.159.16:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:24:44.053002789Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on 10.0.0.1:53: read udp 10.0.0.115:60993->10.0.0.1:53: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:24:50.304388789Z]: [talos] ntp query error with server "94.198.159.16" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:57126->94.198.159.16:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:25:00.526536789Z]: [talos] failed looking up "pool.ntp.org", ignored {"component": "controller-runtime", "controller": "time.SyncController", "error": "lookup pool.ntp.org on 10.0.0.1:53: read udp 10.0.0.115:36659->10.0.0.1:53: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:25:06.778034789Z]: [talos] ntp query error with server "94.198.159.16" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:60536->94.198.159.16:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:25:12.000035789Z]: [talos] ntp query error with server "85.199.214.98" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:42381->85.199.214.98:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:25:17.222003789Z]: [talos] ntp query error with server "132.226.210.133" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:53129->132.226.210.133:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:25:22.447889789Z]: [talos] ntp query error with server "134.0.16.1" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:53464->134.0.16.1:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:25:27.663226789Z]: [talos] ntp query error with server "185.57.191.229" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:37234->185.57.191.229:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:25:33.887672789Z]: [talos] ntp query error with server "94.198.159.16" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:33194->94.198.159.16:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:25:39.109809789Z]: [talos] ntp query error with server "134.0.16.1" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:36709->134.0.16.1:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:25:44.325194789Z]: [talos] ntp query error with server "185.57.191.229" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:53060->185.57.191.229:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:25:49.549244789Z]: [talos] ntp query error with server "85.199.214.98" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:37253->85.199.214.98:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:25:54.770688789Z]: [talos] ntp query error with server "132.226.210.133" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:52018->132.226.210.133:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:26:00.997293789Z]: [talos] ntp query error with server "94.198.159.16" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:51416->94.198.159.16:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:26:06.219350789Z]: [talos] ntp query error with server "85.199.214.98" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:46763->85.199.214.98:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:26:11.440987789Z]: [talos] ntp query error with server "132.226.210.133" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:38546->132.226.210.133:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:26:16.667220789Z]: [talos] ntp query error with server "134.0.16.1" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:40123->134.0.16.1:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:26:21.882508789Z]: [talos] ntp query error with server "185.57.191.229" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:48657->185.57.191.229:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:26:28.106581789Z]: [talos] ntp query error with server "94.198.159.16" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:43064->94.198.159.16:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:26:33.328422789Z]: [talos] ntp query error with server "134.0.16.1" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:43029->134.0.16.1:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:26:38.543216789Z]: [talos] ntp query error with server "185.57.191.229" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:41414->185.57.191.229:123: i/o timeout"}
10.0.0.103: user: warning: [2023-04-04T02:30:27.940963789Z]: [talos] ntp query error with server "85.199.214.98" {"component": "controller-runtime", "controller": "time.SyncController", "error": "read udp 10.0.0.115:59470->85.199.214.98:123: i/o timeout"}

Environment

❯ talosctl --talosconfig=unwind-talosconfig -n 10.0.0.101,10.0.0.102,10.0.0.103,10.0.0.104,10.0.0.105 version
Client:
        Tag:         v1.3.6
        SHA:         undefined
        Built:
        Go version:  go1.20.2
        OS/Arch:     linux/amd64
Server:
        NODE:        10.0.0.101
        Tag:         v1.3.6
        SHA:         5b6176d8
        Built:
        Go version:  go1.19.7
        OS/Arch:     linux/amd64
        Enabled:     RBAC
        NODE:        10.0.0.104
        Tag:         v1.4.0-alpha.4
        SHA:         b8980817
        Built:
        Go version:  go1.20.2
        OS/Arch:     linux/amd64
        Enabled:     RBAC
        NODE:        10.0.0.103
        Tag:         v1.3.6
        SHA:         5b6176d8
        Built:
        Go version:  go1.19.7
        OS/Arch:     linux/amd64
        Enabled:     RBAC
        NODE:        10.0.0.102
        Tag:         v1.3.6
        SHA:         5b6176d8
        Built:
        Go version:  go1.19.7
        OS/Arch:     linux/amd64
        Enabled:     RBAC
        NODE:        10.0.0.105
        Tag:         v1.3.6
        SHA:         5b6176d8
        Built:
        Go version:  go1.19.7
        OS/Arch:     linux/amd64
        Enabled:     RBAC
❯ kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.26.3
Kustomize Version: v4.5.7
Server Version: v1.26.2
smira commented 1 year ago

As you can see from the log you posted, there's a problem in your network blocking NTP requests (?), or all NTP servers in the pool are down (less likely). Fix that, and the time will be in sync.

uhthomas commented 1 year ago

As you can see from the log you posted, there's a problem in your network blocking NTP requests (?), or all NTP servers in the pool are down (less likely). Fix that, and the time will be in sync.

How can this be true if talosctl time shows a valid time from the ntp server?

uhthomas commented 1 year ago

Strangely, this only appears to affect hosts 101,102 and 103. 104 and 105 are happy enough.

❯ talosctl --talosconfig=unwind-talosconfig -n 10.0.0.104,10.0.0.105 dmesg | rg ntp
# nothing
dubyatp commented 1 year ago

I too am noticing clock skew on Talos, to the point where Ceph is now notifying me about clock drift on 2/3 of my mons. I'm on 1.3.0 at the moment (will be upgrading soon), but this only started occurring in the past few weeks or so.

By any chance, are you using Proxmox as the hypervisor to your nodes? I recently upgraded all my Proxmox nodes right before these time drift issues occurred.

NODE NTP-SERVER NODE-TIME NTP-SERVER-TIME 10.251.23.6 time.cloudflare.com 2023-04-29 14:35:57.130240576 +0000 UTC 2023-04-29 14:35:57.130342456 +0000 UTC 10.251.23.12 time.cloudflare.com 2023-04-29 14:35:57.477616298 +0000 UTC 2023-04-29 14:35:57.130328898 +0000 UTC 10.251.23.7 time.cloudflare.com 2023-04-29 14:35:57.011647618 +0000 UTC 2023-04-29 14:35:57.130376492 +0000 UTC 10.251.23.127 time.cloudflare.com 2023-04-29 14:35:57.343519255 +0000 UTC 2023-04-29 14:35:57.130345105 +0000 UTC 10.251.23.121 time.cloudflare.com 2023-04-29 14:35:57.343946889 +0000 UTC 2023-04-29 14:35:57.130336154 +0000 UTC 10.251.23.128 time.cloudflare.com 2023-04-29 14:35:57.330881974 +0000 UTC 2023-04-29 14:35:57.130348667 +0000 UTC 10.251.23.5 time.cloudflare.com 2023-04-29 14:35:57.23244083 +0000 UTC 2023-04-29 14:35:57.130328598 +0000 UTC 10.251.23.123 time.cloudflare.com 2023-04-29 14:35:57.137086103 +0000 UTC 2023-04-29 14:35:57.130320844 +0000 UTC 10.251.23.13 time.cloudflare.com 2023-04-29 14:35:57.158154978 +0000 UTC 2023-04-29 14:35:57.130368159 +0000 UTC 10.251.23.14 time.cloudflare.com 2023-04-29 14:35:57.129647746 +0000 UTC 2023-04-29 14:35:57.131030244 +0000 UTC 10.251.23.120 time.cloudflare.com 2023-04-29 14:35:57.352140651 +0000 UTC 2023-04-29 14:35:57.138477069 +0000 UTC

uhthomas commented 1 year ago

@dubyatp Bare metal.

uhthomas commented 1 year ago

@smira Is there anything I can do to provide further insight into this problem? I am still seeing heavy clock skew.

smira commented 1 year ago

@uhthomas nothing new since my first comment - according to the logs you provided you have consistent errors talking to different NTP servers in your environment - you can try picking up a different NTP pool, geo-specific, or try using cloudflare NTP, your local NTP server, etc.

uhthomas commented 1 year ago
10.0.0.101: user: warning: [2023-06-01T20:40:29.286906028Z]: [talos] adjusting time (jump) by 514.094538ms via 176.58.109.199, state TIME_OK, status STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: user: warning: [2023-06-01T20:40:29.482388028Z]: [talos] synchronized RTC with system clock {"component": "controller-runtime", "controller": "time.SyncController"}

Wow, just randomly caught up by 500ms?

10.0.0.101: user: warning: [2023-06-02T09:05:47.911778028Z]: [talos] adjusting time (jump) by 427.201705ms via 176.58.109.199, state TIME_OK, status STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: user: warning: [2023-06-02T09:05:48.107330028Z]: [talos] synchronized RTC with system clock {"component": "controller-runtime", "controller": "time.SyncController"}

Then again by another 400ms only a day later.

buroa commented 1 year ago

I see this often on my Talos nodes as well.

mmalyska commented 1 year ago

For me I fixed this by running NTP server on my router. I don't know why some Talos nodes (in my case 1 of 3) couldn't get time from the default NTP pool.

smira commented 1 year ago

Default NTP pool doesn't really have good SLA, you should better customize that to use the closest server, check which one works better, etc.

uhthomas commented 1 year ago

@smira Is this graph helpful?

image

smira commented 1 year ago

@uhthomas this graph shows how Talos sync the time, and these jumps are time correction periods, for me the question is why there are such huge jumps.

probably it might be interesting to see the full output of time sync, talosctl logs controller-runtime | grep time.Sync, it should have a lot of information on what's going on.

uhthomas commented 1 year ago

image

❯ talosctl --talosconfig=unwind-talosconfig -n 10.0.0.101 logs controller-runtime | grep time.Sync
10.0.0.101: 2023-07-24T10:02:23.666Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "303.264µs", "rtt": "1.69184ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.265381ms", "root_dispersion": "6.774902ms", "root_distance": "9.253512ms"}
10.0.0.101: 2023-07-24T10:02:23.666Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "11.615685ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T10:02:23.666Z DEBUG adjusting time (slew) by 303.264µs via 139.162.219.252, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T10:02:23.666Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "303.263µs", "freq_offset": -229118, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T10:36:31.669Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "2.346442ms", "rtt": "1.553523ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.311157ms", "root_dispersion": "7.446289ms", "root_distance": "9.878629ms"}
10.0.0.101: 2023-07-24T10:36:31.669Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "12.15969ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T10:36:31.669Z DEBUG adjusting time (slew) by 2.346442ms via 139.162.219.252, state TIME_OK, status STA_NANO | STA_PLL {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T10:36:31.669Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "2.346441ms", "freq_offset": -210347, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T11:10:39.671Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "847.331µs", "rtt": "1.715239ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.28064ms", "root_dispersion": "8.880615ms", "root_distance": "11.378554ms"}
10.0.0.101: 2023-07-24T11:10:39.671Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "12.376813ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T11:10:39.671Z DEBUG adjusting time (slew) by 847.331µs via 139.162.219.252, state TIME_OK, status STA_NANO | STA_PLL {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T11:10:39.671Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "847.33µs", "freq_offset": -203568, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T11:44:47.673Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-916.864µs", "rtt": "1.64964ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.250122ms", "root_dispersion": "9.353638ms", "root_distance": "11.803519ms"}
10.0.0.101: 2023-07-24T11:44:47.673Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "12.166886ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T11:44:47.673Z DEBUG adjusting time (slew) by -916.864µs via 139.162.219.252, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T11:44:47.673Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "-916.863µs", "freq_offset": -210903, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T12:18:55.676Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "665.965µs", "rtt": "1.515397ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.326416ms", "root_dispersion": "6.744385ms", "root_distance": "9.165291ms"}
10.0.0.101: 2023-07-24T12:18:55.676Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "4.42845ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T12:18:55.676Z DEBUG adjusting time (slew) by 665.965µs via 139.162.219.252, state TIME_OK, status STA_NANO | STA_PLL {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T12:18:55.676Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "665.964µs", "freq_offset": -205575, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T12:53:03.678Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "2.422451ms", "rtt": "1.729257ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.295898ms", "root_dispersion": "8.468628ms", "root_distance": "10.981205ms"}
10.0.0.101: 2023-07-24T12:53:03.678Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "4.425697ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T12:53:03.679Z DEBUG adjusting time (slew) by 2.422451ms via 139.162.219.252, state TIME_OK, status STA_NANO | STA_PLL {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T12:53:03.679Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "2.42245ms", "freq_offset": -186196, "freq_offset_ppm": -2}
10.0.0.101: 2023-07-24T13:27:11.681Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-4.078371ms", "rtt": "1.558431ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.295898ms", "root_dispersion": "6.988525ms", "root_distance": "9.415689ms"}
10.0.0.101: 2023-07-24T13:27:11.681Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "2.173214ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T13:27:11.681Z DEBUG adjusting time (slew) by -4.078371ms via 139.162.219.252, state TIME_OK, status STA_NANO | STA_PLL {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T13:27:11.681Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "-4.07837ms", "freq_offset": -218823, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T14:01:19.684Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-904.908µs", "rtt": "1.702486ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.311157ms", "root_dispersion": "3.097534ms", "root_distance": "5.604355ms"}
10.0.0.101: 2023-07-24T14:01:19.684Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "2.189435ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T14:01:19.684Z DEBUG adjusting time (slew) by -904.908µs via 139.162.219.252, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T14:01:19.684Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "-904.907µs", "freq_offset": -226062, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T14:35:27.686Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "312.693µs", "rtt": "1.604156ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.311157ms", "root_dispersion": "3.40271ms", "root_distance": "5.860366ms"}
10.0.0.101: 2023-07-24T14:35:27.686Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "2.189215ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T14:35:27.686Z DEBUG adjusting time (slew) by 312.693µs via 139.162.219.252, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T14:35:27.686Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "312.692µs", "freq_offset": -223560, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T15:09:35.689Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-1.448293ms", "rtt": "1.559041ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.28064ms", "root_dispersion": "5.889893ms", "root_distance": "8.309733ms"}
10.0.0.101: 2023-07-24T15:09:35.689Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "2.24228ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T15:09:35.689Z DEBUG adjusting time (slew) by -1.448293ms via 139.162.219.252, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T15:09:35.689Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "-1.448292ms", "freq_offset": -235147, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T15:43:43.692Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "1.607816ms", "rtt": "1.604804ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.341675ms", "root_dispersion": "3.646851ms", "root_distance": "6.12009ms"}
10.0.0.101: 2023-07-24T15:43:43.692Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "2.269327ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T15:43:43.692Z DEBUG adjusting time (slew) by 1.607816ms via 139.162.219.252, state TIME_OK, status STA_NANO | STA_PLL {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T15:43:43.692Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "1.607815ms", "freq_offset": -222284, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T16:17:51.695Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-17.463µs", "rtt": "1.73548ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.295898ms", "root_dispersion": "8.209229ms", "root_distance": "10.724918ms"}
10.0.0.101: 2023-07-24T16:17:51.695Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "2.204237ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T16:17:51.695Z DEBUG adjusting time (slew) by -17.463µs via 139.162.219.252, state TIME_OK, status STA_NANO | STA_PLL {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T16:17:51.695Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "-17.462µs", "freq_offset": -222424, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T16:51:59.697Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "2.240021ms", "rtt": "1.683029ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.40271ms", "root_dispersion": "3.997803ms", "root_distance": "6.540672ms"}
10.0.0.101: 2023-07-24T16:51:59.697Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "4.888413ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T16:51:59.697Z DEBUG adjusting time (slew) by 2.240021ms via 139.162.219.252, state TIME_OK, status STA_NANO | STA_PLL {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T16:51:59.697Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "2.24002ms", "freq_offset": -204504, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T17:26:07.700Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "243.457µs", "rtt": "1.600941ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.311157ms", "root_dispersion": "4.379272ms", "root_distance": "6.835321ms"}
10.0.0.101: 2023-07-24T17:26:07.700Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "4.530745ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T17:26:07.700Z DEBUG adjusting time (slew) by 243.457µs via 139.162.219.252, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T17:26:07.700Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "243.456µs", "freq_offset": -202556, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T18:00:15.703Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "458.826µs", "rtt": "1.612798ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.295898ms", "root_dispersion": "7.614136ms", "root_distance": "10.068484ms"}
10.0.0.101: 2023-07-24T18:00:15.703Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "2.232429ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T18:00:15.703Z DEBUG adjusting time (slew) by 458.826µs via 139.162.219.252, state TIME_OK, status STA_NANO | STA_PLL {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T18:00:15.703Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "458.825µs", "freq_offset": -198885, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T18:34:23.705Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-294.28µs", "rtt": "1.846483ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.311157ms", "root_dispersion": "7.18689ms", "root_distance": "9.76571ms"}
10.0.0.101: 2023-07-24T18:34:23.705Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "2.26535ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T18:34:23.705Z DEBUG adjusting time (slew) by -294.28µs via 139.162.219.252, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T18:34:23.705Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "-294.279µs", "freq_offset": -201240, "freq_offset_ppm": -3}
10.0.0.101: 2023-07-24T19:08:31.708Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-1.158055ms", "rtt": "1.686373ms", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "3.40271ms", "root_dispersion": "5.371094ms", "root_distance": "7.915635ms"}
10.0.0.101: 2023-07-24T19:08:31.708Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "2.16814ms", "poll_interval": "34m8s", "spike": false}
10.0.0.101: 2023-07-24T19:08:31.708Z DEBUG adjusting time (slew) by -1.158055ms via 139.162.219.252, state TIME_OK, status STA_NANO | STA_PLL {"component": "controller-runtime", "controller": "time.SyncController"}
10.0.0.101: 2023-07-24T19:08:31.708Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 8, "offset": "-1.158054ms", "freq_offset": -210504, "freq_offset_ppm": -3}
uhthomas commented 1 year ago

@smira So, from my previous comment, it looks like the time.Sync component actually thinks the clock isn't really that out of sync at all? Could it be something wrong there?

uhthomas commented 1 year ago

It looks like node-exporter uses timex. Maybe unix.Adjtimex is a better approach for detecting clock skew?

https://github.com/prometheus/node_exporter/blob/8fb4f78ce541f404144f86b3202cc53d2a0f387c/docs/TIME.md#timex-collector

edit: after reading Talos' NTP implementation, I can see adjtimex is already in use.

smira commented 1 year ago

Talos NTP implementation is really close to systemd one. What is the value on the graph? It doesn't match the adjtime state line which shows the state of adjtimex syscall.

uhthomas commented 1 year ago

Talos NTP implementation is really close to systemd one. What is the value on the graph? It doesn't match the adjtime state line which shows the state of adjtimex syscall.

Tt seems to be maxerror from adjtimex.

https://github.com/prometheus/node_exporter/blob/8fb4f78ce541f404144f86b3202cc53d2a0f387c/collector/timex.go#L167C17-L167C30

smira commented 1 year ago

yep, seems that it's growing until the next call to adjtimex which clears it. I will take a look into this next week, what might be wrong here

uhthomas commented 1 year ago

Thanks :) Let me know if there's anything I can do to help.

Thought I would add that it may also be worth referring to earlier logs too as Talos was correctly detecting huge skews of 500ms earlier in the thread.

buroa commented 1 year ago

Seeing this again. Nodes have been stable for about 48 hours; haven't see this happen on 1.4.8, and I'm currently running 1.5.1.

> talosctl -n m0.k8s.ktwo.io,m1.k8s.ktwo.io,m2.k8s.ktwo.io,w0.k8s.ktwo.io,w1.k8s.ktwo.io,w2.k8s.ktwo.io time
NODE             NTP-SERVER      NODE-TIME                                 NTP-SERVER-TIME
m2.k8s.ktwo.io   unifi.ktwo.io   2023-08-28 17:47:20.707314892 +0000 UTC   2023-08-28 17:47:20.740919012 +0000 UTC
m1.k8s.ktwo.io   unifi.ktwo.io   2023-08-28 17:47:20.709851875 +0000 UTC   2023-08-28 17:47:20.740960871 +0000 UTC
m0.k8s.ktwo.io   unifi.ktwo.io   2023-08-28 17:47:20.710613545 +0000 UTC   2023-08-28 17:47:20.740985994 +0000 UTC
w2.k8s.ktwo.io   unifi.ktwo.io   2023-08-28 17:47:21.008591744 +0000 UTC   2023-08-28 17:47:20.741009307 +0000 UTC
w0.k8s.ktwo.io   unifi.ktwo.io   2023-08-28 17:47:20.718991229 +0000 UTC   2023-08-28 17:47:20.741721625 +0000 UTC
w1.k8s.ktwo.io   unifi.ktwo.io   2023-08-28 17:47:21.010029749 +0000 UTC   2023-08-28 17:47:20.742208309 +0000 UTC

w1.k8s.ktwo.io image

> talosctl -n w1.k8s.ktwo.io logs controller-runtime | grep time.Sync
w1.k8s.ktwo.io: 2023-08-27T12:12:40.041Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-527.756µs", "rtt": "276.02µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "35.65979ms", "root_dispersion": "34.378052ms", "root_distance": "52.345957ms"}
w1.k8s.ktwo.io: 2023-08-27T12:12:40.041Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "516.653µs", "poll_interval": "34m8s", "spike": false}
w1.k8s.ktwo.io: 2023-08-27T12:12:40.041Z DEBUG adjusting time (slew) by -527.756µs via 192.168.0.1, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
w1.k8s.ktwo.io: 2023-08-27T12:12:40.041Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 7, "offset": "-527.755µs", "freq_offset": 1765298, "freq_offset_ppm": 26}
w1.k8s.ktwo.io: 2023-08-27T12:46:48.042Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-586.936µs", "rtt": "273.996µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "35.736084ms", "root_dispersion": "33.233643ms", "root_distance": "51.238683ms"}
w1.k8s.ktwo.io: 2023-08-27T12:46:48.042Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "497.706µs", "poll_interval": "34m8s", "spike": false}
w1.k8s.ktwo.io: 2023-08-27T12:46:48.042Z DEBUG adjusting time (slew) by -586.936µs via 192.168.0.1, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
w1.k8s.ktwo.io: 2023-08-27T12:46:48.042Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 7, "offset": "-586.935µs", "freq_offset": 1755907, "freq_offset_ppm": 26}
w1.k8s.ktwo.io: 2023-08-27T13:20:56.043Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-77.587µs", "rtt": "243.534µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "35.552979ms", "root_dispersion": "32.150269ms", "root_distance": "50.048525ms"}
w1.k8s.ktwo.io: 2023-08-27T13:20:56.043Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "384.231µs", "poll_interval": "34m8s", "spike": false}
w1.k8s.ktwo.io: 2023-08-27T13:20:56.043Z DEBUG adjusting time (slew) by -77.587µs via 192.168.0.1, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
w1.k8s.ktwo.io: 2023-08-27T13:20:56.043Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 7, "offset": "-77.586µs", "freq_offset": 1754665, "freq_offset_ppm": 26}
w1.k8s.ktwo.io: 2023-08-27T13:55:04.044Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-337.492µs", "rtt": "249.389µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "50.704956ms", "root_dispersion": "31.417847ms", "root_distance": "56.895019ms"}
w1.k8s.ktwo.io: 2023-08-27T13:55:04.044Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "917.73µs", "poll_interval": "34m8s", "spike": false}
w1.k8s.ktwo.io: 2023-08-27T13:55:04.044Z DEBUG adjusting time (slew) by -337.492µs via 192.168.0.1, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
w1.k8s.ktwo.io: 2023-08-27T13:55:04.044Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 7, "offset": "-337.491µs", "freq_offset": 1749265, "freq_offset_ppm": 26}
w1.k8s.ktwo.io: 2023-08-27T14:29:12.045Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "405.5µs", "rtt": "135.671µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "32.058716ms", "root_dispersion": "35.568237ms", "root_distance": "51.66543ms"}
w1.k8s.ktwo.io: 2023-08-27T14:29:12.045Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "687.966µs", "poll_interval": "34m8s", "spike": false}
w1.k8s.ktwo.io: 2023-08-27T14:29:12.045Z DEBUG adjusting time (slew) by 405.5µs via 192.168.0.1, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
w1.k8s.ktwo.io: 2023-08-27T14:29:12.045Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 7, "offset": "405.5µs", "freq_offset": 1755753, "freq_offset_ppm": 26}
w1.k8s.ktwo.io: 2023-08-27T15:03:20.047Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-501.422µs", "rtt": "292.063µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "35.430908ms", "root_dispersion": "29.846191ms", "root_distance": "47.707676ms"}
w1.k8s.ktwo.io: 2023-08-27T15:03:20.047Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "766.745µs", "poll_interval": "34m8s", "spike": false}
w1.k8s.ktwo.io: 2023-08-27T15:03:20.047Z DEBUG adjusting time (slew) by -501.422µs via 192.168.0.1, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
w1.k8s.ktwo.io: 2023-08-27T15:03:20.047Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 7, "offset": "-501.421µs", "freq_offset": 1747731, "freq_offset_ppm": 26}
w1.k8s.ktwo.io: 2023-08-27T15:37:28.048Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-392.005µs", "rtt": "115.708µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "36.422729ms", "root_dispersion": "30.014038ms", "root_distance": "48.283256ms"}
w1.k8s.ktwo.io: 2023-08-27T15:37:28.048Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "341.081µs", "poll_interval": "34m8s", "spike": false}
w1.k8s.ktwo.io: 2023-08-27T15:37:28.048Z DEBUG adjusting time (slew) by -392.005µs via 192.168.0.1, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
w1.k8s.ktwo.io: 2023-08-27T15:37:28.048Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 7, "offset": "-392.004µs", "freq_offset": 1741459, "freq_offset_ppm": 26}
w1.k8s.ktwo.io: 2023-08-27T16:11:36.049Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "206.917µs", "rtt": "233.819µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "50.170898ms", "root_dispersion": "29.86145ms", "root_distance": "55.063808ms"}

w2.k8s.ktwo.io image

> talosctl -n w2.k8s.ktwo.io logs controller-runtime | grep time.Sync
w2.k8s.ktwo.io: 2023-08-27T12:18:32.807Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-584.729µs", "rtt": "168.796µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "35.65979ms", "root_dispersion": "39.672852ms", "root_distance": "57.587145ms"}
w2.k8s.ktwo.io: 2023-08-27T12:18:32.807Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "386.519µs", "poll_interval": "34m8s", "spike": false}
w2.k8s.ktwo.io: 2023-08-27T12:18:32.807Z DEBUG adjusting time (slew) by -584.729µs via 192.168.0.1, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
w2.k8s.ktwo.io: 2023-08-27T12:18:32.807Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 7, "offset": "-584.728µs", "freq_offset": 1460617, "freq_offset_ppm": 22}
w2.k8s.ktwo.io: 2023-08-27T12:52:40.808Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-573.414µs", "rtt": "157.058µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "35.736084ms", "root_dispersion": "38.528442ms", "root_distance": "56.475013ms"}
w2.k8s.ktwo.io: 2023-08-27T12:52:40.808Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "418.422µs", "poll_interval": "34m8s", "spike": false}
w2.k8s.ktwo.io: 2023-08-27T12:52:40.808Z DEBUG adjusting time (slew) by -573.414µs via 192.168.0.1, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
w2.k8s.ktwo.io: 2023-08-27T12:52:40.808Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 7, "offset": "-573.413µs", "freq_offset": 1451442, "freq_offset_ppm": 22}
w2.k8s.ktwo.io: 2023-08-27T13:26:48.809Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-143.058µs", "rtt": "193.79µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "35.552979ms", "root_dispersion": "37.445068ms", "root_distance": "55.318452ms"}
w2.k8s.ktwo.io: 2023-08-27T13:26:48.809Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "401.946µs", "poll_interval": "34m8s", "spike": true}
w2.k8s.ktwo.io: 2023-08-27T14:00:56.810Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-479.732µs", "rtt": "162.544µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "50.704956ms", "root_dispersion": "36.712646ms", "root_distance": "62.146396ms"}
w2.k8s.ktwo.io: 2023-08-27T14:00:56.810Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "409.067µs", "poll_interval": "34m8s", "spike": false}
w2.k8s.ktwo.io: 2023-08-27T14:00:56.810Z DEBUG adjusting time (slew) by -479.732µs via 192.168.0.1, state TIME_OK, status STA_PLL | STA_NANO | STA_MODE {"component": "controller-runtime", "controller": "time.SyncController"}
w2.k8s.ktwo.io: 2023-08-27T14:00:56.810Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 7, "offset": "-479.731µs", "freq_offset": 1441848, "freq_offset_ppm": 22}
w2.k8s.ktwo.io: 2023-08-27T14:35:04.811Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "501.823µs", "rtt": "119.246µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "32.058716ms", "root_dispersion": "40.863037ms", "root_distance": "56.952018ms"}
w2.k8s.ktwo.io: 2023-08-27T14:35:04.811Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "464.135µs", "poll_interval": "34m8s", "spike": false}
w2.k8s.ktwo.io: 2023-08-27T14:35:04.811Z DEBUG adjusting time (slew) by 501.823µs via 192.168.0.1, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
w2.k8s.ktwo.io: 2023-08-27T14:35:04.811Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 7, "offset": "501.822µs", "freq_offset": 1449877, "freq_offset_ppm": 22}
w2.k8s.ktwo.io: 2023-08-27T15:09:12.812Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-432.298µs", "rtt": "103.787µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "35.430908ms", "root_dispersion": "35.140991ms", "root_distance": "52.908338ms"}
w2.k8s.ktwo.io: 2023-08-27T15:09:12.812Z DEBUG sample stats {"component": "controller-runtime", "controller": "time.SyncController", "jitter": "417.762µs", "poll_interval": "34m8s", "spike": false}
w2.k8s.ktwo.io: 2023-08-27T15:09:12.812Z DEBUG adjusting time (slew) by -432.298µs via 192.168.0.1, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
w2.k8s.ktwo.io: 2023-08-27T15:09:12.812Z DEBUG adjtime state {"component": "controller-runtime", "controller": "time.SyncController", "constant": 7, "offset": "-432.297µs", "freq_offset": 1442960, "freq_offset_ppm": 22}
w2.k8s.ktwo.io: 2023-08-27T15:43:20.813Z DEBUG NTP response {"component": "controller-runtime", "controller": "time.SyncController", "clock_offset": "-481.241µs", "rtt": "243.69µs", "leap": 0, "stratum": 2, "precision": "238ns", "root_delay": "36.422729ms", "root_dispersion": "35.308838ms", "root_distance": "53.642047ms"}
smira commented 1 year ago

So looking at your logs it's the spike detection algorithm which kicks in and ignores the response from the NTP server. There's nothing bad in that on its own, as the time is perfectly in sync, but the update happens on next poll, that's why kernel estimation for an error jumps so high. I don't have any exact idea on what can be done at the moment, but there's no real problem except for the monitoring alert.

buroa commented 1 year ago

Thanks for working this out @smira, greatly appreciate all you do.