neondatabase / autoscaling

Postgres vertical autoscaling in k8s
Apache License 2.0

Bug: VM clocks drift (need to have ntpd or equivalent) #712

Closed sharnoff closed 8 months ago

sharnoff commented 9 months ago

Environment

Production

Steps to reproduce

Presumably, leave a VM running for a long time — haven't yet validated.

Expected result

The clock in the VM should not significantly drift — at minimum, it shouldn't be off by more than 0.1s, but ideally it'd remain within 1ms. There may be difficulties in practice from cgroup CPU throttling.

Actual result

On startup, the clock is roughly synchronized, but we've seen noticeable differences in the past (IIRC, 0.2s), and a user recently reported a more significant difference - ref https://neondb.slack.com/archives/C04DGM6SMTM/p1703259018496809.

Other logs, links

sharnoff commented 8 months ago

Notes from discussion:

  1. There's a way to get QEMU to do the synchronization for us
  2. AWS has a clock syncing service
  3. @sharnoff to work on this

m0r0zzz commented 8 months ago

In the case of KVM virtual machines, running ntpd in the guest OS to synchronize and syntonize its clock to the host's may be a suboptimal solution, because the virtio net driver uses simple software skb timestamps. Combined with the non-deterministic nature of hypervisor time stealing, this can increase guest clock offset and frequency jitter.
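To illustrate why timestamp jitter matters here, below is a minimal sketch of the standard NTP offset and delay calculation from the four exchange timestamps. The function name and the sample numbers are made up for illustration; the point is only that any extra, one-sided delay in a software timestamp (for example, the guest being descheduled before it timestamps the reply) shows up as an offset error of half that delay.

```python
def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float):
    """Standard NTP on-wire calculation.

    t0: client transmit time (client clock)
    t1: server receive time  (server clock)
    t2: server transmit time (server clock)
    t3: client receive time  (client clock)
    Returns (offset, delay) in seconds, where offset is the estimated
    server-minus-client clock difference.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical numbers: both clocks agree, 1 ms each way, 0.1 ms server processing.
clean = ntp_offset_and_delay(0.0, 0.001, 0.0011, 0.0021)

# Same exchange, but the guest takes its receive timestamp 4 ms late
# (e.g. because the vCPU was not running when the packet arrived).
stolen = ntp_offset_and_delay(0.0, 0.001, 0.0011, 0.0061)

print("symmetric path:   offset=%.6f s delay=%.6f s" % clean)
print("late rx timestamp: offset=%.6f s delay=%.6f s" % stolen)
```

In the second case the clocks are still identical, yet the computed offset is about -2 ms, which a client would then try to "correct".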

Using solutions provided by QEMU (i.e. setting time periodically via qemu-guest-agent, or using its PV RTC clock to periodically set the CLOCK_REALTIME clock in a custom agent) may produce less jitter, but frequent realtime clock steps can cause various software to misbehave. In my experience, some programs can't handle time "jumps" at all: timed events may be missed because the required time value has been "cut out" of the CLOCK_REALTIME time axis by the jump.
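The alternative to stepping is slewing (running the clock slightly fast or slow until it catches up), and the trade-off can be put in rough numbers. The sketch below assumes a maximum slew rate of 500 ppm, which is the conventional limit for kernel-driven slewing on Linux; that figure is an assumption brought in for illustration, not something from this thread, and daemons such as chrony can be configured differently. The function name is hypothetical.

```python
def slew_duration_seconds(offset_s: float, max_slew_ppm: float = 500.0) -> float:
    """Time needed to remove `offset_s` by slewing at `max_slew_ppm`.

    Slewing never makes CLOCK_REALTIME jump, so no time value is skipped,
    but large offsets take a long time to correct.
    """
    if max_slew_ppm <= 0:
        raise ValueError("max_slew_ppm must be positive")
    return abs(offset_s) / (max_slew_ppm * 1e-6)


# Offsets mentioned in this issue: the 1 ms goal, the 0.1 s bound, the 0.2 s observation.
for offset in (0.001, 0.1, 0.2):
    print(f"offset {offset:>5} s -> ~{slew_duration_seconds(offset):.0f} s of slewing")
```

Under that assumption, the 0.2s offset seen earlier would take several minutes to slew away, which is why periodic steps are tempting despite the problems described above.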

There are other time synchronization solutions that can be recommended in this use case (in increasing precision order):

  1. Use a paravirtualized PTP hardware clock with the kvm-ptp driver and a compatible time server (e.g. chrony, with a refclock PHC device configured). Because the kvm-ptp driver acquires host time directly via a hypervisor call, this reduces the jitter caused by time stealing. Red Hat recommends this approach for achieving time synchronization accuracy on the order of ten microseconds.
  2. Use in-kernel time synchronization facilities (so-called "kernel NTP server") in the form of a paravirtualized PPS driver or standalone hypervisor time synchronization kthread. This will reduce the jitter introduced by irregular time synchronization thread scheduling, as the synchronization procedure will be executed entirely in kernelspace:
     - Develop a PPS client driver, which will use time acquired via a hypervisor call (as in kvm-ptp) as a timebase to trigger an automatic in-kernel clock synchronization procedure using the hardpps() function.
     - Develop a stand-alone kernel module that will start a kthread or timer with time synchronization code inside. This code will periodically acquire host clock time via a hypervisor call and pass it to the main kernel time synchronization procedure, the do_adjtimex() function.
  3. Straight-up replace all standard POSIX clocks in the kernel with their "hypervisor" counterparts: modify all POSIX clock time-acquisition functions to make hypervisor calls instead of relying on internal "virtual" hrtimers. This may be a bit hard-core, slow (clock acquisition functions, which are normally accessible via vDSO, would then require a machine context switch), and somewhat difficult to implement, but it would amount to perfect time synchronization.
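
For option 1 above, a quick way to check how far the guest has drifted is to read the PTP hardware clock directly and compare it with CLOCK_REALTIME. This is only a sketch under several assumptions: that the KVM PTP module (named `ptp_kvm` in mainline Linux) is loaded in the guest, that it is exposed as `/dev/ptp0`, and that the process may open that device. The function names are hypothetical; the fd-to-clock-id conversion mirrors the kernel's dynamic POSIX clock convention.

```python
import os
import time

CLOCKFD = 3  # low bits marking a file-descriptor-based (dynamic) POSIX clock


def fd_to_clockid(fd: int) -> int:
    """Turn an open /dev/ptpN file descriptor into a clock id for clock_gettime."""
    return ((~fd) << 3) | CLOCKFD


def phc_minus_realtime_ns(device: str = "/dev/ptp0") -> int:
    """Return (PHC time - guest CLOCK_REALTIME) in nanoseconds.

    The device path is an assumption. The PHC read is bracketed by two
    CLOCK_REALTIME reads and compared against their midpoint to reduce
    the effect of the time spent in the call itself.
    """
    fd = os.open(device, os.O_RDONLY)
    try:
        clock_id = fd_to_clockid(fd)
        before = time.clock_gettime_ns(time.CLOCK_REALTIME)
        phc = time.clock_gettime_ns(clock_id)
        after = time.clock_gettime_ns(time.CLOCK_REALTIME)
    finally:
        os.close(fd)
    return phc - (before + after) // 2


# Example use inside a guest with the device present:
#   print(phc_minus_realtime_ns() / 1e6, "ms")
```

A single reading like this is still subject to scheduling noise, so it is better suited to spotting drift at the 0.1s scale than to verifying the 1ms goal; a time server such as chrony filters many samples instead.
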

sharnoff commented 8 months ago

status: blocked on review