seapath / ansible

This repo contains all the Ansible playbooks used to deploy or manage a cluster, as well as inventory examples.
https://lfenergy.org/projects/seapath/
Apache License 2.0

System clock does not synchronize with Timemaster #489

Closed: bchedotel closed this issue 5 months ago

bchedotel commented 6 months ago

Context

With the aim of setting up latency tests in the Seapath CI, I have the following setup:

[Figure: setup_issue_timemaster]

The publisher will send Sampled Values (SV) to a test VM on the hypervisor. The system clocks are synchronized over PTP using the tools provided by LinuxPTP (ptp4l and phc2sys), with timemaster chosen to enable NTP backup.

SVs are sent to the VM over PCI passthrough, and the VM's system clock is synchronized with the hypervisor's system clock through ptp_kvm.

The publisher configurations for ptp4l and phc2sys are as follows:

$ ptp4l -i ptp -f /etc/linuxptp/ptp4l.conf -q -m -l 7

where the configuration file is:

[Global]
slaveOnly       0
gmCapable       1
domainNumber             0

# Announce interval: 1s
logAnnounceInterval      0

# Sync interval: 1 s
logSyncInterval          0

# Pdelay interval: 1 s
logMinPdelayReqInterval  0

# Announce receipt time-out: 3 s (fixed)
announceReceiptTimeout   3

priority1                255
priority2                255
# Default clock class : any specialised clock will be better (ie a GPS Grand Master Clock)
network_transport        L2
delay_mechanism          E2E
clockClass 248
clockAccuracy 0xFE
offsetScaledLogVariance 0xFFFF

and:

$ phc2sys -s ptp -c CLOCK_REALTIME -O 0 -m -l 7
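For reference, the interval comments in the ptp4l configuration above follow from the fact that PTP encodes message intervals as base-2 logarithms of seconds, and that announceReceiptTimeout counts announce intervals rather than seconds. The short Python sketch below reproduces that arithmetic; the function names are hypothetical and only serve the illustration.

```python
def log_interval_to_seconds(log_value: int) -> float:
    """Return the interval in seconds for a log2-encoded PTP interval."""
    return 2.0 ** log_value


def announce_timeout_seconds(log_announce_interval: int, receipt_timeout: int) -> float:
    """Return the announce receipt time-out in seconds.

    announceReceiptTimeout is a number of announce intervals, so the
    time-out is that count multiplied by the announce interval.
    """
    return receipt_timeout * log_interval_to_seconds(log_announce_interval)


# Values taken from the configuration file quoted in this issue.
settings = {
    "logAnnounceInterval": 0,
    "logSyncInterval": 0,
    "logMinPdelayReqInterval": 0,
}

for name, value in settings.items():
    print(f"{name} {value} -> {log_interval_to_seconds(value)} s")

print(f"announce receipt time-out -> {announce_timeout_seconds(0, 3)} s")
```

With the values used here, every interval comes out at 1 s and the announce receipt time-out at 3 s, which matches the comments in the file.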

On the hypervisor side, we use timemaster, with the following configuration file:

[ptp_domain 0]
ntp_options poll 0
interfaces ptp
delay 1e-9

[timemaster]
ntp_program chronyd

[chrony.conf]
include /etc/chrony.conf
server 192.168.132.1 iburst maxsamples 10

[ntp.conf]
includefile /etc/ntp.conf

[ptp4l.conf]
slaveOnly             1

# IEC 61850-9-3 Profile
# (from : https://en.wikipedia.org/wiki/IEC/IEEE_61850-9-3)
network_transport     L2
delay_mechanism       E2E
domainNumber             0

# Announce interval: 1s
logAnnounceInterval      0

# Sync interval: 1 s
logSyncInterval          0

# Pdelay interval: 1 s
logMinPdelayReqInterval  0
operLogPdelayReqInterval 0

# Announce receipt time-out: 3 s (fixed)
announceReceiptTimeout   3

# Slave-only priority :
priority1                255
priority2                255
# Default clock class : any specialised clock will be better (ie a GPS Grand Master Clock)
clockClass               248

[chronyd]
path /usr/sbin/chronyd

[ntpd]
path /usr/sbin/ntpd
options -u ntp:ntp -g

[phc2sys]
path /usr/sbin/phc2sys
options -l 7

[ptp4l]
path /usr/sbin/ptp4l
options --step_threshold 0.00001 -l 7

Finally, thanks to ptp_kvm, which creates a PTP device in the VM, we can synchronize the VM's system clock with it:

$ phc2sys -s /dev/ptp1 -c CLOCK_REALTIME -O 0 -m -l 7

Problem

Using the command $ phc_ctl <phc_device> cmp to measure the offset between the system clock and the PHC, we observe good synchronization, within a few nanoseconds, for the publisher and VM system clocks, while the hypervisor clock is completely out of sync:

phc_ctl[25088.645]: offset from CLOCK_REALTIME is 32738958839ns

We can also verify this by taking a look at the phc2sys systemd logs:

phc2sys[29054]: [25161.763] [0:ptp] CLOCK_REALTIME phc offset 69740848222 s0 freq  +0 delay 2892
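The offsets in the two log lines above are reported in nanoseconds, which makes their magnitude hard to read at a glance. The following Python sketch extracts the offset from each line and converts it to seconds. The regular expressions are an assumption based only on the two lines quoted in this issue, not on a documented log format, and the helper name is hypothetical.

```python
import re
from typing import Optional

# Patterns inferred from the two log lines quoted in this issue (assumption).
PHC_CTL_RE = re.compile(r"offset from CLOCK_REALTIME is (-?\d+)ns")
PHC2SYS_RE = re.compile(r"phc offset\s+(-?\d+)")


def extract_offset_ns(line: str) -> Optional[int]:
    """Return the offset in nanoseconds found in a log line, or None."""
    for pattern in (PHC_CTL_RE, PHC2SYS_RE):
        match = pattern.search(line)
        if match:
            return int(match.group(1))
    return None


lines = [
    "phc_ctl[25088.645]: offset from CLOCK_REALTIME is 32738958839ns",
    "phc2sys[29054]: [25161.763] [0:ptp] CLOCK_REALTIME phc offset 69740848222 s0 freq  +0 delay 2892",
]

for line in lines:
    offset_ns = extract_offset_ns(line)
    if offset_ns is not None:
        # 1 s = 1_000_000_000 ns
        print(f"{offset_ns} ns = {offset_ns / 1_000_000_000:.3f} s")
```

Both offsets are in the range of tens of seconds, far from the nanosecond-level agreement seen on the publisher and the VM.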

Investigations

The problem seems to come from the phc2sys automatic mode imposed by timemaster and from the use of chrony. Indeed, if we bypass the problem by adding the following phc2sys options in the timemaster configuration file (the trailing double dash lets us drop the phc2sys options imposed by timemaster, which are appended after the options set in the configuration file):

options -s ptp -c CLOCK_REALTIME -O 0 --

and remove chrony, we get a synchronized system clock:

phc_ctl[27410.444]: offset from CLOCK_REALTIME is 38ns

Questions

If my reasoning is correct, why are chrony and phc2sys' automatic mode preventing timemaster from working properly, and how can this be fixed? If not, what kind of misconfiguration could I have made?

Thank you in advance for your attention to my issue.

ebail commented 6 months ago

@insatomcat could you confirm that you also have this issue on your side?

bchedotel commented 5 months ago

A solution to the problem has been found. First, the IP address provided to chrony did not work, so I changed it. Second, the problem seemed to be due to a UTC / TAI difference between the publisher's system clock and the hypervisor's, so I added an offset of 37 seconds on the publisher.
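As a small illustration of the UTC / TAI difference mentioned above: TAI has been ahead of UTC by 37 seconds since 1 January 2017, so a clock kept on one timescale and a clock kept on the other disagree by that constant amount. The Python sketch below shows the conversion on nanosecond timestamps. The constant and function names are hypothetical, and the sketch does not describe how the offset was actually applied on the publisher.

```python
NS_PER_S = 1_000_000_000

# TAI - UTC in seconds, valid since 2017-01-01.
# Assumption: no leap second has been introduced since then.
TAI_UTC_OFFSET_S = 37


def utc_to_tai_ns(utc_ns: int) -> int:
    """Convert a UTC timestamp in nanoseconds to the TAI timescale."""
    return utc_ns + TAI_UTC_OFFSET_S * NS_PER_S


def tai_to_utc_ns(tai_ns: int) -> int:
    """Convert a TAI timestamp in nanoseconds to the UTC timescale."""
    return tai_ns - TAI_UTC_OFFSET_S * NS_PER_S


# Two clocks reading the same instant, one on TAI and one on UTC,
# differ by exactly the TAI - UTC offset.
utc_reading_ns = 1_700_000_000 * NS_PER_S  # arbitrary example timestamp
tai_reading_ns = utc_to_tai_ns(utc_reading_ns)
print(f"difference: {(tai_reading_ns - utc_reading_ns) / NS_PER_S} s")
```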