scylladb / scylla-machine-image

fix(images):use default NTP configuration #441

Closed: yaronkaikov closed 1 year ago

yaronkaikov commented 1 year ago

Disable NTP configuration during image creation so that the images use the default configuration recommended by each cloud.

Closes: https://github.com/scylladb/scylladb/issues/13344

yaronkaikov commented 1 year ago

I think we should add --no-ntp-setup at the end of the run('/opt/scylladb/scripts/scylla_setup ...') line, not to sysconfig_opt, since we disable it on all clouds.

Yes, you are right. Fixed.
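
A minimal sketch of that change, assuming the image build script assembles the scylla_setup command line as a string before passing it to run(). The helper name build_setup_command and the placeholder option --example-cloud-opt are hypothetical; only the script path and the --no-ntp-setup flag come from this thread.

```python
import shlex

SCYLLA_SETUP = '/opt/scylladb/scripts/scylla_setup'


def build_setup_command(sysconfig_opt: str = '') -> str:
    """Return the scylla_setup command line (hypothetical helper).

    sysconfig_opt carries the per-cloud options. --no-ntp-setup is appended
    unconditionally, so it applies on every cloud rather than on one.
    """
    args = [SCYLLA_SETUP, *shlex.split(sysconfig_opt), '--no-ntp-setup']
    return shlex.join(args)


# '--example-cloud-opt' is a placeholder, not a real scylla_setup option.
print(build_setup_command('--example-cloud-opt'))
```

Keeping the flag out of sysconfig_opt means a cloud-specific branch cannot accidentally drop it.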

yaronkaikov commented 1 year ago

Verified with https://jenkins.scylladb.com/job/scylla-master/job/releng-testing/job/next-machine-image/78/

Azure image configuration:

azureuser@yaron-test2:~$ chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
#* PHC0                          0   3   377    10  +8087ns[  +16us] +/- 1526ns

GCP image configuration:

yaronkaikov@instance-1:~$ chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^* metadata.google.internal      2   6    37    47    -13us[-5849us] +/-  318us

AWS image configuration:

scyllaadm@ip-10-99-17-28:~$ chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^* 169.254.169.123               3   4    37    10   -154ns[-1612ns] +/-  484us

syuu1228 commented 1 year ago

I thought that the AMI base image (Ubuntu 22.04 minimal) did not have the new NTP setting (169.254.169.123), but I was wrong. /etc/chrony/conf.d/00-cpc.conf is the file that overrides the default configuration and changes the server address to 169.254.169.123. I could verify that chrony is referencing 169.254.169.123 (tested on the Ubuntu 22.04 minimal AMI):

$ chronyc tracking
Reference ID    : A9FEA97B (169.254.169.123)
Stratum         : 4
Ref time (UTC)  : Thu Apr 13 14:21:31 2023
System time     : 0.000001611 seconds fast of NTP time
Last offset     : +0.000002690 seconds
RMS offset      : 0.000010544 seconds
Frequency       : 35.703 ppm fast
Residual freq   : +0.003 ppm
Skew            : 0.129 ppm
Root delay      : 0.000513758 seconds
Root dispersion : 0.000270585 seconds
Update interval : 16.2 seconds
Leap status     : Normal

So I think we actually don't need a patch to change the NTP server address. But passing --no-ntp-setup to scylla_setup is still correct, since the base image already has optimal NTP settings and we don't need to change them.
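
As a quick cross-check of the tracking output above: for an IPv4 source, the Reference ID shown by chronyc is the address written as eight hex digits. The sketch below decodes it. The function name refid_to_ipv4 is made up for illustration, and the decoding only makes sense for IPv4 sources (reference clocks and IPv6 sources use a different kind of ID).

```python
import ipaddress


def refid_to_ipv4(refid_hex: str) -> str:
    """Decode an 8-hex-digit chrony Reference ID into dotted-quad form."""
    return str(ipaddress.IPv4Address(int(refid_hex, 16)))


# A9 FE A9 7B -> 169 254 169 123
print(refid_to_ipv4('A9FEA97B'))  # 169.254.169.123
```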

yaronkaikov commented 1 year ago

@syuu1228 by passing only --no-ntp-setup I get the following output:

scyllaadm@ip-10-99-17-184:~$ chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^* 169.254.169.123               3   4    37    14    +14us[ -219us] +/-  619us
^- prod-ntp-5.ntp4.ps5.cano>     2   6    17    30   +543us[ +287us] +/-   41ms
^- prod-ntp-3.ntp1.ps5.cano>     2   6    17    30   +558us[ +324us] +/-   41ms
^- prod-ntp-4.ntp4.ps5.cano>     2   6    17    31   +287us[  +31us] +/-   37ms
^- pugot.canonical.com           2   6    17    31   +712us[ +456us] +/-   73ms
^- c-24-4-159-115.hsd1.ca.c>     1   6    17    31  +7198us[+6942us] +/-   49ms
^- 208.67.72.50                  3   6    17    31  +4113us[+3858us] +/-  109ms
^- c-73-61-36-59.hsd1.nh.co>     3   6    17    41  +1991us[+1506us] +/-   37ms
^- time.richiemcintosh.com       2   6    17    40  +1542us[+1057us] +/-   50ms

So yes, the default is good, but we have a lot of other sources that we don't need. (https://jenkins.scylladb.com/job/scylla-master/job/releng-testing/job/next-machine-image/81/)
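
To make that kind of check repeatable, here is a rough sketch that parses `chronyc sources` text and separates the selected source from the rest. The Source type and parse_chronyc_sources are illustrative names, not part of any Scylla script, and the parser assumes the default column layout shown in this thread.

```python
from typing import List, NamedTuple


class Source(NamedTuple):
    mode: str     # '^' server, '=' peer, '#' local reference clock
    state: str    # '*' currently selected, '-' not combined, etc.
    name: str
    stratum: int


def parse_chronyc_sources(output: str) -> List[Source]:
    """Extract source rows, skipping the header, separator and prompt lines."""
    sources = []
    for line in output.splitlines():
        fields = line.split()
        # A source row starts with a two-character mode/state column.
        if len(fields) < 3 or len(fields[0]) != 2 or fields[0][0] not in '^=#':
            continue
        sources.append(Source(fields[0][0], fields[0][1], fields[1], int(fields[2])))
    return sources


SAMPLE = """\
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 169.254.169.123               3   4    37    14    +14us[ -219us] +/-  619us
^- pugot.canonical.com           2   6    17    31   +712us[ +456us] +/-   73ms
^- 208.67.72.50                  3   6    17    31  +4113us[+3858us] +/-  109ms
"""

parsed = parse_chronyc_sources(SAMPLE)
selected = [s.name for s in parsed if s.state == '*']
others = [s.name for s in parsed if s.state != '*']
print(selected, others)
```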

syuu1228 commented 1 year ago

Okay, but the AWS time sync guide doesn't say to remove the existing pool entries; it only says to add server 169.254.169.123. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html

Also Red Hat document says:

It is NOT recommended to use only two NTP servers. If more than one NTP server is required, four NTP servers is the recommended minimum. Four servers protect against one incorrect timesource, or "falseticker". https://access.redhat.com/solutions/58025

So a single-server configuration is probably not good.

However, I found that the AWS document also says they have their own NTP pool, which can be used by adding the following entry: pool time.aws.com iburst

So if we don't want to use the default public NTP pools, we can probably switch to 169.254.169.123 and time.aws.com. I commented out all pool entries in chrony.conf and added pool time.aws.com iburst, and it works like this:

$ chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^* 169.254.169.123               3   4   177     2  -5090ns[  -12us] +/-  525us
^- ec2-34-201-171-241.compu>     4   6    17    50    +22us[  +26us] +/-  884us
^- ec2-18-212-60-76.compute>     4   6    17    51    -24us[  -16us] +/-  641us
^- ec2-34-229-185-123.compu>     4   6    17    51    +11us[  +19us] +/-  870us
^- ec2-3-85-98-105.compute->     4   6    17    50  -7617ns[-3629ns] +/-  614us

Modifying chrony.conf can be done with something like this:

import re

# Comment out every "pool ..." line in the stock chrony.conf.
with open('/etc/chrony/chrony.conf') as f:
    chrony_conf = f.read()

chrony_conf = re.sub(r'^(pool .*$)', '# \\1', chrony_conf, flags=re.MULTILINE)
with open('/etc/chrony/chrony.conf', 'w') as f:
    f.write(chrony_conf)

# Add the AWS NTP pool through a sources.d drop-in file.
with open('/etc/chrony/sources.d/ntp-pool.sources', 'w') as f:
    f.write('pool time.aws.com iburst\n')
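
To see what the substitution does without touching /etc, the same regex can be run on an in-memory string. This is only a sketch: comment_out_pools is a name chosen here, and the sample text is an illustrative stand-in, not the actual contents of the image's chrony.conf.

```python
import re


def comment_out_pools(conf_text: str) -> str:
    """Prefix every line that starts with 'pool ' with '# '."""
    return re.sub(r'^(pool .*$)', r'# \1', conf_text, flags=re.MULTILINE)


# Illustrative sample only; the real file on the image may differ.
sample = (
    "confdir /etc/chrony/conf.d\n"
    "pool ntp.ubuntu.com        iburst maxsources 4\n"
    "pool 0.ubuntu.pool.ntp.org iburst maxsources 1\n"
    "sourcedir /etc/chrony/sources.d\n"
)

result = comment_out_pools(sample)
print(result)
```

Because an already commented line starts with '#' rather than 'pool', running the substitution a second time leaves the text unchanged.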

yaronkaikov commented 1 year ago

@syuu1228 Verified after the latest changes: https://jenkins.scylladb.com/job/scylla-master/job/releng-testing/job/next-machine-image/93/

scyllaadm@ip-10-99-17-8:~$ chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^* 169.254.169.123               3   4   377    13    +18us[  +21us] +/-  543us
^- ec2-54-234-209-141.compu>     4   6    77    12    +26us[  +26us] +/-  743us
^- ec2-34-229-185-123.compu>     4   6    77    13    +65us[  +68us] +/- 1065us
^- ec2-34-201-171-241.compu>     4   6    77    13    +26us[  +26us] +/- 1051us
^- ec2-3-85-98-105.compute->     4   6    77    12    +10us[  +10us] +/-  672us
scyllaadm@ip-10-99-17-8:~$ cat /etc/chrony/
chrony.conf  chrony.keys  conf.d/      sources.d/   
scyllaadm@ip-10-99-17-8:~$ cat /etc/chrony/sources.d/
README            ntp-pool.sources  
scyllaadm@ip-10-99-17-8:~$ cat /etc/chrony/sources.d/ntp-pool.sources 
pool time.aws.com iburst

yaronkaikov commented 1 year ago

rebased