openshift / machine-config-operator

Add `timeservers` section to MachineConfig #629

Closed · cgwalters closed this 3 years ago

cgwalters commented 5 years ago

MachineConfigs are low level and force host reboots. Today we have "internal CRDs" like the container/kubelet configs that "render down" to MCs, but there's no reason changing the kubelet config should force a reboot.

Related to this, I think we should go down the path of adding some "high level" options for the operating system to MachineConfig. A bit like the kernel args proposal except let's add e.g.

diff --git a/pkg/apis/machineconfiguration.openshift.io/v1/types.go b/pkg/apis/machineconfiguration.openshift.io/v1/types.go
index 33cca60..236d409 100644
--- a/pkg/apis/machineconfiguration.openshift.io/v1/types.go
+++ b/pkg/apis/machineconfiguration.openshift.io/v1/types.go
@@ -229,6 +229,8 @@ type MachineConfigSpec struct {
    OSImageURL string `json:"osImageURL"`
    // Config is a Ignition Config object.
    Config ignv2_2types.Config `json:"config"`
+
+   NTPServers []string `json:"ntpServers"`
 }

 // +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

Or it could be a new MachineTimeServers CRD though that feels a bit like overkill unless we really need to expose a lot of options.

If we had this it'd be much easier for the MCD to say "oh it's just the timeservers changing, let's not reboot the host".
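For illustration, a MachineConfig using such a field might look like the sketch below; the ntpServers field is only proposed here (nothing like it exists in the API today), and the name, role, and server hostnames are placeholders.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 50-worker-timeservers        # placeholder name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  # Hypothetical field from the diff above; the MCD could notice that only
  # this field changed and skip the reboot.
  ntpServers:
    - ntp1.example.com
    - ntp2.example.com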

runcom commented 5 years ago

Or it could be a new MachineTimeServers CRD though that feels a bit like overkill unless we really need to expose a lot of options.

I would go with CRDs, as the MachineConfigSpec could otherwise keep growing with kargs, this, and maybe more. It also makes more sense from a UI point of view to have new types, so that `oc get` can be helpful as well (might be just me; not a big deal either way for me).

cgwalters commented 5 years ago

(Of course, though, for time... maybe we should only use chrony as a bootstrap and have workers get time from the control plane or something.)

JAORMX commented 5 years ago

It would be very beneficial for us to be able to configure chronyd and rely on it setting up the correct time for the host, given that it would help us meet some compliance controls.

e.g. For FedRAMP, we need to comply with AU-8, which specifies the following:

The information system: (a) Compares the internal information system clocks [Assignment: organization-defined frequency] with [Assignment: organization-defined authoritative time source]; and (b) Synchronizes the internal system clocks to the authoritative time source when the time difference is greater than [Assignment: organization-defined time period].

Supplemental Guidance: This control enhancement provides uniformity of time stamps for information systems with multiple system clocks and systems connected over a network.

AU-8 (1): [http://tf.nist.gov/tf-cgi/servers.cgi] [At least hourly]

AU-8 (1) Requirement: The service provider selects primary and secondary time servers used by the NIST Internet time service. The secondary server is selected from a different geographic region than the primary server.

AU-8 (1) Requirement: The service provider synchronizes the system clocks of network computers that run operating systems other than Windows to the Windows Server Domain Controller emulator or to the same time source for that server.

AU-8 (1) Guidance: Synchronization of system clocks improves the accuracy of log analysis.

Providing deployers with the ability to configure chronyd with any time source (folks will use one of the ones specified by the requirement) will help them fulfill this requirement.

JAORMX commented 5 years ago

I'm leaning a bit more towards the CRD for the timeservers configuration. This way, more options could be added as needed, and it would also be cleanly separated from the rest of the configuration.

cgwalters commented 5 years ago

Providing deployers with the ability to configure chronyd with any time source

To be clear, I think we have this capability today; this issue is about streamlining it.

However - while we clearly need chronyd as it is today before the cluster comes up (time needs to be right to speak TLS), I do wonder whether we should switch to having nodes sync time from a cluster service instead post-join. And maybe the MCD helps ensure that transition?

There's also https://roughtime.googlesource.com/roughtime, which I wonder whether anyone in OS land is looking at - that seems mostly like what we need pre-cluster-join.

JAORMX commented 5 years ago

What would be the advantage of using a cluster service?

cgwalters commented 5 years ago

What would be the advantage of using a cluster service?

* Don't need to have every node sending traffic out somewhere external (default: the Internet).
* Greater observability; e.g. could look at metrics in Prometheus for the in-cluster server
* More centralized; if you want to change the cluster timeserver's sources it only needs to happen in one place rather than changing every node

JAORMX commented 5 years ago

What would be the advantage of using a cluster service?

* Don't need to have every node sending traffic out somewhere external (default: the Internet).

Organizations usually have their own NTP server(s), for cases where they don't want traffic towards the outside. Seems to me that using plain chronyd and pointing it towards the internal NTP server would achieve the same result.

* Greater observability; e.g. could look at metrics in Prometheus for the in-cluster server

This would certainly be a plus.

* More centralized; if you want to change the cluster timeserver's sources it only needs to happen in one place rather than changing every node

This would be achieved as well by creating a CRD and having a controller manage it. I was thinking of something similar to the ContainerRuntimeConfig, where a controller applies the timeserver configuration - e.g. a "MachineTimeServersConfig".

Initially we would only allow the configuration of different timeservers, but as use-cases arise we could start adding more options there.

From a compliance point of view, all we need is to allow deployers to configure specific NTP servers and to make sure that time is synced at least every hour.
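As a rough illustration of that idea (this CRD was never implemented; the kind, fields, and selector below are hypothetical, loosely modeled on ContainerRuntimeConfig), such an object might look like:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineTimeServersConfig            # hypothetical kind, not part of the MCO API
metadata:
  name: worker-timeservers
spec:
  machineConfigPoolSelector:              # assumed to target a pool, like ContainerRuntimeConfig does
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  timeServers:                            # hypothetical field: sources the controller would render into chrony.conf
    - ntp1.example.com
    - ntp2.example.com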

JAORMX commented 5 years ago

So, I had a conversation with @runcom , and we agreed that for now, using plain MachineConfig was the way forward to fulfil this requirement. Similar to what's documented here: https://github.com/openshift/machine-config-operator#applying-configuration-changes-to-the-cluster

So, as a path forward, I will try that out and post the results here.

This triggers a reboot every time we want to change the timeservers... which is not ideal. But it's not too problematic either, since changing timeservers is not an operation we do often.
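For reference, a minimal sketch of that kind of MachineConfig, following the pattern in the README linked above (the pool, name, and server are placeholders; the chrony.conf content is supplied base64-encoded via an Ignition data URL):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-worker-chrony
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
        - path: /etc/chrony.conf
          filesystem: root
          mode: 420                       # 0644
          contents:
            # base64 of "server ntp1.example.com iburst"; a complete chrony.conf
            # would also carry driftfile, makestep, logdir, etc.
            source: data:text/plain;charset=utf-8;base64,c2VydmVyIG50cDEuZXhhbXBsZS5jb20gaWJ1cnN0Cg==

Applying it rolls out through the MCD and, as noted above, reboots every node in the pool.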

JAORMX commented 5 years ago

It worked; this could be closed, as we have a workaround.

cgwalters commented 5 years ago

One thing related to this: https://bugzilla.redhat.com/show_bug.cgi?id=1723955

We basically need to ensure time is synced before the kubelet starts, otherwise the CSRs it creates may carry bad timestamps and be rejected.

We also want time to be synced reliably in the cluster on day 2 and after.

cgwalters commented 5 years ago

There's a more radical approach here, which would be for the masters to run NTP servers (and also ensure their own clocks are in sync via consensus + polling an external clock source).

JAORMX commented 5 years ago

@cgwalters depends on how far you wanna go. To fulfill our requirement, all we needed was to be able to configure chronyd to use at least two trusted time sources. Given the case of TLS certificates being issued... we would probably need a way to configure this from the moment the cluster is created.

cgwalters commented 5 years ago

Given the case of TLS certificates being issued... we would probably need a way to configure this from the moment the cluster is created.

That's what Ignition lets us do, configure the OS before anything really starts. We have total control and power here in the MCO.

cgwalters commented 5 years ago

There's also https://blog.cloudflare.com/roughtime/ on this topic.

mrguitar commented 4 years ago

The fewer services like this that run on the masters the better from a security/management perspective. I think that @JAORMX is right that most environments will want to leverage existing NTP servers. I think setting it in a global MC or CRD would be a good experience.

itamarh commented 4 years ago

I like the approach of a time service at the cluster level. It would still use the external one the enterprise has, of course, but it means that first and foremost the cluster is in sync across the different nodes. We have faced IPI deployments where NTP was blocked by firewall rules, which led to clock discrepancies across nodes and bad things happening.

JAORMX commented 4 years ago

@itamarh was this a case where NTP was blocked entirely?

Normally external NTP servers aren't reachable in such deployments, but an internal one is usually provided and accessible. Perhaps a first low-cost solution would be to make an NTP server configurable from the installer? That way deployers could specify their NTP servers and the deployment would work as intended from the beginning.

I'm not opposed to having a clustered service though. I can see the benefits of such a thing, especially if we take security enhancements into consideration as @cgwalters suggested. All I'm saying is that we could solve this at a lower development cost by just providing a configuration option at install time (from ignition or install-config, perhaps?).

cgwalters commented 4 years ago

One tricky thing with this is today we don't have an ergonomic way to configure the bootstrap machine, and that could be required in some cases.

I think we may want a variant of https://github.com/openshift/installer/blob/master/docs/user/customization.md#install-time-customization-for-machine-configuration that also includes the bootstrap machine or so.

cgwalters commented 4 years ago

The only reason this isn't a serious blocker for e.g. AWS/GCP/etc. clusters that aren't connected to the internet is, I think, that those platforms provide paravirtualized clocks.

itamarh commented 4 years ago

@itamarh was this a case where NTP was blocked entirely?

Not entirely; just the worldwide ones, with the expectation that an internal one would be used. But this is not a question an IPI installer should have to ask, and users shouldn't find out only after installation, when things go bad, that it was required. An NTP service would solve this by always providing a time source for the cluster, and it could alert on problems reaching the default NTP source or be easily reconfigured to another one.

williamcaban commented 4 years ago

This option is needed for bare-metal deployments, and it should also be a configuration parameter during install. We have seen that NTP sync matters for bare-metal deployments in two scenarios. During installation, when servers do not have the same time, a node will fail to join the cluster because it won't recognize the certificates as valid. The other situation we are experiencing, in bare-metal or mixed VM and bare-metal deployments, is etcd getting into a degraded state or completely out of sync due to the lack of time sync. Currently, to work around this we have to use MachineConfigs to edit /etc/chrony.conf, but we need a more consistent and configurable way to pass the organization's NTP servers. This should also be a setting per MCP, so that different MCPs at different physical locations can point to their local NTP servers.

mrguitar commented 4 years ago

+1 on bare metal and MCP granularity.

I assume chrony will correctly use the NTP server that's defined via DHCP. Googling this gives mixed results, but I believe that should work. If so, we have an easy path for environments w/ a proper DHCP config. This of course doesn't solve the enterprise use case where everything is static. That's where this is painful.

rphillips commented 4 years ago

@cgwalters From the initial post: some kubelet options do require a reboot. FeatureGates, CPUManager policies (which we support), and TopologyManager policies (which we are working to get to beta and which can be enabled within OpenShift) all document that the node should reboot when they are configured or when their configuration changes.

The way the clock is updated needs to be addressed as well. By default chronyd slews the clock - slowing it down or speeding it up - and it can take hours to get a clock back in sync. Chrony's FAQ describes a setting (the makestep directive) that lets it step the clock instantaneously instead, which is preferable in OpenShift.

https://chrony.tuxfamily.org/faq.html#_is_code_chronyd_code_allowed_to_step_the_system_clock
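For context, the directive that FAQ entry covers is chrony's makestep; a hedged chrony.conf fragment (thresholds chosen purely for illustration):

# Step the clock when the offset is larger than 1 second, but only
# during the first 3 clock updates after chronyd starts.
makestep 1.0 3

# A negative limit, e.g. "makestep 1.0 -1", removes the update-count
# restriction and allows stepping at any time.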

cgwalters commented 4 years ago

Copying over a BZ comment

Related to having things apply to the bootstrap node, one thing that would probably help here is to have the MCO support an "every machine" selector for MachineConfig, then we could define that as applying to the bootstrap as well. That'd be a bit of a generalization from the multiple roles RFE.

We're effectively doing this manually with the fips install-config flag, for example.

cgwalters commented 4 years ago

xref https://github.com/coreos/fedora-coreos-config/pull/393

I think we should strongly encourage IaaS providers to do two basic things for us:

It looks like AWS/Azure at least have the former; we just weren't using them. We may still need something like this for bare-metal scenarios, but eh, we've also documented how to configure chrony directly, which seems fine.

openshift-bot commented 3 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 3 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented 3 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 3 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/machine-config-operator/issues/629#issuecomment-742955248):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.