openshift / hypershift

Hyperscale OpenShift - clusters with hosted control planes
https://hypershift-docs.netlify.app
Apache License 2.0

Strategy for providing consumer specific machine configuration in hypershift #1510

Closed relyt0925 closed 1 year ago

relyt0925 commented 2 years ago

Users (and some Red Hat Operators) need to apply machine configuration (kernel parameters, files, systemd units, etc.) across a pool of machines in order to function appropriately. One tangible use case: telcos are interested in using Red Hat ecosystem components like the Performance Addon Operator to configure huge pages and node settings for low-latency applications. More details are provided on this page: https://docs.openshift.com/container-platform/4.10/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.html#cnf-understanding-low-latency_cnf-master

In multiple conversations with customers consuming the hypershift offering, the ability to use machine configs to customize nodes has come up as a need. Confusion arises when they look at Red Hat documentation and see references to MachineConfigs: https://docs.openshift.com/container-platform/4.7/post_installation_configuration/machine-configuration-tasks.html

However, MachineConfigs are not supported in Hypershift. I do see that there are separate documentation suites for the various Red Hat offerings, such as Azure Red Hat OpenShift, whose support policy (https://docs.microsoft.com/en-us/azure/openshift/support-policies-v4) states that machine config customizations cannot be done: "Don't override any of the cluster's MachineConfig objects (for example, the kubelet configuration) in any way."

Hypershift will ultimately have its own section of documentation to help clear up some of this confusion.

The main need is to provide a strategy for these use cases. Currently, customers run daemonsets that roll the necessary configuration payloads across the set of nodes they are interested in applying them to. This does, however, create the need to manage both the daemonset and the configuration (ConfigMaps) that the daemonset consumes and then lays down on the node. Things get more difficult when kernel parameters that require rebooting the node also need to be laid down.
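A minimal sketch of the daemonset approach described above, to make the moving parts concrete. It builds a DaemonSet manifest that mounts a ConfigMap and copies its files onto each selected node through a hostPath mount. The image name, the target directory `/etc/custom-config`, the node label, and all object names are hypothetical placeholders and are not taken from any customer setup; the sketch does not handle the reboot case for kernel parameters.

```python
import json


def build_node_config_daemonset(name, namespace, configmap_name, node_selector):
    """Return a DaemonSet manifest (as a dict) that copies the files of a
    ConfigMap onto every node matching node_selector."""
    # Copy the ConfigMap files to the host, then stay alive so the pod is not
    # restarted in a loop. `cp -L` dereferences the symlinks that ConfigMap
    # volumes use. /etc/custom-config is a hypothetical target directory.
    copy_command = (
        "mkdir -p /host/etc/custom-config && "
        "cp -L /payload/* /host/etc/custom-config/ && "
        "sleep infinity"
    )
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Restrict the rollout to the pool of nodes of interest.
                    "nodeSelector": node_selector,
                    "containers": [
                        {
                            "name": "apply-config",
                            # Hypothetical image; any image with a shell and cp works.
                            "image": "registry.example.com/node-config-applier:latest",
                            "command": ["/bin/sh", "-c", copy_command],
                            # Writing to the host filesystem needs elevated rights.
                            "securityContext": {"privileged": True},
                            "volumeMounts": [
                                {"name": "payload", "mountPath": "/payload", "readOnly": True},
                                {"name": "host-etc", "mountPath": "/host/etc"},
                            ],
                        }
                    ],
                    "volumes": [
                        # The configuration payload the daemonset consumes.
                        {"name": "payload", "configMap": {"name": configmap_name}},
                        # The node directory the payload is laid down into.
                        {"name": "host-etc", "hostPath": {"path": "/etc"}},
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = build_node_config_daemonset(
        name="node-config",
        namespace="node-config",
        configmap_name="node-config-payload",
        node_selector={"example.com/low-latency": "true"},  # hypothetical label
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be applied with `oc apply -f -`. It also shows the maintenance burden mentioned above: both the DaemonSet and the ConfigMap it reads have to be managed by the customer, and nothing here reboots a node after a kernel parameter change.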

Opening this issue to track what the solution will ultimately be for allowing users to provide machine customizations into the environment (and the associated implementation). At that point, we will point customers to that solution instead of the daemonset approach currently in use.

relyt0925 commented 2 years ago

Some of the higher-level use cases customers are interested in:

- CPU pinning with the CPU manager
- NUMA-aware scheduling with the memory manager/topology manager

openshift-bot commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 1 year ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented 1 year ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci[bot] commented 1 year ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/hypershift/issues/1510#issuecomment-1340546485):

>Rotten issues close after 30d of inactivity.
>
>Reopen the issue by commenting `/reopen`.
>Mark the issue as fresh by commenting `/remove-lifecycle rotten`.
>Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

bhpratt commented 1 year ago

/reopen

openshift-ci[bot] commented 1 year ago

@bhpratt: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/openshift/hypershift/issues/1510#issuecomment-1440256604):

>/reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

hasueki commented 1 year ago

/reopen

openshift-ci[bot] commented 1 year ago

@hasueki: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/openshift/hypershift/issues/1510#issuecomment-1476257188):

>/reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.