openshift / hypershift

Hyperscale OpenShift - clusters with hosted control planes
https://hypershift-docs.netlify.app
Apache License 2.0

Disable MachineConfig payload in token secret for non-inplace upgrade NodePools #1843

Closed hasueki closed 1 year ago

hasueki commented 2 years ago

As a result of performance testing at large scale, we noticed that a 4.11 HyperShift deployment running 375 HostedClusters with 12 NodePools each causes mass OOM kills of the kube-apiserver (KAS) on the management cluster.

In 4.11, an additional field, payload, was added to the ignition-server token Secret to support inplace upgrades (https://github.com/openshift/hypershift/pull/1290). This payload contains the entire MachineConfig content, which is very large (434229 bytes). This causes a scale regression for providers running HyperShift at large scale.

We should update the token Secret reconciliation logic to omit this Secret value for non-inplace (replace) upgrade NodePools.
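A minimal sketch of what the proposed change could look like. This is not the actual HyperShift reconciler: the `UpgradeType` type, its constants, the `buildTokenSecretData` helper, and the `token` key are local stand-ins invented for illustration. Only the `payload` key name comes from the description above.

```go
package main

import "fmt"

// UpgradeType is a local stand-in for a NodePool's upgrade strategy
// (hypothetical; not imported from the HyperShift API package).
type UpgradeType string

const (
	UpgradeTypeReplace UpgradeType = "Replace"
	UpgradeTypeInPlace UpgradeType = "InPlace"
)

// buildTokenSecretData is a hypothetical helper that assembles the data map
// of a token Secret. The large payload is added only for in-place upgrades,
// so replace-upgrade NodePools get a small Secret.
func buildTokenSecretData(upgradeType UpgradeType, token, payload []byte) map[string][]byte {
	data := map[string][]byte{
		"token": token, // assumed key name, for illustration only
	}
	if upgradeType == UpgradeTypeInPlace {
		data["payload"] = payload
	}
	return data
}

func main() {
	payload := make([]byte, 434229) // size quoted in the issue
	for _, t := range []UpgradeType{UpgradeTypeReplace, UpgradeTypeInPlace} {
		data := buildTokenSecretData(t, []byte("example-token"), payload)
		_, hasPayload := data["payload"]
		fmt.Printf("%s: payload included = %t\n", t, hasPayload)
	}
}
```

A real fix would presumably also need to drop the payload from token Secrets that already exist for replace-upgrade NodePools; that part is not covered by this sketch.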

hasueki commented 2 years ago

/assign @isco-rodriguez