As a result of performance testing at large scale, we noticed that a 4.11 HyperShift management cluster running 375 HostedClusters with 12 NodePools each suffers mass OOM kills of the management cluster KAS.
In 4.11, an additional Secret field, payload, was introduced in the ignition-server token Secret to support in-place upgrades (https://github.com/openshift/hypershift/pull/1290). This payload contains the entire MachineConfig content, which is very large (434229 bytes). This causes a scale regression for providers running HyperShift at large scale.
We should update the token Secret reconciliation logic to omit this payload field for non-inplace (replace) upgrade NodePools.
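A minimal sketch of the proposed behavior follows. It is not the actual HyperShift reconciler: `UpgradeType`, its constants, `tokenKey`, and `tokenSecretData` are local stand-ins invented for illustration, and only the `payload` field name comes from this report.

```go
package main

import "fmt"

// UpgradeType is a local stand-in for the NodePool upgrade strategy.
type UpgradeType string

const (
	UpgradeTypeReplace UpgradeType = "Replace"
	UpgradeTypeInPlace UpgradeType = "InPlace"
)

// payloadKey is the Secret field named in this report.
const payloadKey = "payload"

// tokenKey is a hypothetical key used only to give the sketch a second field.
const tokenKey = "token"

// tokenSecretData builds the data map for a token Secret. The large
// MachineConfig payload is included only for in-place upgrade NodePools.
func tokenSecretData(upgradeType UpgradeType, token string, machineConfig []byte) map[string][]byte {
	data := map[string][]byte{tokenKey: []byte(token)}
	if upgradeType == UpgradeTypeInPlace {
		data[payloadKey] = machineConfig
	}
	return data
}

func main() {
	config := make([]byte, 434229) // placeholder of the size cited in this report
	for _, ut := range []UpgradeType{UpgradeTypeReplace, UpgradeTypeInPlace} {
		data := tokenSecretData(ut, "example-token", config)
		_, hasPayload := data[payloadKey]
		fmt.Printf("%s: payload included=%t\n", ut, hasPayload)
	}
}
```

Gating on the upgrade type keeps in-place upgrades working as they do today, while Replace NodePools no longer carry the large field.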