nerc-project / operations

Issues related to the operation of the NERC OpenShift environment
Update cluster operators to require manual updates #148

Closed dystewart closed 1 year ago

dystewart commented 1 year ago

We want to make sure updates happen when people are available to watch them

@computate, @aabaris, and I ran into a LokiStack pod that had suddenly broken. The failure consumed all of the memory on one of the control plane nodes, which had various negative effects on infra cluster performance.

We traced the issue back to one of the associated operators, which had upgraded automatically in the background. We should set the installPlanApproval field to Manual on our operator Subscriptions to prevent something like this from happening in the future.
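As a rough illustration of the change proposed above, here is a minimal Python sketch that builds the merge patch for a Subscription and prints an `oc patch` command that would apply it. The operator name and namespace in the usage example are hypothetical placeholders, not values taken from our cluster, and the sketch only prints the command rather than running it.

```python
import json
import shlex


def manual_approval_patch():
    """Merge patch that switches an OLM Subscription to manual approval."""
    return {"spec": {"installPlanApproval": "Manual"}}


def oc_patch_command(name, namespace):
    """Build (but do not run) an `oc patch` command for one Subscription.

    The fully qualified resource name avoids clashes with other CRDs
    that are also called "subscription".
    """
    patch = json.dumps(manual_approval_patch(), separators=(",", ":"))
    args = [
        "oc", "patch", "subscription.operators.coreos.com", name,
        "-n", namespace,
        "--type", "merge",
        "-p", patch,
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Hypothetical operator name and namespace, for illustration only.
    print(oc_patch_command("example-operator", "example-namespace"))
```

If the Subscriptions are managed through manifests in a Git repository, the equivalent change would be setting `spec.installPlanApproval: Manual` in each Subscription manifest instead of patching the live objects.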

To update operators manually, we can follow this procedure: https://docs.openshift.com/container-platform/4.10/operators/admin/olm-upgrading-operators.html#olm-approving-pending-upgrade_olm-upgrading-operators

dystewart commented 1 year ago

Also note that setting installPlanApproval to Manual means the first installation of an operator will require manual approval through OLM as well, in much the same way as we approve the updates in the link above.
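To make the approval step more concrete, here is a small Python sketch that picks out InstallPlans still waiting for approval from a parsed list, for example the JSON produced by `oc get installplan -A -o json`. It assumes the usual InstallPlan fields (`spec.approval`, `spec.approved`, `spec.clusterServiceVersionNames`); the sample data is made up for illustration.

```python
import json

# Merge patch that approves a pending InstallPlan.
APPROVE_PATCH = json.dumps({"spec": {"approved": True}})


def pending_install_plans(install_plan_list):
    """Return (namespace, name, CSV names) for each unapproved manual InstallPlan."""
    pending = []
    for item in install_plan_list.get("items", []):
        spec = item.get("spec", {})
        if spec.get("approval") == "Manual" and not spec.get("approved", False):
            meta = item.get("metadata", {})
            pending.append(
                (
                    meta.get("namespace"),
                    meta.get("name"),
                    spec.get("clusterServiceVersionNames", []),
                )
            )
    return pending


if __name__ == "__main__":
    # Hypothetical sample data shaped like `oc get installplan -A -o json`.
    sample = {
        "items": [
            {
                "metadata": {"namespace": "example-namespace", "name": "install-aaaaa"},
                "spec": {
                    "approval": "Manual",
                    "approved": False,
                    "clusterServiceVersionNames": ["example-operator.v1.2.3"],
                },
            },
            {
                "metadata": {"namespace": "example-namespace", "name": "install-bbbbb"},
                "spec": {"approval": "Manual", "approved": True},
            },
        ]
    }
    for namespace, name, csvs in pending_install_plans(sample):
        print(f"{namespace}/{name} is waiting for approval: {csvs}")
        print(f"  patch to apply: {APPROVE_PATCH}")
```

The same check covers both cases mentioned here, since a first-time install and an upgrade each show up as an InstallPlan that stays unapproved until someone approves it.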

dystewart commented 1 year ago

It would also make sense to perform these upgrades during scheduled maintenance windows.