openshift-kni / lifecycle-agent

Local agent for orchestration of SNO Image Based Upgrade
Apache License 2.0
6 stars 28 forks source link

CNF-12403: Introduce Subscription health check #520

Closed donpenney closed 3 months ago

donpenney commented 4 months ago

Background / Context

LCA runs system health checks during SeedGen and IBU to ensure that the cluster is healthy and stable before proceeding.

Issue / Requirement / Reason for change

An unhealthy subscription means there may not be a corresponding CSV. The existing health checks validate the CSV status, but if the CSV doesn't exist, that health check passes. As a result, the user may generate a seed image, for example, without realizing that the desired operators have not actually been installed.

Solution / Feature Overview

Introducing a Subscription health check gives added protection by ensuring all subscriptions are in a healthy state.

Implementation Details

This update adds a Subscription check to the set of system health checks. The Subscription check ensures that each subscription CR has no Failed conditions set to true and CatalogSourcesUnhealthy is false, indicating the subscription is healthy.

This check is skipped in the postpivot Upgrade stage handler, as it is expected that the catalog sources will not have been updated at this point, and each subscription would therefore be reported as unhealthy.

openshift-ci-robot commented 4 months ago

@donpenney: This pull request references CNF-12403 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to [this](https://github.com/openshift-kni/lifecycle-agent/pull/520): ># Background / Context > >LCA runs system health checks during SeedGen and IBU to ensure that the cluster is healthy and stable before proceeding. > ># Issue / Requirement / Reason for change > >An unhealthy subscription means there may not be a corresponding CSV. The existing health checks validate the CSV status, but if the CSV doesn't exist, that health check passes. As a result, the user may generate a seed image, for example, without realizing that the desired operators have not actually been installed. > ># Solution / Feature Overview > >Introducing a Subscription health check gives added protection by ensuring all subscriptions are in a healthy state. > ># Implementation Details > >This update adds a Subscription check to the set of system health checks, along with optionality for this specific check. The Subscription check ensures that each subscription CR has no conditions set to true, which indicates the subscription is healthy. > >This check is skipped in the postpivot Upgrade stage handler, as it is expected that the catalog sources will not have been updated at this point, and each subscription would therefore be reported as unhealthy. > >One implication of the Subscription health check is that there are no updates pending, meaning that the corresponding operators are running the latest available version, depending on the subscription configuration. In order to allow the user to proceed on a cluster without completing such updates, should they so desire, LCA will look for an annotation to determine whether to disable or skip this specific health check. This annotation can be added to either the IBU or Seedgen CRs to bypass the check altogether, for those steps: > >healthchecks.lca.openshift.io/subscriptions: Disabled > > Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift-kni%2Flifecycle-agent). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
openshift-ci[bot] commented 4 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please ask for approval from donpenney. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/openshift-kni/lifecycle-agent/blob/main/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
donpenney commented 4 months ago

/cc @jc-rh

donpenney commented 4 months ago

/retest

donpenney commented 4 months ago

/hold for discussion on "update available" scenario

openshift-ci-robot commented 4 months ago

@donpenney: This pull request references CNF-12403 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

In response to [this](https://github.com/openshift-kni/lifecycle-agent/pull/520): ># Background / Context > >LCA runs system health checks during SeedGen and IBU to ensure that the cluster is healthy and stable before proceeding. > ># Issue / Requirement / Reason for change > >An unhealthy subscription means there may not be a corresponding CSV. The existing health checks validate the CSV status, but if the CSV doesn't exist, that health check passes. As a result, the user may generate a seed image, for example, without realizing that the desired operators have not actually been installed. > ># Solution / Feature Overview > >Introducing a Subscription health check gives added protection by ensuring all subscriptions are in a healthy state. > ># Implementation Details > >This update adds a Subscription check to the set of system health checks. The Subscription check ensures that each subscription CR has no Failed conditions set to true and CatalogSourcesUnhealthy is false, indicating the subscription is healthy. > >This check is skipped in the postpivot Upgrade stage handler, as it is expected that the catalog sources will not have been updated at this point, and each subscription would therefore be reported as unhealthy. > Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift-kni%2Flifecycle-agent). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
jc-rh commented 4 months ago

/lgtm

pixelsoccupied commented 4 months ago

/lgtm

Missxiaoguo commented 4 months ago

/lgtm

romfreiman commented 4 months ago

/hold again, what will it affect? when will the customers be blocked? why not to verify it only upon seed creation?

donpenney commented 4 months ago

/hold again, what will it affect? when will the customers be blocked? why not to verify it only upon seed creation?

@romfreiman as noted, this PR has been revised to only look for failure conditions. It no longer checks the conditions relating to available updates. It now just checks to ensure that the subscriptions are healthy - ie. catalog sources are healthy, no failed conditions.

openshift-merge-robot commented 4 months ago

PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.