test_rgw_unavailable is failing on IBM Power

AbhishekMundada commented 1 year ago

tests/manage/monitoring/prometheus/test_rgw.py::test_rgw_unavailable is failing on IBM Power with the following error:

AssertionError: Incorrect number of ClusterObjectStoreState alerts (0 instead of 2 with states: ['pending', 'firing']).

AbhishekMundada commented 1 year ago

Message: E       AssertionError: Incorrect number of ClusterObjectStoreState alerts (0 instead of 2 with states: ['pending', 'firing']).
E       Alerts: []

Text :
measure_stop_rgw = {'first_run': True, 'metadata': None, 'prometheus_alerts': [{'activeAt': '2023-03-14T18:05:30Z', 'annotations': {'desc...bjectstore', 'rgw': 'ocs-storagecluster-cephobjectstore', 'rook_cluster': 'openshift-storage', ...}}, ...}, ...}], ...}

    @tier4c
    @pytest.mark.polarion_id("OCS-2323")
    @pytest.mark.bugzilla("1953615")
    @skipif_managed_service
    def test_rgw_unavailable(measure_stop_rgw):
        """
        Test that there is appropriate alert when RGW is unavailable and that
        this alert is cleared when the RGW interface is back online.

        """
        api = prometheus.PrometheusAPI()

        # get alerts from time when manager deployment was scaled down
        alerts = measure_stop_rgw.get("prometheus_alerts")
        target_label = constants.ALERT_CLUSTEROBJECTSTORESTATE
        # The alert message is changed since OCS 4.7
        ocs_version = config.ENV_DATA["ocs_version"]
        if Version.coerce(ocs_version) < Version.coerce("4.7"):
            target_msg = (
                "Cluster Object Store is in unhealthy state for more than 15s. "
                "Please check Ceph cluster health or RGW connection."
            )
        else:
            target_msg = "Cluster Object Store is in unhealthy state. Please check Ceph cluster health."
        states = ["pending", "firing"]

>       prometheus.check_alert_list(
            label=target_label,
            msg=target_msg,
            alerts=alerts,
            states=states,
            severity="error",
        )

tests/manage/monitoring/prometheus/test_rgw.py:42:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

label = 'ClusterObjectStoreState'
msg = 'Cluster Object Store is in unhealthy state. Please check Ceph cluster health.'
alerts = [{'activeAt': '2023-03-14T18:05:30Z', 'annotations': {'description': 'Alerts are not configured to be sent to a notifi...els': {'alertname': 'Watchdog', 'namespace': 'openshift-monitoring', 'severity': 'none'}, 'state': 'firing', ...}, ...]
states = ['pending', 'firing'], severity = 'error'
ignore_more_occurences = True

    def check_alert_list(
        label, msg, alerts, states, severity="warning", ignore_more_occurences=True
    ):
        """
        Check list of alerts that there are alerts with requested label and
        message for each provided state. If some alert is missing then this check
        fails.

        Args:
            label (str): Alert label
            msg (str): Alert message
            alerts (list): List of alerts to check
            states (list): List of states to check, order is important
            ignore_more_occurences (bool): If true, only the occurrence of an
                alert with the requested label, message and state is checked;
                it is not checked whether there is more than one occurrence.
        """

        target_alerts = [
            alert for alert in alerts if alert.get("labels").get("alertname") == label
        ]

        logger.info(f"Checking properties of found {label} alerts")
        if ignore_more_occurences:
            for state in states:
                delete = False
                for key, alert in reversed(list(enumerate(target_alerts))):
                    if alert.get("state") == state:
                        if delete:
                            d_msg = f"Ignoring {alert} as alert already appeared."
                            logger.debug(d_msg)
                            target_alerts.pop(key)
                        else:
                            delete = True
        assert_msg = (
            f"Incorrect number of {label} alerts ({len(target_alerts)} "
            f"instead of {len(states)} with states: {states})."
            f"\nAlerts: {target_alerts}"
        )
>       assert len(target_alerts) == len(states), assert_msg
E       AssertionError: Incorrect number of ClusterObjectStoreState alerts (0 instead of 2 with states: ['pending', 'firing']).
E       Alerts: []

ocs_ci/utility/prometheus.py:61: AssertionError
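
For anyone triaging this, here is a minimal standalone sketch of the two steps check_alert_list performs in the traceback above: filtering by alertname, then keeping one alert per state. It is not the ocs-ci code itself. The helper names (summarize_alerts, filter_by_alertname, keep_one_per_state) and the sample data are hypothetical and only mimic the shape of the alerts shown in the traceback. The point is that when the collected alerts contain no entry with alertname ClusterObjectStoreState, the filtered list is empty and the length check fails with "0 instead of 2", regardless of the deduplication step.

```python
from collections import Counter


def summarize_alerts(alerts):
    """Count alerts by (alertname, state). Hypothetical triage helper."""
    return Counter(
        (alert.get("labels", {}).get("alertname"), alert.get("state"))
        for alert in alerts
    )


def filter_by_alertname(alerts, label):
    """Mirror of the first step in check_alert_list: keep matching alertname."""
    return [a for a in alerts if a.get("labels", {}).get("alertname") == label]


def keep_one_per_state(alerts, states):
    """Mirror of the ignore_more_occurences branch: walking the list backwards,
    keep the last alert seen for each state and drop earlier duplicates."""
    kept = list(alerts)
    for state in states:
        seen = False
        for idx in range(len(kept) - 1, -1, -1):
            if kept[idx].get("state") == state:
                if seen:
                    kept.pop(idx)
                else:
                    seen = True
    return kept


# Illustrative data only (assumption): shaped like the alerts in the traceback.
only_watchdog = [
    {"labels": {"alertname": "Watchdog", "severity": "none"}, "state": "firing"},
    {"labels": {"alertname": "Watchdog", "severity": "none"}, "state": "firing"},
]
with_rgw_alert = only_watchdog + [
    {"labels": {"alertname": "ClusterObjectStoreState"}, "state": "pending"},
    {"labels": {"alertname": "ClusterObjectStoreState"}, "state": "firing"},
    {"labels": {"alertname": "ClusterObjectStoreState"}, "state": "firing"},
]

states = ["pending", "firing"]
label = "ClusterObjectStoreState"

# Case seen in this issue: nothing matches, so the count is 0.
missing = keep_one_per_state(filter_by_alertname(only_watchdog, label), states)
# Case the test expects: one alert per requested state survives.
present = keep_one_per_state(filter_by_alertname(with_rgw_alert, label), states)

print(summarize_alerts(only_watchdog))
print(len(missing), len(present))
```

Running summarize_alerts on measure_stop_rgw.get("prometheus_alerts") from a failing run would show whether ClusterObjectStoreState ever appeared during the measurement window, which separates "the alert never fired on this platform" from "the alert fired with a different name or state".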

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 30 days if no further activity occurs.

github-actions[bot] commented 1 year ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

red-hat-storage / ocs-ci

test_rgw_unavailable is failing on IBM Power #7336