red-hat-storage / ocs-ci

https://ocs-ci.readthedocs.io/en/latest/
MIT License

test_ui_storage_size_post_resize_osd failed despite the results looking fine. #10778

Open OdedViner opened 1 month ago

OdedViner commented 1 month ago

E AssertionError: The total UI size 6139Gi is not in the expected total size range range(12276, 12300)Gi

Check UI: 6T

Check "ceph df":

--- RAW STORAGE ---
CLASS  SIZE   AVAIL    USED    RAW USED  %RAW USED
ssd    6 TiB  5.9 TiB  98 GiB  98 GiB    1.59
TOTAL  6 TiB  5.9 TiB  98 GiB  98 GiB    1.59

--- POOLS ---

Job Link: https://url.corp.redhat.com/6c5bc06

MG: https://url.corp.redhat.com/2fe022d

Screenshot: https://url.corp.redhat.com/7907f0f
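For reference, the arithmetic behind the mismatch, as a minimal sketch (the helper and variable names below are illustrative, not the actual test code): the cluster reports roughly 6 TiB, while the expected range corresponds to roughly 12 TiB.

# Illustrative only: the unit arithmetic behind the assertion message above.
def tib_to_gib(tib: float) -> int:
    """Convert TiB to GiB (1 TiB = 1024 GiB)."""
    return int(tib * 1024)

ui_size_gib = 6139                       # value from the assertion error
expected_range = range(12276, 12300)     # expected total size range in Gi

print(tib_to_gib(6))                     # 6144, matching the ~6 TiB "ceph df" reports
print(tib_to_gib(12) in expected_range)  # True: the range corresponds to ~12 TiB
print(ui_size_gib in expected_range)     # False: hence the AssertionError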

DanielOsypenko commented 1 month ago

There is also another problem, in the finalizer: https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/738/26078/1273990/1274257/log

def finalizer():
    if not skipped:
        multi_storagecluster_external_health_passed = False
        try:
            teardown = ocsci_config.RUN["cli_params"]["teardown"]
            skip_ocs_deployment = ocsci_config.ENV_DATA["skip_ocs_deployment"]
            ceph_cluster_installed = ocsci_config.RUN.get("cephcluster")
            if not (
                teardown
                or skip_ocs_deployment
                or mcg_only_deployment
                or not ceph_cluster_installed
            ):
                # We are allowing 20 re-tries for health check, to avoid teardown failures for cases like:
                # "flip-flopping ceph health OK and warn because of:
                # HEALTH_WARN Reduced data availability: 2 pgs peering"
                ceph_health_check(
                    namespace=ocsci_config.ENV_DATA["cluster_namespace"]
                )

tests/conftest.py:1706: 

ocs_ci/utility/utils.py:2478: in ceph_health_check
    return retry(
ocs_ci/utility/retry.py:49: in f_retry
    return f(*args, **kwargs)
ocs_ci/utility/utils.py:2508: in ceph_health_check_base
    health = run_ceph_health_cmd(namespace)
ocs_ci/utility/utils.py:2560: in run_ceph_health_cmd
    return ct_pod.exec_ceph_cmd(
ocs_ci/ocs/resources/pod.py:350: in exec_ceph_cmd
    out = self.exec_cmd_on_pod(
ocs_ci/ocs/resources/pod.py:195: in exec_cmd_on_pod
    return self.ocp.exec_oc_cmd(
ocs_ci/ocs/ocp.py:212: in exec_oc_cmd
    out = run_cmd(
ocs_ci/utility/utils.py:487: in run_cmd
    completed_process = exec_cmd(
ocs_ci/utility/utils.py:677: in exec_cmd
    completed_process = subprocess.run(
/usr/lib64/python3.9/subprocess.py:507: in run
    stdout, stderr = process.communicate(input, timeout=timeout)
/usr/lib64/python3.9/subprocess.py:1134: in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
/usr/lib64/python3.9/subprocess.py:1996: in _communicate
    self._check_timeout(endtime, orig_timeout, stdout, stderr)

self = <Popen: returncode: -9 args: ['oc', '--kubeconfig', '/home/jenkins/current-c...>
endtime = 74513.520891323, orig_timeout = 120, stdout_seq = [], stderr_seq = []
skip_check_and_raise = False

def _check_timeout(self, endtime, orig_timeout, stdout_seq, stderr_seq,
                   skip_check_and_raise=False):
    """Convenience for checking if a timeout has expired."""
    if endtime is None:
        return
    if skip_check_and_raise or _time() > endtime:
        raise TimeoutExpired(
                self.args, orig_timeout,
                output=b''.join(stdout_seq) if stdout_seq else None,
                stderr=b''.join(stderr_seq) if stderr_seq else None)
E           subprocess.TimeoutExpired: Command '['oc', '--kubeconfig', '/home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig', '-n', 'openshift-storage', 'rsh', 'rook-ceph-tools-75d69775b8-4rg5k', 'ceph', 'health']' timed out after 120 seconds
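For context, a minimal sketch of how this timeout surfaces, assuming the command from the traceback is executed through subprocess.run with a 120-second timeout (the kubeconfig path is a placeholder; the tools-pod name and namespace are taken from the error above):

import subprocess

cmd = [
    "oc", "--kubeconfig", "<kubeconfig>",       # placeholder path
    "-n", "openshift-storage",
    "rsh", "rook-ceph-tools-75d69775b8-4rg5k",  # tools pod from the error above
    "ceph", "health",
]
try:
    result = subprocess.run(cmd, capture_output=True, timeout=120)
    print(result.stdout.decode())
except subprocess.TimeoutExpired:
    # This is the exception seen above: "ceph health" did not return within
    # 120 seconds, so the teardown health check failed.
    print("ceph health timed out after 120 seconds")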

The test has never passed.
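The finalizer comment above mentions allowing 20 retries of the health check so that briefly flip-flopping HEALTH_OK/HEALTH_WARN states do not fail teardown. A generic sketch of that kind of retry wrapper (illustrative only, not ocs_ci/utility/retry.py itself):

import time

def retry(exceptions, tries=20, delay=30):
    """Return a decorator that retries a function on the given exceptions."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator

If subprocess.TimeoutExpired is not among the retried exceptions, it propagates out on the first attempt, which would be consistent with the traceback above.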

yitzhak12 commented 1 month ago

Okay, I will check it.