red-hat-storage / ocs-ci

https://ocs-ci.readthedocs.io/en/latest/
MIT License

test_fio_workload_simple (RBD+CephFS/random + sequential) - all test cases are failing in 4.15 #9221

Closed ypersky1980 closed 6 months ago

ypersky1980 commented 8 months ago

The test case is failing. Re-run the test and determine whether this is a product bug (open a BZ) or a test bug (submit a PR with a fix).

https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/all/17989/883997/884063/log?item1Params=page.page%3D2

self = <test_fio_benchmark.TestFIOBenchmark object at 0x7fe543524fa0>

    def setup(self):
        """
        Setting up test parameters
        """
        log.info("Starting the test setup")
        self.benchmark_name = "FIO"
        self.client_pod_name = "fio-client"

        super(TestFIOBenchmark, self).setup()

tests/e2e/performance/io_workload/test_fio_benchmark.py:149:

ocs_ci/ocs/perftests.py:97: in setup
    self.get_osd_info()
ocs_ci/ocs/perftests.py:229: in get_osd_info
    osd_info = ct_pod.exec_ceph_cmd(ceph_cmd="ceph osd df")
ocs_ci/ocs/resources/pod.py:345: in exec_ceph_cmd
    out = self.exec_cmd_on_pod(
ocs_ci/ocs/resources/pod.py:192: in exec_cmd_on_pod
    return self.ocp.exec_oc_cmd(
ocs_ci/ocs/ocp.py:178: in exec_oc_cmd
    out = run_cmd(
ocs_ci/utility/utils.py:484: in run_cmd
    completed_process = exec_cmd(
ocs_ci/utility/utils.py:633: in exec_cmd
    completed_process = subprocess.run(
/usr/lib64/python3.8/subprocess.py:495: in run
    stdout, stderr = process.communicate(input, timeout=timeout)
/usr/lib64/python3.8/subprocess.py:1028: in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
/usr/lib64/python3.8/subprocess.py:1869: in _communicate
    self._check_timeout(endtime, orig_timeout, stdout, stderr)

self = <subprocess.Popen object at 0x7fe54cde3f40>, endtime = 203052.147375222
orig_timeout = 600, stdout_seq = [], stderr_seq = []
skip_check_and_raise = False

    def _check_timeout(self, endtime, orig_timeout, stdout_seq, stderr_seq,
                       skip_check_and_raise=False):
        """Convenience for checking if a timeout has expired."""
        if endtime is None:
            return
        if skip_check_and_raise or _time() > endtime:
            raise TimeoutExpired(
                    self.args, orig_timeout,
                    output=b''.join(stdout_seq) if stdout_seq else None,
                    stderr=b''.join(stderr_seq) if stderr_seq else None)

E subprocess.TimeoutExpired: Command '['oc', '--kubeconfig', '/home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig', '-n', 'openshift-storage', 'rsh', 'rook-ceph-tools-7997d9b857-g4kns', 'ceph', 'osd', 'df', '--format', 'json-pretty']' timed out after 600 seconds

/usr/lib64/python3.8/subprocess.py:1072: TimeoutExpired
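For anyone re-running this: the traceback above ends in `subprocess.run(...)` raising `TimeoutExpired` because the `ceph osd df` call inside the tools pod did not return within 600 seconds. The snippet below is only a minimal, standalone sketch of that failure mode, not the ocs-ci code. `run_with_timeout` is a hypothetical helper, and the sleeping child process is an assumed stand-in for the hanging `oc rsh` command.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd and report whether it finished within `timeout` seconds.

    Hypothetical helper for illustration; it is not part of ocs-ci.
    Returns (True, CompletedProcess) on completion, or
    (False, TimeoutExpired) if the command exceeded the timeout.
    """
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
        return True, completed
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising, so nothing is left running.
        return False, exc


# Assumed stand-in for the hanging command: a child process that sleeps
# longer than the timeout we allow it.
slow_cmd = [sys.executable, "-c", "import time; time.sleep(5)"]
ok, result = run_with_timeout(slow_cmd, timeout=0.5)

if not ok:
    print(f"command timed out after {result.timeout} seconds: {result.cmd}")
```

Catching the exception this way only makes the timeout visible to the caller. Whether the underlying hang is a product bug or a test bug still has to be determined on the cluster.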

ypersky1980 commented 8 months ago

https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/all/17989/883997/884064/log?item1Params=page.page%3D2

ypersky1980 commented 8 months ago

https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/all/17989/883997/884065/log?item1Params=page.page%3D2

ypersky1980 commented 8 months ago

https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/all/17989/883997/884066/log?item1Params=page.page%3D2

ypersky1980 commented 6 months ago

The test passes on 4.15.0-158 (GA build) on both IBM and VMware LSO platforms. The results can be found here: http://10.0.78.167:8080/index.php?version1=41&build1=130&platform1=2&az_topology1=3&test_name%5B%5D=1&test_name%5B%5D=2&test_name%5B%5D=3&test_name%5B%5D=4&test_name%5B%5D=6&test_name%5B%5D=8&test_name%5B%5D=9&test_name%5B%5D=10&test_name%5B%5D=11&test_name%5B%5D=15&test_name%5B%5D=16&test_name%5B%5D=17&test_name%5B%5D=18&test_name%5B%5D=20&test_name%5B%5D=21&test_name%5B%5D=23&version2=40&build2=135&platform2=2&az_topology2=3&version3=&build3=&version4=&build4=&submit=Choose+options

Therefore, I'm closing the issue.