red-hat-storage / ocs-ci

https://ocs-ci.readthedocs.io/en/latest/
MIT License

add flaky_test decorator to rerun test_ceph_osd_slow_ops_alert #9643

Open DanielOsypenko opened 6 months ago

DanielOsypenko commented 6 months ago
  1. Try to rerun the queries, or do not fail the test, when Prometheus does not respond. It is a known issue that Prometheus may become unresponsive for a couple of minutes.
     E ocs_ci.ocs.exceptions.TimeoutExpiredError: Timed out after 300s running get("url"="https://prometheus-k8s-openshift-monitoring.apps.j-011vup1cs33-t3.qe.rh-ocs.com/api/v1/alerts", "headers"={'Authorization': 'Bearer sha256~wthhAPuEnlh8YAHGbIAEqApRbE-af_I15UW_BuOBvTw'}, "verify"=False, "params"={'silenced': False, 'inhibited': False})
     https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/632/19989/968190/968203/log

  2. Failed: failed to get 'CephOSDSlowOps' while workload filled up the storage to 0.85 percents
     https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/632/19958/967097/967110/log
     In this case it is worth rerunning the test or waiting longer for the alert.
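Both failure modes above (Prometheus briefly unresponsive, and the alert arriving later than expected) could be handled by one polling loop that tolerates query errors and keeps waiting until a deadline. The following is only a minimal sketch, not ocs-ci code: `wait_for_alert`, `TransientQueryError`, and the `fetch_alerts` callable are hypothetical names, and the sketch assumes `fetch_alerts` returns a list of alert dicts shaped like the entries of the Prometheus `/api/v1/alerts` response (each with a `labels` mapping containing `alertname`).

```python
import time


class TransientQueryError(Exception):
    """Hypothetical stand-in for a timeout raised while querying Prometheus."""


def wait_for_alert(fetch_alerts, alert_name, timeout=600, interval=30,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until an alert named `alert_name` shows up or `timeout` expires.

    `fetch_alerts` is an assumed callable returning a list of alert dicts.
    Query errors are remembered but do not abort the wait, so a short
    Prometheus outage does not fail the test on its own.
    """
    deadline = clock() + timeout
    last_error = None
    while clock() < deadline:
        try:
            alerts = fetch_alerts()
        except TransientQueryError as err:
            # Prometheus did not answer; keep the error for the final message.
            last_error = err
        else:
            matching = [
                alert for alert in alerts
                if alert.get("labels", {}).get("alertname") == alert_name
            ]
            if matching:
                return matching
        sleep(interval)
    raise TimeoutError(
        f"alert {alert_name!r} not seen within {timeout}s "
        f"(last query error: {last_error!r})"
    )
```

In a test, `fetch_alerts` would wrap whatever call currently queries the alerts endpoint, translating its timeout exception into `TransientQueryError`. The `timeout` and `interval` defaults here are placeholders, not tuned values.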


Try to reproduce this alert manually in compact mode.

Add the flaky decorator, the same way as in https://github.com/red-hat-storage/ocs-ci/pull/9532
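For readers unfamiliar with the idea, a rerun decorator of this kind can be sketched in plain Python as below. This is an illustration only and is not the decorator used in the linked PR: `rerun_on_failure` and its parameters are hypothetical, and the example test body is a placeholder.

```python
import functools
import time


def rerun_on_failure(max_runs=3, delay=0, exceptions=(Exception,)):
    """Hypothetical decorator: rerun the wrapped test up to `max_runs` times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_runs + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Out of attempts: let the last failure propagate.
                    if attempt == max_runs:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


@rerun_on_failure(max_runs=3, delay=0)
def test_ceph_osd_slow_ops_alert_example():
    # Placeholder body; the real test fills the storage and waits for
    # the CephOSDSlowOps alert.
    assert True
```

One caveat with this sketch: the default `exceptions=(Exception,)` catches `AssertionError` but not exceptions that derive directly from `BaseException`, so such failure types would have to be passed in explicitly.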

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 30 days if no further activity occurs.

DanielOsypenko commented 3 months ago

Up! Many more failures on 4.16: https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/678/21271/1016601/1016612/log

DanielOsypenko commented 1 month ago

Fails on 4.15 as well: https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/632/20346/978632/978645/log?logParams=history%3D967110%26page.page%3D1

Failed: failed to get 'CephOSDSlowOps' while workload filled up the storage to 0.85 percents