red-hat-storage / ocs-ci

https://ocs-ci.readthedocs.io/en/latest/

osd out problem in test_ceph_health #9471

Open · fbalak opened 4 months ago

fbalak commented 4 months ago

In some cases there is a problem with Ceph health when an OSD is being scaled up again:

```
2024-03-06 09:27:03 08:27:03 - MainThread - ocs_ci.utility.retry - WARNING - Ceph cluster health is not OK. Health: HEALTH_WARN 1 osds down; 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set; 1 host (1 osds) down; 1 rack (1 osds) down; Degraded data redundancy: 808/2424 objects degraded (33.333%), 86 pgs degraded, 185 pgs undersized
```

After the test completes, the health is back to HEALTH_OK, so the warning appears to be transient. This needs to be investigated and fixed. Example run: https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/632/19163/931342/931349/log?logParams=history%3D923154%26page.page%3D1
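
One way the test could handle this (a minimal sketch, not a confirmed fix) is to wait for the cluster to settle after the OSD is scaled back up instead of asserting health immediately. This assumes the upstream helper `ocs_ci.utility.utils.ceph_health_check` keeps its current behaviour of retrying internally and raising `CephHealthException` if the cluster never reaches HEALTH_OK:

```python
from ocs_ci.utility.utils import ceph_health_check

# Poll Ceph health until HEALTH_OK. The "1 osds down" / degraded-PG
# warnings in the log above are expected while the OSD rejoins, so we
# give the cluster up to tries * delay seconds to settle; a genuine
# problem still fails the test via CephHealthException.
ceph_health_check(tries=40, delay=30)
```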

fbalak commented 3 months ago

This seems to be related to external mode.
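
If the correlation with external mode holds, a first triage step (a sketch, assuming ocs-ci's `config.DEPLOYMENT` dict and its `external_mode` key are unchanged upstream) would be to capture extra health context only on external-mode runs:

```python
import logging

from ocs_ci.framework import config

log = logging.getLogger(__name__)

# "external_mode" is the key ocs-ci's deployment config uses for
# external RHCS clusters; treat the exact name as an assumption.
if config.DEPLOYMENT.get("external_mode", False):
    log.info(
        "External mode run: collect `ceph health detail` output here "
        "to record which OSDs/flags trigger the HEALTH_WARN"
    )
```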