red-hat-storage / ocs-ci

https://ocs-ci.readthedocs.io/en/latest/

Re-visit test_all_worker_nodes_short_network_failure for IBM deployment #7149

Open · am-agrawa opened this issue 1 year ago

am-agrawa commented 1 year ago

This issue has been split out from #6840 so that the failure can be tracked and fixed separately on IBM deployments.

https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/362/7573/332097/332165/332166/log

2022-12-14 16:13:52  Events:
2022-12-14 16:13:52    Type    Reason          Age   From               Message
2022-12-14 16:13:52    ----    ------          ----  ----               -------
2022-12-14 16:13:52    Normal  Scheduled       6m3s  default-scheduler  Successfully assigned namespace-test-4d55cef0a5074b158f3e53e77/pod-test-rbd-12043775321d4bde8f681c85484-1-deploy to j-002ici3c33-t4b-2thlg-worker-3-627zt
2022-12-14 16:13:52    Normal  AddedInterface  6m2s  multus             Add eth0 [10.128.2.103/23] from openshift-sdn
2022-12-14 16:13:52    Normal  Pulled          6m2s  kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f136aa5bf62cf114141f6ca7e5f9950afb18e69fe7818ad448f77099b3fe6c2c" already present on machine
2022-12-14 16:13:52    Normal  Created         6m2s  kubelet            Created container deployment
2022-12-14 16:13:52    Normal  Started         6m2s  kubelet            Started container deployment
2022-12-14 16:13:52  
2022-12-14 16:13:52  10:43:52 - MainThread - ocs_ci.ocs.ocp - ERROR  - Wait for pod resource pod-test-rbd-12043775321d4bde8f681c85484-1-deploy at column STATUS to reach desired condition Completed failed, last actual status was Running
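
For context, the failing step above is a polling wait on the pod's STATUS column: the `-1-deploy` pod was still Running when the wait window expired, which is exactly what the ERROR line reports. Below is a minimal sketch of that kind of wait; it is not the ocs-ci implementation, and the function name, timeout, and interval values are placeholders.

```python
# Illustrative only: poll a pod's STATUS column (as shown by `oc get pod`)
# until it reaches the desired value or the timeout expires.
import subprocess
import time


def wait_for_pod_status(pod, namespace, desired="Completed", timeout=600, interval=10):
    """Return True if the pod's STATUS column reaches `desired` within `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        columns = subprocess.run(
            ["oc", "get", "pod", pod, "-n", namespace, "--no-headers"],
            capture_output=True, text=True,
        ).stdout.split()
        # `oc get pod --no-headers` columns: NAME READY STATUS RESTARTS AGE
        status = columns[2] if len(columns) > 2 else ""
        if status == desired:
            return True
        time.sleep(interval)
    return False
```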
github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 30 days if no further activity occurs.

github-actions[bot] commented 1 year ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

am-agrawa commented 2 weeks ago

Recent failure on IBM Cloud (ODF 4.15.7-2): https://url.corp.redhat.com/9cbbc8c

0 (0%)            0 (0%)
  hugepages-2Mi      0 (0%)            0 (0%)
Events:
  Type    Reason                   Age                From             Message
  ----    ------                   ----               ----             -------
  Normal  Starting                 23m                kube-proxy
  Normal  NodeNotReady             27m                node-controller  Node 10.243.128.41 status is now: NodeNotReady
  Normal  Starting                 23m                kubelet          Starting kubelet.
  Normal  NodeAllocatableEnforced  23m                kubelet          Updated Node Allocatable limit across pods
  Normal  NodeHasSufficientMemory  23m (x8 over 23m)  kubelet          Node 10.243.128.41 status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    23m (x8 over 23m)  kubelet          Node 10.243.128.41 status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     23m (x7 over 23m)  kubelet          Node 10.243.128.41 status is now: NodeHasSufficientPID

ocs_ci/ocs/node.py:201: ResourceWrongStatusException
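
The traceback points at the node status wait in ocs_ci/ocs/node.py. The node events above show 10.243.128.41 staying NotReady for several minutes while kubelet restarts, so a wait that is too short for ROKS recovery times would surface as this ResourceWrongStatusException. A minimal, generic sketch of that kind of wait follows; it is not the actual ocs-ci code, and the timeout and interval values are assumptions.

```python
# Illustrative sketch only (not ocs_ci/ocs/node.py): wait for a set of nodes
# to report the Ready condition as True after the induced network failure.
import json
import subprocess
import time


def wait_for_nodes_ready(node_names, timeout=900, interval=15):
    """Raise TimeoutError if any node in `node_names` is not Ready within `timeout` seconds."""
    not_ready = list(node_names)
    deadline = time.time() + timeout
    while time.time() < deadline:
        not_ready = []
        for name in node_names:
            node = json.loads(
                subprocess.run(
                    ["oc", "get", "node", name, "-o", "json"],
                    capture_output=True, text=True,
                ).stdout
            )
            if not any(
                c["type"] == "Ready" and c["status"] == "True"
                for c in node["status"]["conditions"]
            ):
                not_ready.append(name)
        if not not_ready:
            return
        time.sleep(interval)
    raise TimeoutError(f"Nodes still not Ready after {timeout}s: {not_ready}")
```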
am-agrawa commented 2 weeks ago


Another failure: https://url.corp.redhat.com/12be82b

am-agrawa commented 2 weeks ago

Both test_all_worker_nodes_short_network_failure[CephBlockPool] and test_all_worker_nodes_short_network_failure[CephFileSystem] are failing on ODF 4.15.7-2 over IBM ROKS.
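
For reference when re-visiting the test on IBM, here is a rough sketch of the scenario the test name describes: take the network down on every worker for a short, fixed window and then let the cluster recover. This is illustrative only; the mechanism, the interface name (ens3), and the 300-second window are assumptions, not the ocs-ci implementation.

```python
# Illustrative sketch of a "short network failure on all workers" (assumed
# mechanism; the real test in ocs-ci may simulate the outage differently).
import subprocess
import time

OUTAGE_SECONDS = 300   # assumed "short" outage window
INTERFACE = "ens3"     # assumed primary interface name on the workers


def short_network_failure_on_workers(worker_nodes):
    """Bring the assumed primary interface down on each worker, then back up."""
    for node in worker_nodes:
        # `oc debug node/<name>` runs a host chroot shell on the node itself,
        # so the down/sleep/up sequence keeps running even while the node is
        # unreachable from the API server.
        subprocess.Popen(
            [
                "oc", "debug", f"node/{node}", "--",
                "chroot", "/host", "sh", "-c",
                f"ip link set dev {INTERFACE} down; "
                f"sleep {OUTAGE_SECONDS}; "
                f"ip link set dev {INTERFACE} up",
            ]
        )
    # Let the outage window (plus some slack) elapse before the test starts
    # waiting for nodes and pods to recover.
    time.sleep(OUTAGE_SECONDS + 60)
```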