red-hat-storage / ocs-ci

https://ocs-ci.readthedocs.io/en/latest/
MIT License

test_dashboard_validation_ui fails on BackingStore and BucketClass state=rejected #10496

Open DanielOsypenko opened 1 month ago

DanielOsypenko commented 1 month ago

https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/678/24647/1195944/1195971/log?logParams=history%3D1195971%26page.page%3D1

The NooBaa-related resources recovered during the job execution. The test ran about 1 hour after the deployment finished.

We need to collect more failures. Most likely not a UI issue.

    pytest.fail(
        "Following checks failed. 1 - Pass, 0 - Fail. \n{}".format(
            failed_checks
        )
    )

    E   Failed: Following checks failed. 1 - Pass, 0 - Fail.
    E   ['backing_store_status_ready', 'bucket_class_status']
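For readers unfamiliar with the message format: the list holds the names of the checks whose value is 0. A minimal sketch of how such a list could be built, assuming a hypothetical `checks` dict that maps check names to 1 (pass) or 0 (fail). The actual ocs-ci test code may be structured differently, and `collect_failed_checks` and `some_other_check` are illustrative names only.

```python
def collect_failed_checks(checks):
    """Return the names of checks whose value is 0 (fail), in insertion order."""
    return [name for name, passed in checks.items() if not passed]


# Hypothetical values mirroring the failure reported in this issue.
checks = {
    "backing_store_status_ready": 0,
    "bucket_class_status": 0,
    "some_other_check": 1,  # hypothetical passing check, not from the real test
}

failed_checks = collect_failed_checks(checks)
print(failed_checks)
```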


DanielOsypenko commented 1 month ago

This may be a sign of one of the following:

  1. Misconfigured Storage Backends
  2. Resource Availability Issues
  3. Validation Failures
  4. Cluster Issues or Network Problems
  5. Incorrect Policy Definitions

DanielOsypenko commented 1 month ago

@mashetty330, @udaysk23, @sagihirshfeld can you pls take a look?

sagihirshfeld commented 1 month ago

Here's what I found so far:

  1. Since this is a UI test, we don't collect must-gather logs by default, so we might need to enable must-gather collection for this specific test if the issue repeats. The only reason I was able to check any logs is that the BackingStore was still Rejected when one of the following non-UI tests failed and collected the logs.

  2. All the MCG-related tests passed before and after, so it looks like whatever the issue was, it auto-recovered quickly.

  3. The Rejected BackingStore's YAML shows the ALL_NODES_OFFLINE error.

  4. The following error, which might be related, was found in the noobaa-core logs:

    Sep-1 9:33:10.799 [WebServer/34] [ERROR] core.server.node_services.nodes_monitor:: _test_network_to_server:: node has gateway_errors noobaa-internal-agent-66d4237b1a0223002224e29d [Error: N2N ICE CLOSED]
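To check whether the same error shows up in logs from future failures, something like the following could be used to scan noobaa-core log lines. This is only a sketch: the regular expression is inferred from the single log line quoted above, so other variants of the message may not match, and `find_gateway_errors` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one log line quoted in this issue (an assumption).
GATEWAY_ERROR_RE = re.compile(
    r"node has gateway_errors (?P<node>\S+) \[Error: (?P<error>[^\]]+)\]"
)


def find_gateway_errors(log_lines):
    """Return (node_name, error_text) tuples for lines reporting gateway_errors."""
    found = []
    for line in log_lines:
        match = GATEWAY_ERROR_RE.search(line)
        if match:
            found.append((match.group("node"), match.group("error")))
    return found


sample_line = (
    "Sep-1 9:33:10.799 [WebServer/34] [ERROR] "
    "core.server.node_services.nodes_monitor:: _test_network_to_server:: "
    "node has gateway_errors noobaa-internal-agent-66d4237b1a0223002224e29d "
    "[Error: N2N ICE CLOSED]"
)

for node, error in find_gateway_errors([sample_line]):
    print(node, "->", error)
```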

It might be a bug, but we should wait for further reproductions before creating a BZ.
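To capture the state if the issue recurs, a small helper could list every BackingStore that is not Ready, along with its mode code. This is a rough sketch, not ocs-ci code: it assumes the input is the parsed JSON from `oc get backingstore -n openshift-storage -o json`, that the namespace is `openshift-storage`, and that the phase and mode code live under `status.phase` and `status.mode.modeCode`. The sample data below is hypothetical.

```python
import json


def non_ready_backingstores(backingstore_list):
    """Return (name, phase, mode_code) for each BackingStore not in phase Ready."""
    result = []
    for item in backingstore_list.get("items", []):
        status = item.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase != "Ready":
            # Assumed field layout: status.mode.modeCode
            mode_code = status.get("mode", {}).get("modeCode", "")
            result.append((item["metadata"]["name"], phase, mode_code))
    return result


# Hypothetical sample standing in for the real `oc get ... -o json` output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "store-a"},
     "status": {"phase": "Rejected", "mode": {"modeCode": "ALL_NODES_OFFLINE"}}},
    {"metadata": {"name": "store-b"},
     "status": {"phase": "Ready", "mode": {"modeCode": "OPTIMAL"}}}
  ]
}
""")

for name, phase, mode_code in non_ready_backingstores(sample):
    print(name, phase, mode_code)
```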