red-hat-storage / ocs-ci

https://ocs-ci.readthedocs.io/en/latest/
MIT License

"ceph osd pool" != ceph_pool_stored in multi StorageCluster deployment #10056

Open DanielOsypenko opened 3 months ago

DanielOsypenko commented 3 months ago

Test test_monitoring_enabled fails on a cluster deployed with the following config:

# Config file for deploying multi StorageCluster scenario
DEPLOYMENT:
  allow_lower_instance_requirements: false
  multi_storagecluster: true
ENV_DATA:
  platform: 'vsphere'
  deployment_type: 'upi'
  worker_replicas: 3
  master_replicas: 3
  worker_num_cpus: '16'
  master_num_cpus: '4'
  master_memory: '16384'
  compute_memory: '65536'
  fio_storageutilization_min_mbps: 10.0
  storage_cluster_name: 'ocs-storagecluster'
  external_storage_cluster_name: 'ocs-external-storagecluster'
  external_storage_cluster_namespace: 'openshift-storage-extended'
REPORTING:
  polarion:
    deployment_id: 'OCS-5503'

https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/678/22684/1087608/1087644/log?logParams=history%3D1087644%26page.page%3D1

DanielOsypenko commented 3 months ago

@shyRozen I think this is a recent deployment feature that you covered. Could you please help check this case? Could it be a bug scenario? Thanks.

DanielOsypenko commented 1 week ago

More issues with multi StorageCluster: https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/632/25161/1219489/1219528/log?logParams=history%3D1219528%26page.page%3D1

DanielOsypenko commented 1 week ago

If config.DEPLOYMENT.get("multi_storagecluster") is set, we'll need to connect to the external storage cluster, run ceph osd pool ls there, and add its pools to the ceph osd pool ls output of the internal storage cluster. In that case we would probably get 20 == 20 instead of the failure below:

def test_monitoring_enabled(threading_lock):
    """
    OCS Monitoring is enabled after OCS installation (which is why this test
    has a post deployment marker) by asking for values of one ceph and one
    noobaa related metrics.
    """
    prometheus = PrometheusAPI(threading_lock=threading_lock)

    if (
        storagecluster_independent_check()
        and float(config.ENV_DATA["ocs_version"]) < 4.6
    ):
        logger.info(
            f"Skipping ceph metrics because it is not enabled for external "
            f"mode for OCS {float(config.ENV_DATA['ocs_version'])}"
        )

    else:
        # ask for values of ceph_pool_stored metric
        logger.info("Checking that ceph data are provided in OCS monitoring")
        result = prometheus.query("ceph_pool_stored")
        msg = "check that we actually received some values for a ceph query"
        assert len(result) > 0, msg
        for metric in result:
            _, value = metric["value"]
            assert_msg = "number of bytes in a pool isn't a positive integer or zero"
            assert int(value) >= 0, assert_msg
        # additional check that values makes at least some sense
        logger.info(
            "Checking that size of ceph_pool_stored result matches number of pools"
        )
        ct_pod = pod.get_ceph_tools_pod()
        ceph_pools = ct_pod.exec_ceph_cmd("ceph osd pool ls")
>       assert len(result) == len(ceph_pools)
E           assert 20 == 12
E             +20
E             -12
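
A minimal sketch of the proposed comparison, assuming the pool lists from both clusters have already been fetched. The helper name expected_pool_count and its parameters are hypothetical and not part of ocs-ci. How to run ceph osd pool ls against the external cluster is left out, and the 8 external pools are an assumed number chosen so that the total matches the 20 metric results.

```python
from typing import Iterable, Optional


def expected_pool_count(
    internal_pools: Iterable[str],
    external_pools: Optional[Iterable[str]] = None,
    multi_storagecluster: bool = False,
) -> int:
    """Return how many pools the ceph_pool_stored query is expected to report.

    Hypothetical helper: with multi_storagecluster enabled, pools from the
    external cluster are counted on top of the internal ones.
    """
    count = len(list(internal_pools))
    if multi_storagecluster and external_pools is not None:
        count += len(list(external_pools))
    return count


# Illustrative data only: 12 internal pools as in the failure above,
# plus an assumed 8 external pools.
internal = [f"internal-pool-{i}" for i in range(12)]
external = [f"external-pool-{i}" for i in range(8)]

print(expected_pool_count(internal, external, multi_storagecluster=True))  # 20
print(expected_pool_count(internal, external))  # 12
```

In the test, the final assertion would then compare len(result) with this combined count when multi_storagecluster is set, and with the internal pool count otherwise.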