[Multiple Clusters][Test Parallel] Incorrect Cluster Upgrade Execution Despite Cluster Specification

AmarnatReddy commented 6 days ago

Describe the bug: With the current YAML configuration for parallel testing, the test performs the same upgrade operation on both clusters, even when a specific cluster (e.g., ceph1) is specified.

Expected behavior: Only the ceph1 cluster should be upgraded, as specified in the YAML file.

Actual behavior: Both clusters are upgraded. The for loop at the following line in run.py appears to cause this: https://github.com/red-hat-storage/cephci/blob/25f0c83fcf9c4ab68a9159d90f99c4f16fcf4552/run.py#L729

 - test:
      name: Upgrade along with IOs
      module: test_parallel.py
      parallel:
        - test:
            abort-on-fail: false
            config:
              timeout: 30
              client_upgrade: 1
              client_upgrade_node: 'node8'
            desc: Runs IOs in parallel with upgrade process
            module: cephfs_upgrade.cephfs_io.py
            name: "creation of Prerequisites for Upgrade"
            polarion-id: CEPH-83575315
        - test:
            name: Upgrade ceph
            desc: Upgrade cluster to latest version
            module: cephadm.test_cephadm_upgrade.py
            polarion-id: CEPH-83574638
            clusters:
              ceph1:
                config:
                  command: start
                  service: upgrade
                  base_cmd_args:
                    verbose: true
                  benchmark:
                    type: rados
                    pool_per_client: true
                    pg_num: 128
                    duration: 10
                  verify_cluster_health: false
            destroy-cluster: false
      desc: Running upgrade, mds Failure and i/o's parallelly
      abort-on-fail: true

Logs: http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/7.1/rhel-9/Upgrade/18.2.1-251/133/tier-0_cephfs_mirrror_upgrade_6x_to_7x/Upgrade_along_with_IOs_0.log
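A quick way to spot this kind of configuration is to check whether a step nested under `parallel` carries a `clusters` key while its parent test does not. The sketch below is only an illustration: the function name `nested_cluster_keys_without_parent` is hypothetical and not part of cephci, and the dictionary is a trimmed, hand-written copy of the YAML above rather than the output of the real suite loader.

```python
def nested_cluster_keys_without_parent(test):
    """Return names of parallel steps that set `clusters` while the parent does not.

    Hypothetical helper for illustration only; it is not part of cephci.
    """
    if "clusters" in test:
        return []
    flagged = []
    for entry in test.get("parallel", []):
        step = entry.get("test", {})
        if "clusters" in step:
            flagged.append(step.get("name", "<unnamed>"))
    return flagged


# Trimmed copy of the test entry from the YAML above.
reported_test = {
    "name": "Upgrade along with IOs",
    "module": "test_parallel.py",
    "parallel": [
        {"test": {"name": "creation of Prerequisites for Upgrade",
                  "module": "cephfs_upgrade.cephfs_io.py"}},
        {"test": {"name": "Upgrade ceph",
                  "module": "cephadm.test_cephadm_upgrade.py",
                  "clusters": {"ceph1": {"config": {"command": "start",
                                                    "service": "upgrade"}}}}},
    ],
}

print(nested_cluster_keys_without_parent(reported_test))
```

For the configuration in this report, the only flagged step would be "Upgrade ceph", since it is the one step that names ceph1 while the enclosing test_parallel.py entry names no cluster.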

psathyan commented 3 days ago

When running with the test_parallel module, run.py has no bearing on the individual steps inside parallel. The executor in this case is this line: https://github.com/red-hat-storage/cephci/blob/25f0c83fcf9c4ab68a9159d90f99c4f16fcf4552/tests/parallel/test_parallel.py#L86

This module picks only the first cluster for execution; the clusters are not iterated over.
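To make the distinction concrete, here is a minimal sketch of how a parallel executor could resolve the target of each step if it honored a step-level `clusters` key, falling back to the first cluster otherwise (the fallback mirrors the behavior described in this comment). The function `step_targets` and its arguments are hypothetical; this is not the code in test_parallel.py.

```python
def step_targets(step, cluster_names):
    """Return (cluster_name, config) pairs for one step inside `parallel`.

    Hypothetical sketch: a step-level `clusters` mapping selects the targets
    and supplies their config; without it, only the first cluster is used.
    """
    per_cluster = step.get("clusters")
    if per_cluster:
        return [
            (name, (spec or {}).get("config", {}))
            for name, spec in per_cluster.items()
            if name in cluster_names
        ]
    # No step-level key: fall back to the first cluster only.
    return [(cluster_names[0], step.get("config", {}))]


cluster_names = ["ceph1", "ceph2"]

upgrade_step = {
    "name": "Upgrade ceph",
    "module": "cephadm.test_cephadm_upgrade.py",
    "clusters": {"ceph1": {"config": {"command": "start", "service": "upgrade"}}},
}
io_step = {
    "name": "creation of Prerequisites for Upgrade",
    "config": {"timeout": 30},
}

print(step_targets(upgrade_step, cluster_names))
print(step_targets(io_step, cluster_names))
```

Under these assumptions both steps resolve to ceph1 only, so any second run against ceph2 would have to come from whatever invokes the parallel module, not from the steps themselves.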

psathyan commented 3 days ago

Because the top-level module has no clusters key, run.py triggers the parallel execution for both clusters, which causes the upgrade to run on both. If the intent is not to run the upgrade against all clusters, then we should provide clusters in the top-level test case, i.e. the test_parallel.py module.
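The effect of the top-level key can be modeled roughly as follows. This is a simplified sketch that assumes the outer loop runs a test once per cluster named under `clusters`, and once per known cluster when the key is absent; `clusters_for_test` is a hypothetical name and the real logic in run.py may differ.

```python
def clusters_for_test(test, all_clusters):
    """Return the cluster names a top-level test entry would run against.

    Hypothetical model: a `clusters` key restricts the run to the named
    clusters; without it, the test runs once per known cluster.
    """
    selected = test.get("clusters")
    if not selected:
        return list(all_clusters)
    return [name for name in all_clusters if name in selected]


all_clusters = ["ceph1", "ceph2"]

# Top-level entry as reported: no `clusters` key on the test_parallel.py test.
as_reported = {"name": "Upgrade along with IOs", "module": "test_parallel.py"}

# Same entry with `clusters` supplied at the top level.
with_top_level_key = dict(as_reported, clusters={"ceph1": {}})

print(clusters_for_test(as_reported, all_clusters))
print(clusters_for_test(with_top_level_key, all_clusters))
```

In this model the reported entry is dispatched for both ceph1 and ceph2, while the entry with a top-level `clusters` key is dispatched for ceph1 alone.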

Is this the right approach? Debatable.

IMHO, we would need to rewrite both modules to support parallel execution.

red-hat-storage / cephci #4162